<body>

Naming Hierarchy

Saturday, March 26, 2016


A small number of FNA keys differ slightly from the normal key structure. The common element of this difference is that the keying sequence is not from genus to species and then to subspecies, but instead skips the species rank. The following diagram shows three different ways that this can occur.

In order to deal with these cases using essentially the same algorithms that are used for normal key structures, the data structures need to accommodate the naming hierarchy as well as the hierarchy used for keying. As shown in the diagram in the Node Structure and Types section, the naming hierarchy is implemented using the targetList in the nodeEntry.


The following is an example of the third way the naming hierarchy differs from the keying hierarchy. This is a part of the Ranunculus key that has one Characters Set for R. hispidus var. nitidus originating from the genus key and another from the species key.

Besides this case, subspecies of the following are keyed from the genus instead of their species key:
  • Piperia elegans
  • Ranunculus acriformis
  • Ranunculus aquatilis
  • Ranunculus canus

Localized Keys and Predetermined Characteristics

Wednesday, March 16, 2016

The first step in using the FNA keys is to determine the family of the specimen. The location where that specimen was collected limits the possible families; this type of key localization is easily understood.

The key to taxa in a family can also be localized. Since some of the paths through the key will lead to taxa not in a given location, there must be choices for some couplets that also do not lead to the location.
By tracing every possible path through the key for each taxon in the family, each couplet choice can be associated with a list of locations for taxa following that lead. Therefore, for a given location, the only allowed paths are those through couplets that list your location for both choices or through couplets that list your location for just one of the choices; that is, if your location is not on the list of locations for a couplet choice, the only possible path is through the other choice.
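
The tracing described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code used for the FNA data: the toy key, the taxon names and the location codes are all hypothetical.

```python
from typing import Dict, List, Set, Tuple, Union

# A lead points either to another couplet (int) or to a target taxon (str).
Lead = Union[int, str]

# Hypothetical toy key: couplet number -> (first lead, "+" lead).
KEY: Dict[int, Tuple[Lead, Lead]] = {
    1: ("taxon_a", 2),
    2: (3, "taxon_d"),
    3: ("taxon_b", "taxon_c"),
}

# Hypothetical location lists (postal codes) for each terminal taxon.
LOCATIONS: Dict[str, Set[str]] = {
    "taxon_a": {"FL"},
    "taxon_b": {"WA", "OR"},
    "taxon_c": {"TX"},
    "taxon_d": {"WA"},
}


def lead_locations(lead: Lead) -> Set[str]:
    """Union of the locations of every taxon reachable through a lead."""
    if isinstance(lead, str):
        return LOCATIONS[lead]
    first, second = KEY[lead]
    return lead_locations(first) | lead_locations(second)


def localize(location: str) -> Dict[int, List[Lead]]:
    """For each couplet, keep only the leads whose location list has `location`."""
    return {
        number: [lead for lead in leads if location in lead_locations(lead)]
        for number, leads in KEY.items()
    }


localized = localize("WA")
# Couplets with a single remaining lead require no choice from the user.
forced = sorted(n for n, leads in localized.items() if len(leads) == 1)
```

In this toy key, a user in WA has no real choice at couplets 1 and 3; couplet 2 is the first couplet where a decision is needed.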

The Alismataceae key is moderate sized, and the taxa in the family are widely, though not evenly, dispersed in North America. The following diagram for the key shows how it would be localized for taxa in Washington state. The couplet characters are not shown, but the couplet numbers are the same as those for the online key and for its subkeys.
It is generally not worthwhile to use location lists with printed keys because the lists are usually too long to print and too long for the user to look through. But they could be used with online keys and with keying apps for smartphones and tablets; the user would specify the location (or locations), and they would be presented with a localized key. As the user traverses the key, they would be told when their path passes a couplet that could be skipped, and the user would usually verify that their specimen has the characters of the choice in the path taken. For example, in the example above, before getting to couplet 2 (the first couplet where they need to make a choice), they would be asked whether their specimen has the characters in the 1+ couplet choice.

Creating good localized keys depends on accurate species location information. The webpage describing the regional keys for the Jepson eflora notes that, because of this, the key for a larger area may need to be used to key out some species.

Predetermined Characters - Just like being able to localize a key, a key can be pre-processed for a given morphological character that is used to differentiate targets in the key; the only allowed paths are those that result in targets that have that desired character, simplifying key traversal.

For example, suppose you want to use the Ericaceae Kalmia key to key out a specimen that has alternate leaves.  Instead of using the key

1        Petals distinct nearly their entire lengths.                                                       7 buxifolia
+        Petals connate ca. 1/2 to nearly their entire lengths                                       (2)
2 (1)   Leaves opposite; inflorescences terminal, corymbiform racemes or umbels (3)
+        Leaves alternate (rarely opposite) or in whorls; inflorescences solitary
          flowers or fascicles, racemes, or panicles                                                      (5)
3 (2)   Midribs of leaf blades with purple, clavate trichomes; seeds 1.5-2.2 mm.     6 polifolia
+        Midribs of leaf blades without trichomes; seeds 0.5-1.4 mm                           (4)
4 (3)   Capsules 5-locular; petals 7-9 mm, shallowly cleft; stamens 10.                   5 microphylla
+        Capsules 2-3-locular; petals 3-5 mm, cleft ca. 1/2 their lengths; stamens 5. 8 procumbens
5 (2)   Leaves usually in whorls of 3 (rarely alternate or opposite)                           2 angustifolia
+        Leaves alternate or seemingly whorled                                                           (6)
6 (5)   Leaves deciduous; petals white with red band adaxially.                               3 cuneata
+        Leaves persistent; petals white to pink or red                                                (7)
7 (6)   Plants 8(-12) m; leaf blade surfaces glabrous adaxially (only midrib
           puberulent), 4-12 cm; inflorescences terminal panicles                                1 latifolia
+        Plants 0.6(-1.2) m; leaf blade surfaces usually hairy, 0.5-1.4 cm;
           inflorescences solitary flowers or, sometimes, fascicles or racemes,
           scattered along stem in leaf axils.                                                                 4 hirsuta

if this key had been pre-processed for alternate leaves, then you could use this smaller key

1        Petals distinct nearly their entire lengths.                                                       7 buxifolia
+        Petals connate ca. 1/2 to nearly their entire lengths                                       (2)
2 (1)   Leaves usually in whorls of 3 (rarely alternate or opposite)                           2 angustifolia
+        Leaves alternate or seemingly whorled                                                           (3)
3 (2)   Leaves deciduous; petals white with red band adaxially.                               3 cuneata
+        Leaves persistent; petals white to pink or red                                                 (4)
4 (3)   Plants 8(-12) m; leaf blade surfaces glabrous adaxially (only midrib
           puberulent), 4-12 cm; inflorescences terminal panicles                                1 latifolia
+        Plants 0.6(-1.2) m; leaf blade surfaces usually hairy, 0.5-1.4 cm;
           inflorescences solitary flowers or, sometimes, fascicles or racemes,
           scattered along stem in leaf axils.                                                                 4 hirsuta

For this method to be successful, work would be needed to parse the couplet leads, which can have ambiguous syntax, so it is not clear that use of predetermined characters would be cost effective.  However, it may offer an alternative to creating interactive multi-access keys while offering the major advantage of multi-access keys: most characters of a sample do not need to be examined in a predetermined order.
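
The pre-processing step itself is simple once it is known which targets have the desired character. The following Python sketch shows this for the Kalmia example. It does not parse the couplet leads: the set of taxa with alternate leaves is simply taken from the smaller key above, and the function names are hypothetical. It also assumes that sorting the surviving couplets by their old numbers gives the new numbering, which holds for this key.

```python
from typing import Dict, Optional, Set, Tuple, Union

Lead = Union[int, str]          # next couplet number or target name
Key = Dict[int, Tuple[Lead, Lead]]

# Structure of the Kalmia key above: couplet -> (first lead, "+" lead).
KALMIA: Key = {
    1: ("buxifolia", 2),
    2: (3, 5),
    3: ("polifolia", 4),
    4: ("microphylla", "procumbens"),
    5: ("angustifolia", 6),
    6: ("cuneata", 7),
    7: ("latifolia", "hirsuta"),
}

# Targets kept in the smaller key (taken from the text, not parsed from leads).
ALTERNATE_LEAVES: Set[str] = {
    "buxifolia", "angustifolia", "cuneata", "latifolia", "hirsuta",
}


def resolve(lead: Lead, key: Key, keep: Set[str], pruned: Key) -> Optional[Lead]:
    """Return the lead to follow after pruning, or None if nothing survives."""
    if isinstance(lead, str):
        return lead if lead in keep else None
    results = [resolve(child, key, keep, pruned) for child in key[lead]]
    survivors = [r for r in results if r is not None]
    if not survivors:
        return None
    if len(survivors) == 1:
        return survivors[0]     # couplet with one choice left collapses
    pruned[lead] = (survivors[0], survivors[1])
    return lead


def preprocess(key: Key, keep: Set[str], root: int = 1) -> Key:
    """Prune the key for the kept targets and renumber the couplets."""
    pruned: Key = {}
    resolve(root, key, keep, pruned)
    new_number = {old: new for new, old in enumerate(sorted(pruned), start=1)}

    def rename(lead: Lead) -> Lead:
        return new_number[lead] if isinstance(lead, int) else lead

    return {new_number[old]: (rename(a), rename(b))
            for old, (a, b) in sorted(pruned.items())}


small_key = preprocess(KALMIA, ALTERNATE_LEAVES)
```

The result has four couplets with the same structure as the smaller key shown above: couplet 2 of the original key collapses because only one of its choices still leads to a kept target.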

Multiple-Characters-Set Groups

Monday, February 10, 2014
In the last section, we showed some diverse taxa that have multiple characters sets terminating on a taxon. That is, there is more than one path to the target taxon. In some of the more complex keys, there may be more than one path to a group of taxa. There are two cases where this occurs.

If a key has subkeys (see Key Types and Subkeys), there may be more than one path to a subkey. For example, Cyperaceae Carex has subkeys Key A through Key F, and each of these has subkeys that are intermediate ranks. In particular, one of the subkeys for Key C is section Ovales, and there are three characters sets leading to the group of species in section Ovales:

This is similar to multiple characters sets leading to the key for a target taxon (for example, Silene scouleri shown in A Key as a Hierarchy).

However, the taxon group does not have to be named. Here is part of the Carex section Ovales key for species east of the Rocky Mountains:
Both couplet 56 and couplet 57 have two characters sets leading to them. That is, there are two paths leading to the taxon group of Carex opaca and C. shinnersii, and there are two paths leading to the taxon group of C. opaca, C. shinnersii and C. missouriensis. In the FNA online key, special notation indicates couplets 56 and 57 are exceptional; in particular, instead of the standard (57) to indicate the next couplet is 57, the phrase "Go to couplet 57" is used for one of the characters sets.

How a connector node is used with a target taxon that has multiple characters sets is described in Node Structure and Types. A connector node is also needed for a couplet with multiple characters sets leading to a taxon group.

The fact that couplet 56 is associated with the opaca_shinnersii_missouriensis taxon group (and couplet 57 is associated with the opaca_shinnersii taxon group) is stored as the intermediate name in the couplet node (see Node Structure and Types). The intermediate name in a couplet node is also sometimes used to store an intermediate rank that is associated with the couplet.

Node Groups & Relationships

Saturday, December 21, 2013
Nodes (see Node Structure and Types) form node groups, which are then connected to create a database model for the FNA keys.  There are three group types


However, they share generalized group elements.  In particular, the End Nodes for both have the same dual role that is described in Node Structure and Types. To Do: Need a diagram like on 1/11/2015 in order to include the 2 types of connector nodes and to allow for 2 layers of connector nodes - use the real diagram for Salicaceae Populus on 12/30/2015.  Need a diagram like on 1/28/2015 to show the all-inclusive sequence of nodes.  Also need to show that multiple-attribute-set connector nodes can be both external and internal to the key.

Node Group Elements -
  1. Base Node
    • The Base Node provides the parent taxon name or intermediate rank name; all targets in the group are members of this taxon or intermediate rank.
    • If the Couplet Nodes immediately follow the Base Node, it provides the base node number for the group; this node number is used to convert the row numbers of the Couplet Nodes to the node numbers used in the database. This is the case if the Base Node is a root, connector or singleton node.
    • If the Couplet Nodes are separate from the Base Node, then the base node number is one less than the first couplet node number. This is the case if the Base Node is a target node (or a couplet node with an intermediate title) in a higher level node, so that node has dual roles: it is a target node in the higher level node group and Base Node for the current node group.
    • If the Base Node is a singleton parent, then there are no Couplet Nodes and the singleton node is the single End Node and immediately follows the Base Node. In this case the Base Node can be a root, connector, target with an intermediate title or a singleton node. Note that if the Base Node is a singleton node, it also acts as a singleton parent.
  2. Couplet Nodes
    • Singleton groups have no couplet nodes.
    • Key groups have zero or more couplet nodes. With no couplet nodes, there are two target nodes. With one or more couplet nodes, the end nodes can be target or attribute set nodes.  There are special cases where a target node can be replaced by a segregate node.
    • If the end nodes are attribute set nodes, then the associated connector node may specify a couplet node that acts as a merge point; that is, all attributes between the Base Node and the merge point are used in common with all attribute sets associated with that connector node.
    • If there are subkeys, the master key and the subkeys are separate node groups. They are joined together by couplet or connector nodes with intermediate titles.
  3. End Nodes
    • As discussed under Base Node, the End Node could be a singleton node; in all other cases, there is always more than one End Node.
    • For target End Nodes, the target node also acts as a Base Node for the node at the next level.
    • Attribute set nodes exist when a taxon has more than one attribute set that describes that taxon; they therefore require an associated connector node for the taxon, which acts as the Base Node for the node at the next level.
    • In each of the three End-Node cases above, instead of the End Node being the Base Node for the next level, the End Node can be a terminal taxon.
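
The role of the base node number can be illustrated with a very small Python sketch. It assumes that a row's node number is the base node number plus its row number, which is consistent with the base node number being one less than the first couplet node number, but the exact rule used in the database is not stated here. The function name and the sample numbers are hypothetical.

```python
def row_to_node_number(base_node_number: int, row_number: int) -> int:
    """Convert a row number within a key to a node number (assumed rule)."""
    if row_number < 1:
        raise ValueError("row numbers start at 1")
    return base_node_number + row_number


# Hypothetical base node number 500: rows 1 and 2 become nodes 501 and 502.
first_nodes = [row_to_node_number(500, row) for row in (1, 2)]
```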
Node Creation Process - From a high level, our objective is to model the FNA using these objects:
  • Root nodes
  • Keys
  • Connector nodes
  • Alternate Key nodes
  • Singletons
In more detail, all nodes can be created sequentially using the following process:
  1. For each family:
    1. Create the family root node, which acts as a base node
    2. If level has key(s)
      1. For the key (or for each alternate key):
        1. Create a multiple-attribute-set connector node for each file in the multiSpcSubclasses directory
        2. If this level has alternate keys, create an alternate key node for each key, which serves as the base node and supplies the name for the alternate key
        3. For each row in the key, create one of these node types:
          • Couplet
          • Target
          • Attribute Set
          • Segregate
      2. If this level has alternate keys, create an alternate-key connector node for each target (both single and multiple-attribute-set targets)
    3. Else (level has a singleton)
      1. Create a singleton node, which acts as a base node for the next level
    4. Repeat step 2 or 3 for each level, but there are no alternate keys at the species and subsp levels
Therefore, to create the Node List, one starts with the root for each family and works through each level, adding keys, connector nodes, alternate key nodes and singletons, where for each key a node is added for each row. The Node List is numbered using the class list index (cli) with values 0, 1, 2, ... (see diagram in Node Structure and Types).
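
A much simplified Python sketch of this sequential process follows. It handles only root, key-row and singleton nodes; alternate keys and multiple-attribute-set connector nodes are left out. The input format (nested dictionaries), the class and function names, and the sample family are hypothetical and are used only to show how the cli is assigned as nodes are appended.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Node:
    cli: int                      # class list index: position in the node list
    node_type: str                # "root", "couplet", "target" or "singleton"
    name: str
    base: Optional[int] = None    # cli of the base node of this node's group


def add_node(nodes: List[Node], node_type: str, name: str,
             base: Optional[int] = None) -> int:
    """Append a node to the node list and return its cli."""
    nodes.append(Node(len(nodes), node_type, name, base))
    return nodes[-1].cli


def add_level(nodes: List[Node], base: int, taxon: Dict) -> None:
    """Add the node group below `base`, then repeat for each next level."""
    pending = []
    if "key" in taxon:
        # One node per row in the key; rows are (type, name, child taxon).
        for row_type, name, child in taxon["key"]:
            cli = add_node(nodes, row_type, name, base)
            if child is not None:
                pending.append((cli, child))
    elif "singleton" in taxon:
        child = taxon["singleton"]
        pending.append((add_node(nodes, "singleton", child["name"], base), child))
    for cli, child in pending:
        add_level(nodes, cli, child)


# Hypothetical family with a four-row key; Genus A has a single species.
family = {
    "name": "Family X",
    "key": [
        ("target", "Genus A", {"singleton": {"name": "A. sola"}}),
        ("couplet", "(2)", None),
        ("target", "Genus B", None),
        ("target", "Genus C", None),
    ],
}

nodes: List[Node] = []
root = add_node(nodes, "root", family["name"])
add_level(nodes, root, family)
```

All rows of a key are added before moving to the next level, so the nodes for one key stay contiguous in the node list.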

Additional Nodes for Alternate Keys - Six alternate keys were enumerated in Key Types and Subkeys. There must be a separate key base node for each alternative since each has separate child nodes.  Also there must be separate target, attribute set and connector nodes for each alternative key since the parent(s) of each of these nodes is different. Finally, each target or connector node needs an alternate-key connector node, in order to distinguish the different attribute sets from the paths through each alternate key.

The node list and node numbering must take these additional nodes into account. After the user chooses which alternate key they want, this determines which of the alternate nodes are to be used.

Node Structure and Types

Thursday, September 19, 2013
Nodes and Tags - The data in the FNA keys can be modeled using nodes, each of which can have
  • Parents and children
  • Tags that store values associated with the nodes
Node Structure - The following shows the structure of the data within each node:
The last three items in a node (the NodeEntry in the nodeList) are pointers to lists that contain a subset of other nodes. The parentList allows tracing each path back to the root node for a family. The childList allows tracing each path through each couplet in the key to all targets in the key; see A Key as a Hierarchy.  The targetList also allows tracing each path to all nodes, but not through couplets; it also allows tracing to nodes for the rare cases where the nodes are only used for naming (see Naming Hierarchy) or nodes are not in any key, but are on the taxa list for a parent taxon.
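
As a rough illustration of how the three lists differ, here is a Python sketch of a node entry with only a name and the three lists (the real NodeEntry has more fields). The field names are snake_case versions of parentList, childList and targetList, the indexes and subspecies labels are hypothetical, and the example is loosely modeled on the Piperia elegans case, where the subspecies are keyed from the genus key.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class NodeEntry:
    name: str
    parent_list: List[int] = field(default_factory=list)  # toward the family root
    child_list: List[int] = field(default_factory=list)   # keying hierarchy
    target_list: List[int] = field(default_factory=list)  # naming hierarchy


# Hypothetical node list; couplet nodes are omitted to keep the sketch short.
node_list = [
    NodeEntry("Piperia", child_list=[1, 2], target_list=[3]),
    NodeEntry("P. elegans subsp. A", parent_list=[0]),
    NodeEntry("P. elegans subsp. B", parent_list=[0]),
    NodeEntry("P. elegans", parent_list=[0], target_list=[1, 2]),
]


def descend(nodes: List[NodeEntry], start: int, attr: str) -> List[str]:
    """Depth-first list of names reached from `start` through one of the lists."""
    names: List[str] = []
    for index in getattr(nodes[start], attr):
        names.append(nodes[index].name)
        names.extend(descend(nodes, index, attr))
    return names


keying = descend(node_list, 0, "child_list")    # skips the species rank
naming = descend(node_list, 0, "target_list")   # genus, species, subspecies
```

Following child_list from the genus reaches the subspecies directly, while following target_list passes through P. elegans, which is the difference between the keying hierarchy and the naming hierarchy.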

[Modify to add Target Subset Index (pointer to a class subset for all keys that has the key's target list; these are only used in the key's base node or, if there are alternate keys, in the alternate keys' base node, which are shown in the Node Groups & Relationships figures), and a tag for taxa that are base classes for keys showing special types like targetsInKey (see 5/27/2015).] [Target Subset Index (tsi) creates a "naming hierarchy", which is usually the same as the "keying hierarchy".]

Node Types - The node type is specified in the tagListType field of each ClassEntry.  The node type descriptions below need to be read in conjunction with understanding how nodes are related, which is described in Node Groups & Relationships and in the Dual Role of Nodes section below.
  • Root - This is the base for each family, which contains the family name.  Most are base nodes for each family key, but a few families have only one species, so no key is involved and the root is a singleton parent.  Besides the family name, the node contains the unique taxon id assigned by FNA; FNA also numbers each family (currently from 1 to 128), and this number is contained in the root node as well (as the Target # in the table below).  Each family root node is level 1 in the taxon hierarchy. If a root node is the base node for a family key, the children are row 1 and 2 in that key, so the node has the dual role of being the first couplet in the key.  This is not the case if there are alternate keys for the family; instead the root node has the base node for each of these alternate keys as its children, and it has the dual role of offering a choice of the names of the alternate keys to the user.
  • Couplet - Couplet nodes document the decision points in the keys (see A Key as a Hierarchy).  The node has pointers to each of the two choices of attributes, but those attributes themselves are not contained in this node.  However, in this node is the attribute(s) for the choice that led to this couplet. For each couplet choice, an attribute shows the locations (see Taxon Locations) where taxa with paths through this choice are found; this also applies to terminal taxa. The level of this node is the same as that of the base node for this key.  As mentioned in Key Types and Subkeys, some keys have subkeys, each of which is given a name with a number or letter or with an intermediate taxon rank; for subkey couplet nodes, this name is stored with the couplet node (other types of couplet nodes do not have a name). An intermediate rank can also be associated with a choice that points to another couplet rather than to a subkey. Besides pointing to another couplet node, a couplet node can point to two types of target nodes, which are described next.
  • Target - This is the destination taxon, which was arrived at by a unique sequence of choices in the key; that is, there is a single attribute set in the key that has this taxon as its target.  This taxon will be at the next level compared to the level of the base node for this key.  In this node is the final attribute choice that led to this target.  Also in this node is the taxon name and number relative to the taxon in the key's base node; and there is a unique taxon id assigned by FNA. If a target node is the base node for a key, the node's children are row 1 and 2 in that key, so the node has the dual role of being the first couplet in the key. This is not the case if there are alternate keys at this level; instead the target node has the base node for each of these alternate keys as its children, and it has the dual role of offering a choice of the names of the alternate keys to the user. There is a rather special case where the target is an intermediate taxon that only has one child, so the intermediate name has to be stored with this type of target node as well as the taxon name of the child (this should not be called a subkey singleton because it is not a key and is not a singleton node, as used below). [To Do: Either (1) target nodes are one level above base & intermediate nodes only associated with couplets or (2) add additional nodes that show all immediate children of intermediate nodes, so can show hierarchy.]
  • Attribute Set - This node corresponds to one attribute set for a destination taxon that has multiple attribute sets (an attribute set is a sequence through the key leading to a target taxon; multiple-attribute-set targets were described in A Key as a Hierarchy). This node is similar to a target node, but since there are multiple paths to this destination taxon, there is also a special connector node that each of these attribute set nodes point to. Appended to the taxon name is an attribute set number in order to make the node name unique. There is no relative target number and taxon id since these are in the connector node.  Note that there can also be multiple attribute sets leading to subkeys, so in this case instead of a taxon name, the name is an intermediate taxon name.
  • Multiple-Attribute-Set Connector - This node is the destination taxon node pointed to by each of the multiple-attribute-set nodes; this connector node points back to each of those nodes (so to follow a path to the base node requires knowing which of the attribute sets was selected). For an example of a connector node see Effect of Single and Multiple Parents on Next-Level Keys in A Key as a Hierarchy. Like a target node, this node has the taxon name, relative target number and taxon id.  In addition, this node has the merge point for all of the attribute set paths; that is, all attribute sets have common attributes between the merge point node and the base node for the key; in some cases, the merge point and the base node are the same (there are no common attributes).  If the attribute set nodes are for an intermediate taxon, this connector node will be for that intermediate taxon also.  [Probably want to split this into Multiple-Attribute-Set Target Connector and Multiple-Attribute-Set Group Connector (or External and Internal Multiple-Attribute-Set Connectors) because in the table below Multiple-Attribute-Set Group Connector does not need Taxon Id and Target #.]
  • Alternate-Key Connector -  This node is needed whenever there are alternate keys (in addition to the alternate key node, which is discussed next).  In this case, a connector node is needed for each target in the alternate keys, so that as couplet choices are made that traverse down through any target, which of the alternate keys was used can be saved and when traversing back up through that target, the same alternate key is used in creating the attribute list for the terminal target. If there are multiple-attribute-set nodes in an alternate key, a multiple-attribute-set connector node is needed for each set, and each of those connector nodes is connected by an alternate-key connector node.
  • Alternate Key - An alternate key node connects a root or target node to all available alternate keys for the family or target. Currently only eleven alternate key nodes are needed, which are listed in Key Types and Subkeys. Each alternate key node is the base node for the corresponding key. Each is named so that the parent can offer the user a choice of which alternate key they want to use.
  • Singleton - This node represents a singleton node, which results when a parent node only has one child that is at the next level, so no key is involved.  A root node, a target node or a connector node can be a singleton parent node.  A singleton node contains the taxon name, relative target number and taxon id.
  • Missing-Key Target - If a taxon has children, but the key to distinguish the children from each other is missing, then they can still be listed under that taxon.  For those that are not terminal, then lower keys may exist, so that a hierarchy can be shown with a gap for the missing key. A missing-key target is also used when a taxon is keyed at a level that is not normal for that taxon; e.g., there is no need for a subsp. key for Piperia elegans because all these subspecies are in the key for species at the Piperia level, but a missing-key target node is needed for Piperia elegans to show the hierarchy of names from Piperia to elegans and then to its two subspecies.
  • Segregate - A target in a key that is where a taxon used to exist before FNA moved it to a new location in the hierarchy. A link gives that new location, so that keying of a specimen can be resumed from there.
The following summarizes the data contained in each type of node:

[Table: data contained in each node type. Columns: Node Type, Name, Rank, Locations, Row Label, Characters, Taxon Id, Taxon Id Type, Target #, Merge Point, Intermediate Name. Rows: Root, Couplet, Target, Multiple-Path Target, Multiple-Path-Target Connector, Multiple-Path-Group Connector, Alternate-Key Connector, Alternate Key, Singleton, Missing-Key Target, Segregate.]

See Node Groups & Relationships for a description of how nodes are combined to model the FNA keys.

Dual Role of Nodes - The node type reflects the role of a node relative to its parent or to other nodes in a key.  Root, target, connector and singleton nodes also have another role relative to nodes at the next level; these roles are either
  • Couplet
  • Singleton Parent
  • Key Base
  • Offer alternate key choices
  • Terminal
These are not separate nodes, but nodes at the previous level acting in their other role.  In Node Groups & Relationships the more general term Base Node is used for Singleton Parent or Key Base.

Tag List Creation - The following are general steps in creation of a tag list and tagListIndex that is used in a class entry:

  1. Create a new Tag object.
  2. Add that object to the tags array list, keeping track of the tag index.
  3. Create a new TagList object, using the tag index, keeping track of the tag list index.
  4. Get a reference to that TagList object.
  5. For each additional tag needed, create a new Tag object.
  6. Add that object to the tags array list, keeping track of the tag index.
  7. Add that tag index to the TagList object.
  8. Insert the tag list index obtained in step 3 in the class entry.
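
The steps above can be sketched as follows. This is a Python illustration only: the fields of Tag, TagList and ClassEntry, the function names, and the sample tag values are hypothetical stand-ins, since the actual class definitions are not shown here.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Tag:
    name: str       # hypothetical fields: a tag name and its value
    value: str


@dataclass
class TagList:
    tag_indexes: List[int]


@dataclass
class ClassEntry:
    name: str
    tag_list_index: Optional[int] = None


tags: List[Tag] = []            # the tags array list
tag_lists: List[TagList] = []   # the list of TagList objects


def add_tag(name: str, value: str) -> int:
    """Create a Tag, add it to the tags list and return its tag index."""
    tags.append(Tag(name, value))
    return len(tags) - 1


def create_tag_list(entry: ClassEntry, pairs: List[Tuple[str, str]]) -> int:
    """Build a tag list from (name, value) pairs and store its index in entry."""
    first, *rest = pairs
    tag_index = add_tag(*first)                  # steps 1-2
    tag_lists.append(TagList([tag_index]))       # step 3
    tag_list_index = len(tag_lists) - 1
    tag_list = tag_lists[tag_list_index]         # step 4
    for name, value in rest:                     # steps 5-7
        tag_list.tag_indexes.append(add_tag(name, value))
    entry.tag_list_index = tag_list_index        # step 8
    return tag_list_index


entry = ClassEntry("Silene")
create_tag_list(entry, [("rank", "genus"), ("taxonId", "hypothetical-id")])
```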


A Key as a Hierarchy

Tuesday, September 17, 2013

This section shows that the structure of the FNA keys is a hierarchy, at least conceptually; the structure of the database used to store those keys is explained later in Node Types & Relationships.

Targets with a Single Parent - A couplet is a parent-child relationship with two children:
As mentioned in Couplets, Targets and Rows, each couplet choice leads to either another couplet or to a target.

If all targets in a key have a single parent, then there is a unique path between the root of the key (a family, genus, species or subspecies) to each target, usually one rank down from the root.  That is, from each target one can unambiguously traverse back up to the root of the hierarchy.

Targets with Multiple Parents - However, as seen in the Characters Sets and Paths section, some targets in keys can have more than one characters set, which results in multiple parents for those targets. For example, the paths to the target Boechera repanda shown in that section can be redrawn to show B. repanda with two parent rows:

That is, depending on which path is chosen, a different characters set results.  When the target is terminal, as is the case for B. repanda, visualizing the multiple paths to the root is relatively easy.
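
A short Python sketch shows how the characters sets of a target with multiple parents could be enumerated by walking back to the root. The row labels and the parent table are hypothetical and do not reproduce the actual B. repanda rows.

```python
from typing import Dict, List

# Hypothetical rows: each row maps to its parent rows.
# "root" stands for the root of the key; "target" has two parent rows.
PARENTS: Dict[str, List[str]] = {
    "root": [],
    "row 1": ["root"],
    "row 2": ["root"],
    "row 3": ["row 1"],
    "row 6": ["row 2"],
    "target": ["row 3", "row 6"],
}


def paths_to_root(node: str) -> List[List[str]]:
    """Return every path from `node` back up to the root of the key."""
    parents = PARENTS[node]
    if not parents:
        return [[node]]
    return [[node] + path
            for parent in parents
            for path in paths_to_root(parent)]


paths = paths_to_root("target")
```

Each path corresponds to one characters set; a target with a single parent at every step yields exactly one path.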

Effect of Single and Multiple Parents on Next-Rank Keys - How keys are connected to the key at the next rank is different for targets with single and multiple parents; that is, with targets with single and multiple characters sets. An example of each case is shown below.

Silene is a target in the Caryophyllaceae key with a single characters set:
The Silene key has the following connections to the Caryophyllaceae key:
The row numbers in this diagram are relative to the key for the rank; that is, in the Key Types and Subkeys section, renumbering of rows in concatenated keys is discussed, but row numbers in this section are before that renumbering.

What is important is that the parent for the initial couplet in the Silene key is in the Caryophyllaceae key, so the logic to set up this parent-child relationship is different than if they were in the same key. Row 37 is the Silene target, but it also has a next couplet that is the first couplet in the Silene subkey. For the single parent case there is no zero row representing the root for the Silene key; that is, the row zero shown in the key diagram in Couplets, Targets and Rows is not needed in this case.

S. scouleri is a target in the Silene key with two characters sets:
By creating a special node that acts as a target for rows 118 and 143 in the Silene key and as a couplet for rows 1 and 2 in the S. scouleri key, all other elements in the Silene and S. scouleri keys can be treated as normal parent-child relationships:
 
The Node Structure and Types section discusses how key hierarchies, including these special multiple-characters-set targets, are represented as data structures. In particular, a childList is used for the choices of a couplet and a parentList is used for the parent, or, if there are multiple characters sets, for the parents.

Subkeys and Concatenated Keys

Thursday, September 5, 2013
The number of keys in FNA is greater than the 2145 keyed taxa from the table in Taxon List & Counts by Type.  One reason is that five taxa have alternate keys:
  • Asteraceae:   Synoptic and Artificial Keys
  • Brassicaceae:  Flower-Based and Fruit-Based Keys
  • Lauraceae:  Flower-Based and Fruit-Based Keys
  • Portulacaceae Portulaca:  Flower-Based and Fruit-Based Keys
  • Salicaceae Populus:  Flower-Based, Fruit-Based and Leaf-Based Keys
Because the alternate keys add new attributes and new paths through the keys, each key must be handled separately (see Node Structure and Types for the mechanism to choose which alternate key to use and see Node Groups and Relationships for the effect on node numbering).

Adding the six alternate keys, there are 2151 master keys.  Of these, 51 have subkeys, so 2100 don't. Those keyed taxa with subkeys tend to have a lot of subkeys: 427. Summarizing FNA keys by type:

Keys with no subkeys 2100
Keys with subkeys 51
Subkeys 427
Total keys 2578

My list of taxa with multiple keys, with their alternate keys, master key and subkeys, is here. You can see from this list that the names of the subkeys are either an intermediate rank that has its own subkey or a number or letter.

Some taxon pages have a List of Keys, which contains the master key and subkeys. More commonly, keys for intermediate ranks are included in the list of lower taxa. Some of the intermediate ranks listed by this method are singleton parents (they have only one child), so these are not subkeys.

Concatenated Keys - Numbering of couplets and rows in keys and subkeys starts with 1; numbering of couplets and rows in keys was discussed in Couplets, Targets and Rows.  But to create a single key for a key that has subkeys, the master key and all subkeys must be concatenated into one key.  The couplets and rows in the first subkey must be renumbered to continue where numbering left off in the master key, and likewise as each subkey is added.

As an example, look at the couplet and row numbers shown in the partial paths diagram for Brassicaceae Boechera shown in Couplets, Targets and Rows. This was created from the master key and from the Group 1 subkey; the couplets and rows for the subkey were renumbered to continue on from the numbering in the master key.
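
The renumbering can be sketched in Python as follows. The sketch renumbers couplets only (rows would be handled the same way with their own offset) and assumes that the couplets in each key are numbered 1, 2, 3, ... without gaps. It also assumes that a lead naming a subkey should point at that subkey's renumbered first couplet. The key contents and names are hypothetical.

```python
from typing import Dict, Tuple, Union

Lead = Union[int, str]                # couplet number, target or subkey name
Key = Dict[int, Tuple[Lead, ...]]


def concatenate(master: Key, subkeys: Dict[str, Key]) -> Key:
    """Append each subkey to the master key, continuing the couplet numbering."""
    combined: Key = dict(master)
    first_couplet: Dict[str, int] = {}
    offset = len(master)
    for name, subkey in subkeys.items():
        first_couplet[name] = offset + 1
        for number, leads in subkey.items():
            combined[number + offset] = tuple(
                lead + offset if isinstance(lead, int) else lead
                for lead in leads
            )
        offset += len(subkey)
    # Point leads that name a subkey at its renumbered first couplet.
    return {
        number: tuple(
            first_couplet.get(lead, lead) if isinstance(lead, str) else lead
            for lead in leads
        )
        for number, leads in combined.items()
    }


# Hypothetical master key with two subkeys.
master = {1: ("Group 1", 2), 2: ("Group 2", "taxon_x")}
subkeys = {
    "Group 1": {1: ("taxon_a", 2), 2: ("taxon_b", "taxon_c")},
    "Group 2": {1: ("taxon_d", "taxon_e")},
}
single_key = concatenate(master, subkeys)
```

Here the two couplets of Group 1 become couplets 3 and 4, and the single couplet of Group 2 becomes couplet 5.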

Taxon Locations

Friday, August 30, 2013
I've posted two spreadsheets that show state or province postal codes for all FNA terminal taxa.  Due to a Google Drive spreadsheet size limitation, these data were split:  Alberta through North Dakota and Nebraska through Yukon Territory.  Taxon counts by location are in the last row.

Terminal taxa  in my taxon list (see Taxon List & Counts by Type) have locations shown on their FNA taxon page.  The 11,200 terminal taxa that have their own taxon page have the following location information:

Location directly shown on taxon page 11,148
Location indirectly available 50
Location information not available 2

Indirect location information is most commonly found from a distribution map.  One of the taxa (Iris germanica) with no location information says it "may persist after cultivation"; the other (Corispermum nitidum) says it was "supposedly introduced but doubtful".  Location information for the few taxa without their own page has not been added yet.

The 65 North American locations used in the FNA are the 49 continental states and the District of Columbia in the USA, the 13 provinces and territories of Canada, Greenland (an autonomous country within the Kingdom of Denmark), and the islands of St. Pierre and Miquelon (a self-governing territory of France).

In the future, if a user of the FNA keys selects the location(s) that they are interested in, then the keys can be customized.  Then the keys would only show those couplets and terminal taxa relevant to their location(s), which greatly simplifies keying a specimen. Also the keys can be simplified because there are location-specific couplets in the keys; these couplets may eliminate all except one attribute set leading to the terminal taxa.
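One way to picture this customization is as pruning. The sketch below is only an illustration under simplifying assumptions: a key is modeled as nested tuples, the function name prune and the taxon names A through E are placeholders rather than FNA data, and the characters of a dropped lead are simply discarded.

```python
def prune(node, present):
    """Prune a key down to the taxa found in `present`.

    A node is either a taxon name (str) or a couplet written as a
    (first_choice, second_choice) tuple of nodes.
    Returns None when nothing under the node occurs at the location.
    """
    if isinstance(node, str):
        return node if node in present else None
    first, second = (prune(child, present) for child in node)
    if first is None:
        return second   # the couplet drops out; only the second lead survives
    if second is None:
        return first    # the couplet drops out; only the first lead survives
    return (first, second)


# Placeholder key with four couplets and five terminal taxa.
key = (("A", "B"), ("C", ("D", "E")))
localized = prune(key, {"A", "D", "E"})
```

Here localized is ("A", ("D", "E")): two of the four couplets disappear because only one of their leads can be reached at the chosen location.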

[add diagram "Location-Dependent Dichotomous Keys" and reference location attribute in table on Node Structure and Types post]

Couplets, Targets and Rows

Friday, August 23, 2013
Before using FNA keys to identify a specimen, the family must be known. If the family is not known, use, for example, the angiosperm and gymnosperm family keys, which are multi-access keys, created as part of the Pathkey project.

Then the genus is found from the FNA key for the identified family, the species is found from the key for the identified genus, and, if appropriate, the subspecies is found from the key for the identified species. That is, identification proceeds from the more general taxonomic rank to the specific.

In each FNA key, couplets are numbered starting with 1.  A couplet gives a choice of two alternatives, each of which describes one or more characteristics (called "characters") that are to be matched with the specimen that is being identified. The first choice in the key is labeled with the couplet number and the second is labeled "+". Following each choice, where to go next in the key is indicated by a couplet number.  However, instead of a next couplet, one or both choices may lead to a destination target, which is a taxon with rank one lower than the rank of the key.

A diagram showing the hierarchy of couplet numbers is a simple way to show the overall structure of a key.  For example, the following diagram shows the structure of the Caryophyllaceae Drymaria key using couplet numbers and destination targets:
As a choice is made at each couplet, the characters set (see Characters Sets and Paths) for the target grows.

When creating a target's characters set, it is easier to use the row numbers as an index into the key, instead of using couplet numbers. The keys are designed such that the row number can be obtained from the couplet number:
Couplet number    Row number for current couplet
                  Using first choice    Using second choice
n                 2n - 1                2n
In other words, the row number for the second choice is double the couplet number, and the row number for the first choice is one less than that. This assumes that each couplet lead occupies only one row; that is, all lines of characters for a lead are put into one row. The algorithms for traversing the keys depend on this couplet-to-row relationship; see Key Types and Subkeys and Naming Hierarchy for how special cases are handled to maintain this relationship.
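The relationship and its inverse are easy to state in code. The following Python is a minimal sketch; the function names are mine and are not taken from the software behind this blog.

```python
def rows_for_couplet(n):
    """Return (row of first choice, row of second choice) for couplet n."""
    if n < 1:
        raise ValueError("couplet numbers start with 1")
    return 2 * n - 1, 2 * n


def couplet_for_row(row):
    """Return (couplet number, choice) for a row, where choice is 1 or 2."""
    if row < 1:
        raise ValueError("row 0 is the root and belongs to no couplet")
    couplet = (row + 1) // 2
    choice = 1 if row % 2 == 1 else 2
    return couplet, choice
```

For example, couplet 17 occupies rows 33 and 34, and row 73 is the first choice of couplet 37.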

This is another version of the diagram for the Caryophyllaceae Drymaria key using row numbers instead of couplet numbers:
A row number 0 (representing the root of the Drymaria key) has been added as a starting point.

When row numbers are needed, a third version that has both couplet and row numbers is better. For an example of such a diagram, see the one in Characters Sets and Paths for Brassicaceae Boechera. Notice where row numbers are twice couplet numbers. Also note that the row number of the first choice in a couplet is put to the left of the couplet, and the second choice is put to the right.

Couplet diagrams are easier to create and understand, but rows are used to help create the characters-set list for a target, as described in Characters Sets and Paths and are needed to understand how keys at different ranks should be connected, as described in A Key as a Hierarchy.

As discussed in Key Types and Subkeys, for a given key rows are numbered, but as keys are concatenated to form the key for a family, rows are renumbered, so that the row number applies to the key for the family. That is, for the family key, couplet numbers are less useful because they only apply to the key for a given rank, but row numbers can be used to uniquely identify any couplet or target in the key at any rank in the family; also row numbers are used in constructing nodes, as discussed in Node Structure and Types.

Characters Sets and Paths

Tuesday, August 20, 2013
As one makes binary choices of characters in a key, one creates a set of characters for the destination target and a path through the key. In listing the Characters Set (CS), it is more useful to reverse the order, so that the characters chosen last are listed first, because those are most specific to the target.

For example, the CS for Boechera burkii is:
  • Racemes usually unbranched; cauline leaves 18-28; ovules 64-80 per ovary; seeds 1.2-1.4 mm wide
  • Biennials, without caudices; stems (2-)3-10 dm; cauline leaves 18-80
  • Basal leaf blade surfaces glabrous or with simple trichomes only
  • Cauline leaf blades not auriculate
However, the CS can be more complex. One reason is that there can be more than one CS for a target when there are diverse characters leading to it. In the FNA keys, targets whose name is appended with "(in part)" have more than one CS. Keys in this paper instead append CSn to the target, where n=1,2,…; that is, the multiple characters sets are numbered.

Using Boechera repanda as an example, CS1 is:
  • Basal leaf blades 10-25(-50) mm wide; petals 3.5-6 mm.
  • Fruit valves pubescent
And CS2 is:
  • Basal leaf blades 7-25(-50) mm wide, margins usually repand to dentate, rarely entire
  • Fruits 2-5.5 mm wide, divaricate-ascending to erect, ± appressed to rachises
  • Stems proximally with simple and/or branched trichomes usually less than 0.5 mm; ovules 8-52 per ovary
  • Fruits not secund
  • Fruits erect, ascending, or horizontal
  • Fruit valves glabrous
If a target has multiple characters sets, then the sets may share the same general characters, with only the more specific characters differing. For Boechera repanda the characters common to CS1 and CS2 are:
  • Plants usually sparsely to densely pubescent proximally (sometimes throughout)
  • Styles 0.05-2 mm
  • Basal leaf blade surfaces with at least some branched trichomes
  • Cauline leaf blades not auriculate
The following diagram for the Brassicaceae Boechera key shows how these characters-set lists relate to the type of diagram discussed in Couplets, Targets and Rows.
This diagram is easier to understand because it shows both couplet and row numbers and because it shows only the paths of interest through the key. From this, the characters for B. repanda CS1 can be found in rows 35 and 33, the characters for CS2 are in rows 73, 72, 70, 66, 48 and 34, and the characters in common for both CSs are in rows 26, 22, 10 and 1.

The paths for CS1 and CS2 merge at couplet 17; the characters list of B. repanda includes characters for both choices for couplet 17. If there are multiple characters sets for a taxon, there will be one or more couplets that will be merge points where the taxon has the characters of both choices of the merge point couplets.
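The split into common and set-specific rows, and the merge point itself, can be recovered from the row numbers alone. The Python below is a sketch using the rows listed above for B. repanda; the function name split_paths is hypothetical, and the paths are written root-first, the reverse of the listing order used for a CS.

```python
def split_paths(path_a, path_b):
    """Split two root-first row paths into (common, rest_a, rest_b)."""
    common = []
    for row_a, row_b in zip(path_a, path_b):
        if row_a != row_b:
            break
        common.append(row_a)
    k = len(common)
    return common, path_a[k:], path_b[k:]


# Rows for B. repanda as given in the text, reordered root-first.
cs1_path = [1, 10, 22, 26, 33, 35]
cs2_path = [1, 10, 22, 26, 34, 48, 66, 70, 72, 73]

common, cs1_only, cs2_only = split_paths(cs1_path, cs2_path)

# The first rows after the common part belong to the merge point couplet.
merge_couplet = (cs1_only[0] + 1) // 2
same_couplet = merge_couplet == (cs2_only[0] + 1) // 2

# Listing order for a CS puts the most specific (last chosen) rows first.
cs1_listing = list(reversed(cs1_only))
```

The common rows come out as 1, 10, 22 and 26, rows 33 and 34 both map to couplet 17, and cs1_listing is [35, 33], which matches the order in which the CS1 characters are listed.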

There are two other reasons that the characters set for the path through a key can be more complex. As discussed in Key Types and Subkeys, some taxa have keys with subkeys (as does Boechera); each of these subkeys, which may correspond to intermediate-rank targets, adds a characters set or multiple characters sets.  Also, as shown in Multiple-Characters-Set Groups, a couplet can have multiple characters sets even if there is no subkey, although this is rare.

Following every possible path through the key for a family results in listing every characters set, which  can be useful.  For large families, with keys at each rank and possibly subkeys and alternate keys, that list can be long.
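Listing every path is a simple depth-first traversal. The Python below is a sketch under stated assumptions: a key is modeled as a dictionary from couplet number to a pair of destinations, where an integer is the next couplet and a string is a destination target. The three-couplet key and the target names T1 to T3 are invented for illustration, and all_paths is a hypothetical name.

```python
def all_paths(key, couplet=1, rows=()):
    """Yield (target, rows) for every path through the key.

    rows are listed last-chosen first, the same order used for a
    Characters Set.
    """
    for choice, dest in enumerate(key[couplet], start=1):
        row = 2 * couplet - 2 + choice   # 2n - 1 for first, 2n for second
        path = (row,) + rows
        if isinstance(dest, int):
            yield from all_paths(key, dest, path)
        else:
            yield dest, path


# Invented key: T1 is reached by two paths, like an "(in part)" target.
toy_key = {
    1: (2, "T1"),
    2: ("T2", 3),
    3: ("T1", "T3"),
}

paths = list(all_paths(toy_key))
```

Grouping the results by target gives the numbered characters sets; in this toy key T1 has two, reached through rows (5, 4, 1) and (2,).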

Overview

Monday, August 12, 2013
The Flora of North America North of Mexico is being created by over 800 authors in an online version and in 30 printed volumes; this is usually called just Flora of North America (FNA). The FNA contains dichotomous identification keys, which allow biologists and serious amateurs to identify nearly all known plants in North America.

Using the online keys, my objective is to analyze the structure of those keys, looking for commonality. The details of this analysis are documented herein.

Given that structure, consistency checks were created to assure that there are no errors in the FNA keys. The types of structural errors found were generally small, but the number of keys is large, so elimination of these errors is essential for algorithmic processing of the keys. The result was a database containing the FNA keys with the structural errors corrected. This database does not change the intended content of the keys; that is, the wording of couplets and resulting taxon names are changed very little, but there were many changes because of missing or incorrect connections between the couplets and the taxa.  Note that the online FNA is created before the printed version, and, as far as I can tell, most, or nearly all, of the errors in the online version were corrected in the printed version.

Taxa in the FNA keys don't occur uniformly across the 65 locations (see Taxon Locations), so keying could be simplified if there were a key for each location (or set of locations). An advantage of using the FNA key database is that the FNA keys can be enhanced with localization information. How this is done and how these keys can be used is described in the section on Localized Keys. This result is demonstrated with an app for smart phones and tablets.

Data created by my analysis is too large to be contained in this blog, so this blog has links to Google Docs spreadsheets. A Google account or a gmail account is required to access these spreadsheets; if you don't have an account, you can go here to create one.


Contents

      Overview
      Couplets, Targets and Rows
      Characters Sets and Paths
      Multiple-Characters-Set Groups
      A Key as a Hierarchy
      Naming Hierarchy
      Taxon List and Counts by Type
      Subkeys and Concatenated Keys
      Node Structure and Types
      Node Groups and Relationships
      Taxon Locations
      Localized Keys and Predetermined Characteristics

Taxon List & Counts by Type

Sunday, August 11, 2013
The current list of taxa in the online version of the FNA keys is here.  As new volumes of the FNA are published and put online, this list will be updated. For most taxa, a unique taxon id is shown in the first column (a few exceptions are enumerated below under Terminal Taxa). The types shown in the third column of the list are explained below.

Given that the plant family has already been identified, the FNA keys allow more and more specific identification at each taxon rank, ultimately leading to a terminal taxon.  The numbers of taxa at the different ranks are as follows:

Rank                  Type within Rank      Number of Taxa
family                family                234
genus                 genus                 1785
genus subclasses      species               10,939
                      hybrid                82
species subclasses    subsp.                903
                      var.                  2105
                      var.× or nothovar     3*
subsp. subclasses     var.                  15
Total in FNA keys                           16,066
* See:
Asteraceae Petasites frigidus var.×vitifolius
Rosaceae Crataegus ×sicca nothovar. sicca
Rosaceae Crataegus ×sicca nothovar. glabrifolia 

Non-Terminal Taxa - Taxa that are non-terminal (families, genera and some species) are of three types (see the third column of my taxon list):

keyed                                   A key is used to differentiate subclasses   2145
not keyed: singleton parent             There is only one subclass                  1080
not keyed: species for naming* only     Species not in a key, but subspecies is     4
Total Non-Terminal Taxa                                                             3229
* See section on Naming Hierarchy

Keyed taxa will be discussed further in Key Types and Subkeys.

Terminal Taxa - Terminal taxa are also indicated in the third column of my taxon list.  In a few cases, destination targets in the FNA keys are not listed in the site's taxa list; these targets are distinguished from the other terminal taxa.

terminating A target that has a unique taxon id 12,791
terminatingOnAssocTaxon Rather than have its own taxon id, the target is described on the page of an associated taxon 41
terminatingOnBadTaxon A target with no taxon page or with a bad taxon id 5
Total Terminal Taxa 12,837

The Node Structure and Types section shows how the taxon id type (one of the three types shown in this table) is stored in the node for each taxon.

The FNA keys can be used to differentiate the following types of destination targets:

hybrid 81
species 9737
subsp. 896
var. 2123
Total Terminal Taxa 12,837
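
The totals in the tables above can be cross-checked against each other. The Python below is a small sketch of such a check; the counts are copied from the tables in this post, while the variable and function names are mine.

```python
# Counts copied from the tables in this post.
by_rank = [234, 1785, 10939, 82, 903, 2105, 3, 15]
non_terminal = [2145, 1080, 4]
terminal_by_id_type = [12791, 41, 5]
terminal_by_rank = [81, 9737, 896, 2123]


def totals_agree():
    """Return True when every table sums to its stated total and the
    non-terminal and terminal counts add up to the total in the FNA keys."""
    total = sum(by_rank)
    return (
        total == 16066
        and sum(non_terminal) == 3229
        and sum(terminal_by_id_type) == 12837
        and sum(terminal_by_rank) == 12837
        and sum(non_terminal) + sum(terminal_by_id_type) == total
    )
```

Calling totals_agree() returns True for the counts shown here.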