Multiple-Attribute-Set Groups

Monday, February 10, 2014
Previously we showed some diverse taxa that have multiple attribute sets terminating on a taxon. That is, there is more than one path to the target taxon.  In some of the more complex keys, there may be more than one path to a group of taxa.  There are two cases where this occurs.

If a key has subkeys, there may be more than one path to a subkey.  For example, Cyperaceae Carex has subkeys Key A through Key F, and each of these has subkeys that are intermediate ranks.  In particular, one of the subkeys for Key C is section Ovales, and there are three attribute sets leading to the group of species in section Ovales:
This is similar to multiple attribute sets leading to the key for a target taxon (for example, Silene scouleri shown in A Key as a Hierarchy).

[Include a diagram like on 6/24/2015 showing the MASG base node becomes the merge point for any Multiple Attribute Set targets in the MASG, and the common attributes for the MASG are for both the MASG and these MAS targets.]

However, the taxon group does not have to be named.  Here is part of the Carex section Ovales key for species east of the Rocky Mountains:
Both couplet 56 and couplet 57 have two attribute sets leading to them. That is, there are two paths leading to the taxon group of Carex opaca and C. shinnersii, and there are two paths leading to the taxon group of C. opaca, C. shinnersii and C. missouriensis. In the FNA online key, special notation indicates couplets 56 and 57 are exceptional; in particular, instead of the standard (57) to indicate the next couplet is 57, the phrase "Go to couplet 57" is used for one of the attribute sets.

How a connector node is used with a target taxon that has multiple attribute sets is described in Node Structure and Types. A connector node is also needed for a couplet with multiple attribute sets leading to a taxon group.

The fact that couplet 56 is associated with the C. opaca - C. shinnersii taxon group (and likewise, couplet 57 is associated with a taxon group) is stored as an intermediate name in the couplet node (see Node Structure and Types).  The intermediate name in a couplet node is also sometimes used to store an intermediate rank that is associated with the couplet.

Node Groups & Relationships

Saturday, December 21, 2013
Nodes (see Node Structure and Types) form node groups, which are then connected to create a database model for the FNA keys.  There are three group types

However, they share generalized group elements.  In particular, the End Nodes for both have the same dual role that is described in Node Structure and Types. To Do: Need a diagram like on 1/11/2015 in order to include the 2 types of connector nodes and to allow for 2 layers of connector nodes - use the real diagram for Salicaceae Populus on 12/30/2015.  Need a diagram like on 1/28/2015 to show the all-inclusive sequence of nodes.  Also need to show that multiple-attribute-set connector nodes can be both external and internal to the key.

Node Group Elements -
  1. Base Node
    • The Base Node provides the parent taxon name or intermediate rank name; all targets in the group are members of this taxon or intermediate rank.
    • If the Couplet Nodes immediately follow the Base Node, it provides the base node number for the group; this node number is used to convert the row numbers of the Couplet Nodes to the node numbers used in the database. This is the case if the Base Node is a root, connector or singleton node.
    • If the Couplet Nodes are separate from the Base Node, then the base node number is one less than the first couplet node number. This is the case if the Base Node is a target node (or a couplet node with an intermediate title) in a higher level node, so that node has dual roles: it is a target node in the higher level node group and Base Node for the current node group.
    • If the Base Node is a singleton parent, then there are no Couplet Nodes and the singleton node is the single End Node and immediately follows the Base Node. In this case the Base Node can be a root, connector, target with an intermediate title or a singleton node. Note that if the Base Node is a singleton node, it also acts as a singleton parent.
  2. Couplet Nodes
    • Singleton groups have no couplet nodes.
    • Key groups have zero or more couplet nodes. With no couplet nodes, there are two target nodes. With one or more couplet nodes, the end nodes can be target or attribute set nodes.  There are special cases where a target node can be replaced by a segregate node.
    • If the end nodes are attribute set nodes, then the associated connector node may specify a couplet node that acts as a merge point; that is, all attributes between the Base Node and the merge point are used in common with all attribute sets associated with that connector node.
    • If there are subkeys, the master key and the subkeys are separate node groups. They are joined together by couplet or connector nodes with intermediate titles.
  3. End Nodes
    • As discussed under Base Node, the End Node could be a singleton node; in all other cases, there is always more than one End Node.
    • For target End Nodes, the target node also acts as a Base Node for the node at the next level.
    • Attribute set nodes exist when a taxon has more than one attribute set describing it; they require an associated connector node for the taxon, which acts as the Base Node for the node at the next level.
    • In each of the three End-Node cases above, instead of the End Node being the Base Node for the next level, the End Node can be a terminal taxon.
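The base node number rules described under Base Node can be sketched in Python. This is only an illustration of the arithmetic as described above, not the actual implementation; the function and parameter names are assumptions, and the numbers in the usage note are invented.

```python
def row_to_node_number(row_number: int, base_node_number: int) -> int:
    """Convert a row number within one key to a node number.

    Assumes rows are numbered from 1 within a key and that the node
    number is the key's base node number plus the row number.
    """
    if row_number < 1:
        raise ValueError("row numbers start at 1")
    return base_node_number + row_number


def base_from_first_couplet(first_couplet_node_number: int) -> int:
    """Base node number when the Couplet Nodes are separate from the
    Base Node: one less than the first couplet node number."""
    return first_couplet_node_number - 1
```

With a hypothetical base node number of 100, row 1 of the key becomes node 101, and `base_from_first_couplet(101)` recovers 100.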
Node Creation Process - From a high level, our objective is to model the FNA using these objects:
  • Root nodes
  • Keys
  • Connector nodes
  • Alternate Key nodes
  • Singletons
In more detail, all nodes can be created sequentially using the following process:
  1. For each family:
    1. Create the family root node, which acts as a base node
    2. If level has key(s)
      1. For the key (or for each alternate key):
        1. Create a multiple-attribute-set connector node for each file in the multiSpcSubclasses directory
        2. If this level has alternate keys, create an alternate key node for each key, which serves as the base node and supplies the name for the alternate key
        3. For each row in the key, create one of these node types:
          • Couplet
          • Target
          • Attribute Set
          • Segregate
      2. If this level has alternate keys, create an alternate-key connector node for each target (both single and multiple-attribute-set targets)
    3. Else (level has a singleton)
      1. Create a singleton node, which acts as a base node for the next level
    4. Repeat step 2 or 3 for each level, but there are no alternate keys at the species and subsp. levels
Therefore, to create the Node List, one starts with the root for each family and works through each level, adding keys, connector nodes, alternate key nodes and singletons, where for each key a node is added for each row. The Node List is numbered using the class list index (cli) with values 0, 1, 2, ... (see diagram in Node Structure and Types).
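A simplified sketch of this sequential numbering follows. It handles only one level below each family and omits connector and alternate-key nodes; the `NodeEntry` class, the input format and all taxon names are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class NodeEntry:
    cli: int          # class list index: position in the node list
    node_type: str    # e.g. "root", "couplet", "target", "singleton"
    name: str = ""


def build_node_list(families: List[dict]) -> List[NodeEntry]:
    """Append nodes family by family, numbering them 0, 1, 2, ...

    Each family dict is a hypothetical input of the form
    {"name": str, "rows": [(node_type, name), ...]} for a keyed family, or
    {"name": str, "singleton": str} for a family with a single child.
    """
    nodes: List[NodeEntry] = []

    def add(node_type: str, name: str = "") -> None:
        nodes.append(NodeEntry(cli=len(nodes), node_type=node_type, name=name))

    for family in families:
        add("root", family["name"])             # family root acts as base node
        if "rows" in family:                    # level has a key
            for node_type, name in family["rows"]:
                add(node_type, name)            # one node per row in the key
        else:                                   # level has a singleton
            add("singleton", family["singleton"])
    return nodes


example = build_node_list([
    {"name": "Family A", "rows": [("target", "Genus A1"), ("target", "Genus A2")]},
    {"name": "Family B", "singleton": "Genus B1"},
])
```

Here `example` holds five nodes with cli values 0 through 4; the second family's root is at cli 3 and its singleton at cli 4.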

Additional Nodes for Alternate Keys - Six alternate keys were enumerated in Key Types and Subkeys. There must be a separate key base node for each alternative since each has separate child nodes.  Also there must be separate target, attribute set and connector nodes for each alternative key since the parent(s) of each of these nodes are different. Finally, each target or connector node needs an alternate-key connector node, in order to distinguish the different attribute sets from the paths through each alternate key.

The node list and node numbering must take these additional nodes into account. After the user chooses which alternate key they want, this determines which of the alternate nodes are to be used.

Node Structure and Types

Thursday, September 19, 2013
Nodes and Tags - The data in the FNA keys can be modeled using nodes, each of which can have
  • Parents and children
  • Tags (these can contain just values or can hold key-value pairs)
Previously instead of the term "node", I used the term "class" (from object oriented programming), which is still used in some places. [Globally change "class" to "node", so instead of "class number" use "node number" and instead of cli use ni.  Relative to a specific key, use "row number" (and "couplet number" as used in FNA), but after concatenating keys for a family, row number is converted to node number by adding the base for each key and node numbers for singletons and connector nodes are included (after concatenating keys, couplet numbering for a family is not meaningful).]
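As a rough illustration of this node model, a node with parents, children and tags might be sketched as below. The field names and the representation of a tag as a (key, value) pair with an optional key are assumptions, not the actual data layout.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# A tag is either a bare value (key is None) or a key-value pair.
TagPair = Tuple[Optional[str], str]


@dataclass
class Node:
    number: int                                         # node number
    parents: List[int] = field(default_factory=list)    # parent node numbers
    children: List[int] = field(default_factory=list)   # child node numbers
    tags: List[TagPair] = field(default_factory=list)


def link(parent: Node, child: Node) -> None:
    """Record a parent-child relationship on both nodes."""
    parent.children.append(child.number)
    child.parents.append(parent.number)
```

Keeping both directions of the relationship lets a path be followed down through the couplets and back up to the base node.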

Node Structure - The following shows the structure of the data within each node:

[Modify to add Target Subset Index (pointer to a class subset for all keys that has the key's target list; these are only used in the key's base node or, if there are alternate keys, in the alternate keys' base node, which are shown in the Node Groups & Relationships figures), and a tag for taxa that are base classes for keys showing special types like targetsInKey (see 5/27/2015).] [Target Subset Index (tsi) creates a "naming hierarchy", which is usually the same as the "keying hierarchy" (which uses sbli and spli), but show for Piperia (see 8/11/2015) it is different, so need both hierarchies.] [The diagrams on 1/11/2016 show 4 cases, from the simplest to most complex case, that can occur (the 4 diagrams show, for concreteness, genus level on top, species level in the middle and subsp. level on the bottom):

  1. The normal and most common case where the naming and keying hierarchies are the same (except the keying hierarchy is much longer because all the couplets are in the path).
  2. For example, look at the Piperia key, where all the subsp. for elegans appear in the Piperia key and elegans has no subsp. key. The naming hierarchy for the elegans subsp. exists in the tsi from Piperia to elegans and the tsi from elegans to each subsp. These are different than the keying-hierarchy paths through the Piperia key to the subsp. nodes.
  3. There is no example of this, but some subsp. for a given species could be in the species key, while the other subsp. for that species could be in their own key, which is its normal place in the naming and keying hierarchy.
  4. This case is similar to 3., but the subsp. that is in the species key is also in the subsp. key for the given species, which means that the subsp. has multiple attribute sets, one path from the genus to the subsp. and another path from the genus to the species, then to the subsp.  An example of this is Ranunculus hispidus var. nitidus, which has one attribute set in the Ranunculus key and another in the R. hispidus key.]

Node Types - The node type is specified in the tagListType field of each ClassEntry.  The node type descriptions below need to be read in conjunction with understanding how nodes are related, which is described in Node Groups & Relationships and in the Dual Role of Nodes section below.
  • Root - This is the base for each family, which contains the family name.  Most are base nodes for each family key, but a few families have only one species, so no key is involved and the root is a singleton parent.  Besides the family name, the node contains the unique taxon id assigned by FNA; FNA also numbers each family (currently from 1 to 128), so this number is also contained in the root node (as the Target # in the table below).  Each family root node is level 1 in the taxon hierarchy. If a root node is the base node for a family key, the children are rows 1 and 2 in that key, so the node has the dual role of being the first couplet in the key.  This is not the case if there are alternate keys for the family; instead the root node has the base node for each of these alternate keys as its children, and it has the dual role of offering a choice of the names of the alternate keys to the user.
  • Couplet - Couplet nodes document the decision points in the keys (see A Key as a Hierarchy).  The node has pointers to each of the two choices of attributes, but those attributes themselves are not contained in this node.  However, in this node is the attribute(s) for the choice that led to this couplet.  The level of this node is the same as that of the base node for this key.  As mentioned in Key Types and Subkeys, some keys have subkeys, each of which is given a name with a number or letter or with an intermediate taxon rank; for subkey couplet nodes, this name is stored with the couplet node (other types of couplet nodes do not have a name). An intermediate rank can also be associated with a choice that points to another couplet rather than to a subkey. Besides pointing to another couplet node, a couplet node can point to two types of target nodes, which are described next.
  • Target - This is the destination taxon, which was arrived at by a unique sequence of choices in the key; that is, there is a single attribute set in the key that has this taxon as its target.  This taxon will be at the next level compared to the level of the base node for this key.  In this node is the final attribute choice that led to this target.  Also in this node is the taxon name and number relative to the taxon in the key's base node; and there is a unique taxon id assigned by FNA. If a target node is the base node for a key, the node's children are row 1 and 2 in that key, so the node has the dual role of being the first couplet in the key. This is not the case if there are alternate keys at this level; instead the target node has the base node for each of these alternate keys as its children, and it has the dual role of offering a choice of the names of the alternate keys to the user. There is a rather special case where the target is an intermediate taxon that only has one child, so the intermediate name has to be stored with this type of target node as well as the taxon name of the child (this should not be called a subkey singleton because it is not a key and is not a singleton node, as used below). [To Do: Either (1) target nodes are one level above base & intermediate nodes only associated with couplets or (2) add additional nodes that show all immediate children of intermediate nodes, so can show hierarchy.]
  • Attribute Set - This node corresponds to one attribute set for a destination taxon that has multiple attribute sets (an attribute set is a sequence through the key leading to a target taxon; multiple-attribute-set targets were described in A Key as a Hierarchy). This node is similar to a target node, but since there are multiple paths to this destination taxon, there is also a special connector node that each of these attribute set nodes point to. Appended to the taxon name is an attribute set number in order to make the node name unique. There is no relative target number and taxon id since these are in the connector node.  Note that there can also be multiple attribute sets leading to subkeys, so in this case instead of a taxon name, the name is an intermediate taxon name.
  • Multiple-Attribute-Set Connector - This node is the destination taxon node pointed to by each of the multiple-attribute-set nodes; this connector node points back to each of those nodes (so following a path to the base node requires knowing which of the attribute sets was selected). For an example of a connector node see Effect of Single and Multiple Parents on Next-Level Keys in A Key as a Hierarchy. Like a target node, this node has the taxon name, relative target number and taxon id.  In addition, this node has the merge point for all of the attribute set paths; that is, all attribute sets have common attributes between the merge point node and the base node for the key; in some cases, the merge point and the base node are the same (there are no common attributes).  If the attribute set nodes are for an intermediate taxon, this connector node will be for that intermediate taxon also.  [Probably want to split this into Multiple-Attribute-Set Target Connector and Multiple-Attribute-Set Group Connector (or External and Internal Multiple-Attribute-Set Connectors) because in the table below Multiple-Attribute-Set Group Connector does not need Taxon Id and Target #.]
  • Alternate-Key Connector -  This node is needed whenever there are alternate keys (in addition to the alternate key node, which is discussed next).  In this case, a connector node is needed for each target in the alternate keys, so that as couplet choices are made that traverse down through any target, which of the alternate keys was used can be saved and when traversing back up through that target, the same alternate key is used in creating the attribute list for the terminal target. If there are multiple-attribute-set nodes in an alternate key, a multiple-attribute-set connector node is needed for each set, and each of those connector nodes is connected by an alternate-key connector node.
  • Alternate Key - An alternate key node connects a root or target node to all available alternate keys for the family or target. Currently only eleven alternate key nodes are needed, which are listed in Key Types and Subkeys. Each alternate key node is the base node for the corresponding key. Each is named so that the parent can offer the user a choice of which alternate key to use.
  • Singleton - This node represents a singleton node, which results when a parent node only has one child that is at the next level, so no key is involved.  A root node, a target node or a connector node can be a singleton parent node.  A singleton node contains the taxon name, relative target number and taxon id.
  • Missing-Key Target - If a taxon has children, but the key to distinguish the children from each other is missing, then they can still be listed under that taxon.  For those that are not terminal, then lower keys may exist, so that a hierarchy can be shown with a gap for the missing key. A missing-key target is also used when a taxon is keyed at a level that is not normal for that taxon; e.g., there is no need for a subsp. key for Piperia elegans because all these subspecies are in the key for species at the Piperia level, but a missing-key target node is needed for Piperia elegans to show the hierarchy of names from Piperia to elegans and then to its two subspecies.
  • Segregate - A target in a key that is where a taxon used to exist before FNA moved it to a new location in the hierarchy. A link gives that new location, so that keying of a specimen can be resumed from there.
The following summarizes the data contained in each type of node:

Node Type Name Level Row Label Attribute(s) Taxon Id Target # Merge Point Intermediate Name


Attribute Set

Multiple-Attribute-Set Connector

Multiple-Attribute-Set-Group Connector

Alternate-Key Connector

Alternate Key


Missing-Key Target


See Node Groups & Relationships for a description of how nodes are combined to model the FNA keys.

Dual Role of Nodes - The node type reflects the role of a node relative to its parent or to other nodes in a key.  Root, target, connector and singleton nodes also have another role relative to nodes at the next level; these roles are either
  • Couplet
  • Singleton Parent
  • Key Base
  • Offer alternate key choices
  • Terminal
These are not separate nodes, but nodes at the previous level acting in their other role.  In Node Groups & Relationships the more general term Base Node is used for Singleton Parent or Key Base.

Tag List Creation - The following are general steps in creation of a tag list and tagListIndex that is used in a class entry:

  1. Create a new Tag object.
  2. Add that object to the tags array list, keeping track of the tag index.
  3. Create a new TagList object, using the tag index, keeping track of the tag list index.
  4. Get a reference to that TagList object.
  5. For each additional tag needed, create a new Tag object.
  6. Add that object to the tags array list, keeping track of the tag index.
  7. Add that tag index to the TagList object.
  8. Insert the tag list index obtained in step 3. in the class entry.
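The eight steps above can be sketched as follows. The names Tag, TagList and the class entry come from the text; their fields, the constructor arguments and the `create_tag_list` helper are assumptions made only for illustration, and the sketch expects at least one tag value.

```python
from typing import List


class Tag:
    def __init__(self, value: str) -> None:
        self.value = value


class TagList:
    def __init__(self, first_tag_index: int) -> None:
        self.tag_indexes: List[int] = [first_tag_index]

    def add(self, tag_index: int) -> None:
        self.tag_indexes.append(tag_index)


class ClassEntry:
    def __init__(self) -> None:
        self.tag_list_index = -1   # set once the tag list exists


def create_tag_list(values: List[str], tags: List[Tag],
                    tag_lists: List[TagList], entry: ClassEntry) -> int:
    # Steps 1-2: create the first Tag and append it, noting its index.
    tags.append(Tag(values[0]))
    tag_index = len(tags) - 1
    # Step 3: create the TagList from that tag index, noting the list index.
    tag_lists.append(TagList(tag_index))
    tag_list_index = len(tag_lists) - 1
    # Step 4: get a reference to the TagList.
    tag_list = tag_lists[tag_list_index]
    # Steps 5-7: add each additional tag and record its index in the list.
    for value in values[1:]:
        tags.append(Tag(value))
        tag_list.add(len(tags) - 1)
    # Step 8: store the tag list index in the class entry.
    entry.tag_list_index = tag_list_index
    return tag_list_index
```

Because the tag list stores indexes rather than Tag objects, the shared tags array can keep growing without invalidating earlier tag lists.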

A Key as a Hierarchy

Tuesday, September 17, 2013

This section shows that the structure of the FNA keys is a hierarchy, at least conceptually; the structure of the database used to store those keys is explained later in Data Structure and Node Types & Relationships.

Targets with a Single Parent - A couplet is a parent-child relationship with two children:

As mentioned in Couplets, Targets and Rows, each attribute choice leads to either another couplet or to a target. 

If all targets in a key have a single parent, then there is a unique path from the root of the key (a family, genus, species or subspecies) to each target, usually one level down from the root.  That is, from each target one can unambiguously traverse back up to the root of the hierarchy.
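A minimal sketch of that upward traversal, assuming each node's single parent is held in a dictionary (the node numbers in the usage note are invented):

```python
from typing import Dict, List, Optional


def path_to_root(node: int, parent_of: Dict[int, Optional[int]]) -> List[int]:
    """Return the nodes from `node` up to the root, inclusive.

    `parent_of` maps each node to its single parent; the root maps to None.
    """
    path = [node]
    while parent_of[path[-1]] is not None:
        path.append(parent_of[path[-1]])
    return path
```

For example, with `parent_of = {0: None, 1: 0, 2: 0, 3: 2, 4: 2}`, the path from node 4 is 4, then 2, then the root 0.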

Targets with Multiple Parents - Unfortunately, as seen in the Attribute Sets section, some targets in FNA keys can have more than one attribute set, which results in multiple parents for those targets.

For example, the paths to the target Boechera repanda shown in Couplets, Targets and Rows can be redrawn as:

That is, depending on which path is chosen, a different attribute set results.  When the target is terminal, as is the case for B. repanda, visualizing the multiple paths to the root is relatively easy.
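When a target has more than one parent, the single upward walk becomes an enumeration of paths. The sketch below assumes a dictionary from each node to its list of parents and a hierarchy without cycles; the node labels in the usage note are hypothetical.

```python
from typing import Dict, List


def paths_to_root(node: str, parents_of: Dict[str, List[str]]) -> List[List[str]]:
    """Return every path from `node` up to a root.

    `parents_of` maps a node to its parents; a root has an empty list
    or no entry. Assumes the hierarchy has no cycles.
    """
    parents = parents_of.get(node, [])
    if not parents:
        return [[node]]
    paths = []
    for parent in parents:
        for tail in paths_to_root(parent, parents_of):
            paths.append([node] + tail)
    return paths
```

A target with two parents yields two paths, one per attribute set; for instance `{"target": ["r3", "r8"], "r3": ["r1"], "r8": ["r2"], "r1": ["root"], "r2": ["root"]}` gives one path through r3 and one through r8.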

Effect of Single and Multiple Parents on Next-Level Keys - How keys are connected to the key at the next level is different for targets with single and multiple parents; that is, with targets with single and multiple attribute sets. What follows is an example of each case.

Silene is a target in the Caryophyllaceae key with a single attribute set:
The Silene key has the following connections to the Caryophyllaceae key:
The row numbers in this diagram are relative.  In the next section Key Types and Subkeys renumbering of rows in subkeys is discussed. Row numbers in subkey diagrams in this section are shown before renumbering.

What is important is that the parent for the initial couplet in the Silene key is in the Caryophyllaceae key, so the logic to set up this parent-child relationship is different than if they were in the same key. Row 37 is the Silene target, but it also has a next couplet that is the first couplet in the Silene subkey. For the single parent case there is no zero row representing the root for the Silene key; that is, the row zero shown in the key diagram in Couplets, Targets and Rows is not needed in this case.

S. scouleri is a target in the Silene key with two attribute sets:
By creating a special node that acts as a target for rows 118 and 143 in the Silene key and as a couplet for rows 1 and 2 in the S. scouleri key, all other elements in the Silene and S. scouleri keys can be treated as normal parent-child relationships:
The Node Structure and Types post discusses how key hierarchies, including these special multiple-attribute-set targets, are represented as data structures.

[Add a short section referring to the Target Subset Index in Node Structure and Types that creates a naming hierarchy, which may deviate from the attributes hierarchy that is defined by the keys.]

Key Types and Subkeys

Thursday, September 5, 2013
The number of keys in FNA is greater than the 1853 keyed taxa mentioned in Taxon List & Counts by Type.  One reason is that five taxa have alternate keys:
  • Asteraceae:   Synoptic and Artificial Keys
  • Brassicaceae:  Flower-Based and Fruit-Based Keys
  • Lauraceae:  Flower-Based and Fruit-Based Keys
  • Portulacaceae Portulaca:  Flower-Based and Fruit-Based Keys
  • Salicaceae Populus:  Flower-Based, Fruit-Based and Leaf-Based Keys
Because the alternate keys add new attributes and new paths through the keys, each must be handled separately (see Node Structure and Types for the mechanism to choose which alternate key to use and see Node Groups & Relationships for the effect on node numbering).

Adding the six alternate keys, there are 1859 keys.  The other reason is that 43 of these keys have subkeys.  Here is the count of FNA keys by type:

Keys with no subkeys 1816
Keys with subkeys 43
Subkeys 347
Keys that are not used 5
Total keys 2211
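The counts above can be cross-checked with a few lines of arithmetic. This sketch assumes that the "six alternate keys" are the keys beyond the first one for each of the five taxa listed earlier (eleven alternate keys in all); the variable names are illustrative.

```python
# Number of alternate keys for each of the five taxa listed above.
alternate_keys_per_taxon = [2, 2, 2, 2, 3]
keyed_taxa = 1853

# Each taxon is already counted once, so only the additional keys are added.
extra_alternate_keys = sum(alternate_keys_per_taxon) - len(alternate_keys_per_taxon)
top_level_keys = keyed_taxa + extra_alternate_keys

counts = {
    "Keys with no subkeys": 1816,
    "Keys with subkeys": 43,
    "Subkeys": 347,
    "Keys that are not used": 5,
}
total_keys = sum(counts.values())
```

Under that assumption, `top_level_keys` equals the sum of the first two rows of the table, and `total_keys` matches the stated total.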

My list of taxa with multiple keys, with their alternate keys, master key and subkeys, is here.

From this list, you see two types of keys with subkeys:
  • Targets in these keys have intermediate levels, each with its own subkey.
  • The keys are subdivided into subkeys, each of which is given a number or letter; although there is no scientific name, each of these subkeys corresponds to an intermediate level.
Some taxon pages for these keys have a List of Keys, which contains the master key and subkeys. More commonly, keys for intermediate levels are included in the list of lower taxa. This method of listing allows for intermediate levels with only one child, so these are not subkeys.

Three of the keys that are not used are on the pages' List of Keys, but that list points to pages that have blank keys.  The other two are keys to intermediate levels that are not used in the keys for their respective taxa.

The Couplets, Targets and Rows post discusses use of couplet and row numbers in keys.  The numbering of couplets and rows in subkeys also starts with 1.  But to use a taxon key that has subkeys, the master key and all subkeys must be concatenated into one key.  The couplets and rows in the first subkey must be renumbered to continue where numbering left off in the master key, and likewise as each subkey is added.  The couplet and row numbers shown in the Boechera repanda paths diagram in Couplets, Targets and Rows are those after this renumbering.
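The renumbering can be sketched as an offset applied to each subkey's couplet references. The row representation below (a next-couplet number or a target name) and the example names are assumptions; linking each master-key target to the first couplet of its subkey is left out to keep the sketch short.

```python
from typing import List, Optional, Tuple

# A row is (next_couplet, target): exactly one of the two is set.
Row = Tuple[Optional[int], Optional[str]]


def concatenate_keys(keys: List[List[Row]]) -> List[Row]:
    """Concatenate a master key and its subkeys into one key.

    Each key numbers its couplets from 1 and has two rows per couplet.
    Couplet references inside each key are shifted by the number of
    couplets that precede that key in the combined key.
    """
    combined: List[Row] = []
    for key in keys:
        couplet_offset = len(combined) // 2
        for next_couplet, target in key:
            if next_couplet is not None:
                next_couplet += couplet_offset
            combined.append((next_couplet, target))
    return combined


master = [(None, "Group 1"), (None, "Group 2")]                 # 1 couplet
subkey = [(2, None), (None, "sp. a"), (None, "sp. b"), (None, "sp. c")]  # 2 couplets
combined = concatenate_keys([master, subkey])
```

After concatenation the subkey's couplets 1 and 2 become couplets 2 and 3, so its first row now points to couplet 3 and occupies row 3 of the combined key.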

Taxon Locations

Friday, August 30, 2013
I've posted two spreadsheets that show state or province postal codes for all FNA terminal taxa.  Due to a Google Drive spreadsheet size limitation, these data were split:  Alberta through North Dakota and Nebraska through Yukon Territory.  Taxon counts by location are in the last row.

Terminal taxa  in my taxon list (see Taxon List & Counts by Type) have locations shown on their FNA taxon page.  The 11,200 terminal taxa that have their own taxon page have the following location information:

Location directly shown on taxon page 11,148
Location indirectly available 50
Location information not available 2

Indirect location information is most commonly found from a distribution map.  One of the taxa (Iris germanica) with no location information says it "may persist after cultivation"; the other (Corispermum nitidum) says it was "supposedly introduced but doubtful".  Location information for the few taxa without their own page has not been added yet.

The 65 North American locations used in the FNA are the 49 continental states and the District of Columbia in the USA; the 13 provinces and territories of Canada; Greenland, which is an autonomous country within the Kingdom of Denmark; and the islands of St. Pierre and Miquelon, which are a self-governing territory of France.

In the future, if a user of the FNA keys selects the location(s) that they are interested in, then the keys can be customized.  Then the keys would only show those couplets and terminal taxa relevant to their location(s), which greatly simplifies keying a specimen. Also the keys can be simplified because there are location-specific couplets in the keys; these couplets may eliminate all except one attribute set leading to the terminal taxa.

Couplets, Targets and Rows

Friday, August 23, 2013
In each FNA key for a given taxon, couplets are numbered starting with 1.  A couplet gives a choice of two alternate attributes:  the first choice is labeled with the couplet number and the second is labeled "+".  Following each choice, where to go next in the key is indicated by a couplet number.  However, instead of a next couplet, one or both choices may lead to a destination target.

A diagram showing the hierarchy of couplet numbers is a simple way to show the overall structure of a key.  For example, the following diagram shows the structure of the Caryophyllaceae Drymaria key using couplet numbers and destination targets:
Each choice in a couplet has one or more attributes that are added to the target's attribute set.

When creating a target's attribute set, it is easier to use the row numbers as an index into the key, instead of using couplet numbers. The keys are designed such that the row number can be obtained from the couplet number:
Couplet number    Row number (using first choice)    Row number (using second choice)
n                 2n - 1                             2n
In other words, the row number is double the couplet number, except that for the first choice the row number is one less than this.  See the last diagram in this section as an example of how row numbers (in circles) are related to couplet numbers; note that in that diagram, the row number of the first choice in a couplet is put to the left of the couplet, and the second choice is put to the right.
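The relationship between couplet and row numbers is simple enough to state as two small functions. The function names are illustrative only.

```python
def row_number(couplet: int, second_choice: bool) -> int:
    """Row number for a couplet's first choice (2n - 1) or second choice (2n)."""
    return 2 * couplet if second_choice else 2 * couplet - 1


def couplet_number(row: int) -> int:
    """Inverse mapping: the couplet that a row (1 or greater) belongs to."""
    return (row + 1) // 2
```

Couplet 5, for instance, occupies rows 9 and 10, and both of those rows map back to couplet 5.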

Two FNA keys use a special structure that must be handled correctly so that the above relationship between couplet and row numbers is maintained.  In particular, the Ranunculus and Piperia keys to species have targets that are, instead, subspecies. [Need a pointer to some later post on implementation details that has a diagram showing an example of having subspecies target nodes and special species nodes added after the key nodes to connect these together: e.g. diagram in 8/26/15 notes.]

This is a diagram of the Caryophyllaceae Drymaria key using row numbers:
A row number 0 (representing the root of the Drymaria key) has been added as a starting point.  From this diagram, you can create the path to any row in the key or to any target; in particular, you can create the attribute set for any target.  Also it is useful to create a mapping of any row to its parent row.
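A sketch of the row-to-parent mapping follows. It assumes each row records either the couplet it leads to or nothing (when it ends at a target), and that each couplet is reached from only one row; the small key in the usage lines is invented and is not the Drymaria key.

```python
from typing import Dict, List, Optional


def parent_rows(next_couplet: Dict[int, Optional[int]]) -> Dict[int, int]:
    """Map each row to its parent row.

    `next_couplet[row]` is the couplet that the row leads to, or None when
    the row ends at a target. Row 0 is the added root and is the parent of
    rows 1 and 2 (couplet 1).
    """
    parent = {1: 0, 2: 0}
    for row, couplet in next_couplet.items():
        if couplet is not None:
            parent[2 * couplet - 1] = row   # first choice of that couplet
            parent[2 * couplet] = row       # second choice of that couplet
    return parent


def path_from_root(row: int, parent: Dict[int, int]) -> List[int]:
    """Rows from the root (row 0) down to `row`."""
    path = [row]
    while path[-1] != 0:
        path.append(parent[path[-1]])
    return path[::-1]


# Hypothetical three-couplet key: row 1 leads to couplet 2, row 2 to couplet 3.
rows = {1: 2, 2: 3, 3: None, 4: None, 5: None, 6: None}
parents = parent_rows(rows)
```

Collecting the attributes of each row along `path_from_root(6, parents)` would give the attribute set for the target at row 6.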

Apart from these uses, a row diagram of a key has disadvantages: it is harder to create and understand, so a couplet diagram is used most often.  A diagram that has both may be a better choice when row numbers are needed.  For example, the following diagram for the Brassicaceae Boechera key can be used to create the paths to Boechera repanda, Attribute Sets 1 and 2:

With the row numbers, you can create the lists of the common attributes and specific attributes for sets 1 and 2, which are shown in the Attribute Sets post.  How the subkeys in the Boechera key are combined is discussed at the end of the Key Types and Subkeys post; in particular, the Group 1 subkey is used in this diagram.

Attribute Sets

Tuesday, August 20, 2013
As one makes binary choices of attributes (also called characteristics) in a key, an attribute set for the destination target is created.  Actually it is more useful to reverse the set order, so that the attributes chosen last are listed first, because those are most specific to the target.

One output of my analysis will be spreadsheets containing attribute sets for each target in all FNA keys. For a start, I've posted spreadsheets for family Caryophyllaceae, for genus Drymaria, for genus Silene and for Brassicaceae Boechera. Examples of attribute sets from these spreadsheets are below.

An example of a simple case is the attribute set for Boechera burkii:
  • Racemes usually unbranched; cauline leaves 18-28; ovules 64-80 per ovary; seeds 1.2-1.4 mm wide
  • Biennials, without caudices; stems (2-)3-10 dm; cauline leaves 18-80
  • Basal leaf blade surfaces glabrous or with simple trichomes only
  • Cauline leaf blades not auriculate
A diverse target, whose name is appended with "(in part)" in the key, has more than one attribute set. In column 2 of the four spreadsheets above, I name these "Attribute Set n" where n=1,2,…; that is, there are multiple attribute sets. Using Boechera repanda as an example, Attribute Set 1 is:
  • Basal leaf blades 10-25(-50) mm wide; petals 3.5-6 mm.
  • Fruit valves pubescent
And Attribute Set 2 is:
  • Basal leaf blades 7-25(-50) mm wide, margins usually repand to dentate, rarely entire
  • Fruits 2-5.5 mm wide, divaricate-ascending to erect, ± appressed to rachises
  • Stems proximally with simple and/or branched trichomes usually less than 0.5 mm; ovules 8-52 per ovary
  • Fruits not secund
  • Fruits erect, ascending, or horizontal
  • Fruit valves glabrous
If a target has multiple attribute sets, then those sets may share the same general attributes, and only the more specific attributes differ. Therefore, I list these general Common Attributes for Boechera repanda separately in the spreadsheet:
  • Plants usually sparsely to densely pubescent proximally (sometimes throughout)
  • Styles 0.05-2 mm
  • Basal leaf blade surfaces with at least some branched trichomes
  • Cauline leaf blades not auriculate
Toward the bottom of the Couplets, Targets and Rows post, a diagram shows how row numbers and a path through the Boechera repanda key are used to create these attribute set lists.
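The split into specific and common attributes can be sketched as finding the longest shared tail of the attribute sets, since each set is listed most specific first. This is a minimal sketch, not necessarily how the spreadsheets were produced; the function name `split_common` is hypothetical, and the full sets are assumed to be the specific lists above followed by the Common Attributes.

```python
def split_common(sets):
    """Split attribute sets (most specific first) into their specific parts
    and the general attributes shared at the end of every set."""
    shortest = min(len(s) for s in sets)
    n = 0
    # Count how many trailing (most general) attributes are identical in all sets.
    while n < shortest and len({s[-1 - n] for s in sets}) == 1:
        n += 1
    common = sets[0][len(sets[0]) - n:]
    specific = [s[:len(s) - n] for s in sets]
    return specific, common

# Common Attributes for Boechera repanda, as listed above.
general = [
    "Plants usually sparsely to densely pubescent proximally (sometimes throughout)",
    "Styles 0.05-2 mm",
    "Basal leaf blade surfaces with at least some branched trichomes",
    "Cauline leaf blades not auriculate",
]
# Assumed full sets: the specific attributes followed by the general ones.
set_1 = [
    "Basal leaf blades 10-25(-50) mm wide; petals 3.5-6 mm",
    "Fruit valves pubescent",
] + general
set_2 = [
    "Basal leaf blades 7-25(-50) mm wide, margins usually repand to dentate, rarely entire",
    "Fruits 2-5.5 mm wide, divaricate-ascending to erect, ± appressed to rachises",
    "Stems proximally with simple and/or branched trichomes usually less than 0.5 mm; ovules 8-52 per ovary",
    "Fruits not secund",
    "Fruits erect, ascending, or horizontal",
    "Fruit valves glabrous",
] + general

specific, common = split_common([set_1, set_2])
print(len(specific[0]), len(specific[1]), len(common))
```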

In Taxon List & Counts by Type the usual taxon levels (family, genus, species, subspecies and varieties) were mentioned.  For a given taxon, there will be an attribute set for each target level: genus, species and possibly subspecies and/or varieties.  At each of these levels, there could be multiple attribute sets.

As mentioned in Key Types and Subkeys, some taxa have keys with subkeys; each of these subkeys, which correspond to targets at intermediate levels, adds an attribute set or multiple attribute sets.  As shown in Multiple-Attribute-Set Groups, a couplet can have multiple attribute sets even if there is no subkey, although this is rare.


Monday, August 12, 2013
The Flora of North America North of Mexico is being created by over 800 authors in an online version and in 30 printed volumes; this is usually called just Flora of North America (FNA).  The FNA contains dichotomous keys, which allow biologists and serious amateurs to identify nearly all known plants in North America.

My objective is to analyze the structure of the online keys, looking for commonality. Over time, I will add details of this analysis.

Data created by my analysis is too large to be contained in this blog, so this blog has links to Google Docs spreadsheets.  A Google account or a gmail account is required to access these spreadsheets; if you don't have an account, you can go here to create one.


      Taxon List and Counts by Type
      Taxon Locations
      Attribute Sets
      Couplets, Targets and Rows
      A Key as a Hierarchy
      Key Types and Subkeys
      Multiple-Attribute-Set Groups
      Node Structure and Types
      Node Groups and Relationships

Taxon List & Counts by Type

Sunday, August 11, 2013
My list of all taxa created by analyzing the online version of the FNA is here.  The types shown in the third and fourth columns are explained below.

Given that the plant family has already been identified, the FNA keys allow more and more specific identification of taxon level, ultimately leading to a terminal taxon.  The numbers of taxa at the different levels are as follows:

Level                 Type within Level    Number of Taxa
family                family               182
genus                 genus                1505
genus subclasses      species              9562
                      hybrid               42
species subclasses    subsp.               841
                      var.                 1845
                      hybrid               1
subsp. subclasses     var.                 13
Total in FNA keys                          13,991

[under species subclasses, instead of hybrid, "var. x" or "nothovar", to cover:
Asteraceae Petasites frigidus var.×vitifolius
Rosaceae Crataegus ×sicca nothovar. sicca
Rosaceae Crataegus ×sicca nothovar. glabrifolia 
See https://en.wikipedia.org/wiki/Hybrid_name ]

Non-Terminal Taxa - Taxa that are non-terminal (families, genera and some species) are of two types (see the third column of my taxon list):

keyed               A key is used to differentiate subclasses    1853
singleton parent    There is only one subclass (no key)          923

[Add a third type: implied.  Children exist in a key.  A key to species may also have subspecies targets in the key when the parent of those subspecies is not in the key:  the existence of the subspecies implies the existence of the species. Relate to the A-Key-as-a-Hierarchy to-be-added diagram of an example naming hierarchy that deviates from the attributes hierarchy, which is derived from the keys.]

Keyed taxa will be discussed further in Key Types and Subkeys.

Terminal Taxa - Terminal taxa are also indicated in the third column of my taxon list.  In a few cases, destination targets in the FNA keys are not listed in the site's taxa lists.  These targets are distinguished from other terminal targets:

terminating                A target that has a unique taxon_id                                    11,200
terminatingOnAssocTaxon    A target that is only described on the page of an associated taxon     10
terminatingOnBadTaxon      A target with no taxon page or with a bad taxon_id                     5

Not Reachable Taxa - Independent of the above types, some taxa may not be reachable because some key is missing or incomplete (see the fourth column of my taxon list):

Reachable        Keys or singletons can be used to determine taxon    13,790
Not reachable    Taxon not reachable given the family                 201

Only two of the taxa that are not reachable are not targets in any key, so a taxon is mainly not reachable because an intermediate key is missing or incomplete.

Types of Reachable Terminal Taxa - Taking into account reachability, the FNA keys can be used to differentiate the following types of destination targets:

hybrid                           43
species                          8338
subsp.                           797
var.                             1852
Total Reachable Terminal Taxa    11,030
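
As a quick arithmetic check, the counts in this post can be added up in a short Python sketch. The numbers are copied from the tables above; the dictionary names are hypothetical, and the second check assumes that every taxon is either non-terminal or terminal, which the tables suggest but do not state outright.

```python
# Counts copied from the tables in this post.
level_counts = {
    ("family", "family"): 182,
    ("genus", "genus"): 1505,
    ("genus subclasses", "species"): 9562,
    ("genus subclasses", "hybrid"): 42,
    ("species subclasses", "subsp."): 841,
    ("species subclasses", "var."): 1845,
    ("species subclasses", "hybrid"): 1,
    ("subsp. subclasses", "var."): 13,
}
non_terminal = {"keyed": 1853, "singleton parent": 923}
terminal = {
    "terminating": 11200,
    "terminatingOnAssocTaxon": 10,
    "terminatingOnBadTaxon": 5,
}
reachability = {"Reachable": 13790, "Not reachable": 201}
reachable_terminal = {"hybrid": 43, "species": 8338, "subsp.": 797, "var.": 1852}

total = sum(level_counts.values())
checks = {
    "levels sum to the stated total": total == 13991,
    # Assumption: non-terminal and terminal taxa together cover all taxa.
    "non-terminal + terminal": sum(non_terminal.values()) + sum(terminal.values()) == total,
    "reachable + not reachable": sum(reachability.values()) == total,
    "reachable terminal types": sum(reachable_terminal.values()) == 11030,
}
for name, ok in checks.items():
    print(name, "OK" if ok else "MISMATCH")
```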