<body>

Multiple-Attribute-Set Groups

Monday, February 10, 2014
Previously we showed some diverse taxa that have multiple attribute sets terminating on a taxon. That is, there is more than one path to the target taxon.  In some of the more complex keys, there may be more than one path to a group of taxa.  There are two cases where this occurs.

If a key has subkeys, there may be more than one path to a subkey.  For example, Cyperaceae Carex has subkeys Key A through Key F, and each of these has subkeys that are intermediate ranks.  In particular, one of the subkeys for Key C is section Ovales, and there are three attribute sets leading to the group of species in section Ovales:
This is similar to multiple attribute sets leading to the key for a target taxon (for example, Silene scouleri shown in A Key as a Hierarchy).

However, the taxon group does not have to be named.  Here is part of the Carex section Ovales key for species  east of the Rocky Mountains:
Both couplet 56 and couplet 57 have two attribute sets leading to them. That is, there are two paths leading to the taxon group of Carex opaca and C. shinnersii, and there are two paths leading to the taxon group of C. opaca, C. shinnersii and C. missouriensis. In the FNA online key, special notation indicates that couplets 56 and 57 are exceptional; in particular, instead of the standard (57) to indicate that the next couplet is 57, the phrase "Go to couplet 57" is used for one of the attribute sets.

How a connector node is used with a target taxon that has multiple attribute sets is described in Node Structure and Types. A connector node is also needed for a couplet that has multiple attribute sets leading to a taxon group.

The fact that couplet 56 is associated with the C. opaca - C. shinnersii taxon group (and likewise, couplet 57 is associated with a taxon group) is stored as an intermediate name in the couplet node (see Node Structure and Types).  The intermediate name in a couplet node is also sometimes used to store an intermediate rank that is associated with the couplet.

Node Groups & Relationships

Saturday, December 21, 2013
Nodes (see Node Structure and Types) are used to form node groups, which are used to create the database model of the FNA keys.  There are only two group types:
  1. Key groups
  2. Singleton groups
but they have enough in common that both can be treated as cases of a more general node group.
Node Groups - The different types of nodes can be grouped together in only certain ways.  By defining a generalized node group and understanding how node groups can be connected, all FNA keys can be modeled by a series of these generalized node groups.
A node group has three parts:
  1. Base Node
    • Provides the parent taxon name or intermediate rank name; that is, all targets in the group are members of this taxon or intermediate rank.
    • If the Key Nodes immediately follow the Base Node, it provides the base node number for the group; that is, it is used to convert the row numbers of the Key Nodes to the node numbers used in the database. This is the case if the Base Node is a root, connector or singleton node.
    • If the Key Nodes are separate from the Base Node, then the base node number is one less than the first key node number. This is the case if the Base Node is a target node (or a couplet node with an intermediate title) in a higher level node, so that node has two roles in two groups: it is a target node in the higher level node group and Base Node for the current node group.
    • If the Base Node is a singleton parent, then there are no Key Nodes and the singleton node is the single End Node and immediately follows the Base Node. In this case the Base Node can be a root, connector, target with an intermediate title or singleton node.  Note that if the Base Node is a singleton node, it also acts as a singleton parent.
  2. Key Nodes
    • All Key Nodes are couplet nodes.
    • Except in the case of singleton groups, there is always more than one couplet node.
    • The children of the Base Node are always the Key Nodes that have relative node numbers 1 and 2.
    • Two or more of the Key Nodes have target or attribute set nodes as one or both of their children; both target and attribute set nodes can act as End Nodes.
    • If the End Nodes are attribute set nodes, then each associated connector node may specify a couplet node that acts as a merge point; that is, all attributes between the Base Node and the merge point are shared by all attribute sets associated with that connector node.
    • If there are subkeys, the master key and the subkeys are separate node groups. They are joined together by couplet or connector nodes with intermediate titles.
  3. End Nodes
    • As discussed under Base Node, the End Node could be a singleton node; in all other cases, there is always more than one End Node.
    • For target End Nodes, the target node also acts as a Base Node for the node at the next level.
    • Attribute set nodes exist when a taxon has more than one attribute set describing it; they require an associated connector node for the taxon, which acts as the Base Node for the node at the next level.
    • In each of the three End-Node cases above, instead of the End Node being the Base Node for the next level, the End Node could be a terminal taxon.
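The base node number rules described under Base Node can be sketched in a few lines. This is only an illustration, assuming that a database node number is the base node number plus the relative row number; the function names and the example number 500 are hypothetical and are not taken from the FNA database.

```python
def node_number(base_node_number: int, row_number: int) -> int:
    """Convert a relative row number (1, 2, ...) to a database node number.

    Assumption: node number = base node number + relative row number.
    """
    if row_number < 1:
        raise ValueError("relative row numbers start at 1")
    return base_node_number + row_number


def base_from_first_key_node(first_key_node_number: int) -> int:
    """Base node number when the Key Nodes are stored apart from the Base Node."""
    return first_key_node_number - 1


# Hypothetical group whose first Key Node is stored as node 500.
base = base_from_first_key_node(500)
# The children of the Base Node are the Key Nodes with relative numbers 1 and 2.
children_of_base = [node_number(base, 1), node_number(base, 2)]
print(base, children_of_base)
```

Under this assumption both Base Node cases reduce to the same arithmetic; only the way the base number is obtained differs.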




Node Structure and Types

Thursday, September 19, 2013
Nodes and Tags - The data in the FNA keys can be modeled using nodes, each of which can have
  • Parents and children
  • Tags (these can contain just values or can hold key-value pairs)
Previously, instead of the term "node", I used the term "class" (from object-oriented programming), which is still used in some places.

Node Structure - The following shows the structure of the data within each node:


Node Types - The following describes the types of nodes; the type is specified in the tagListType field of each ClassEntry:
  • Root - The base for each family, which contains the family name.  Most are base nodes for each family key, but a few families have only one species, so no key is involved and the root is a singleton parent.  Besides the family name, the node contains the unique taxon id assigned by FNA; the FNA numbers each family from 1 to 128, so this number is also contained in the root node.  Each family root node is level 1 in the taxon hierarchy.
  • Couplet - The couplet at each decision point in the keys (see A Key as a Hierarchy).  The node has pointers to each of the two choices of attributes, but those attributes themselves are not contained in this node.  However, this node does contain the attribute(s) for the choice that led to this couplet.  The level of this node is the same as that of the base node for this key.  As mentioned in Key Types and Subkeys, some keys have subkeys, each of which is given a name with a number or letter or with an intermediate taxon rank; for subkey couplet nodes, this name is stored with the couplet node (other types of couplet nodes do not have a name). An intermediate rank can also be associated with a choice that points to another couplet rather than to a subkey. Besides pointing to another couplet node, a couplet node can point to two types of target nodes, which are described next.
  • Target - The destination taxon, which was arrived at by a unique sequence of choices in the key; that is, there is a single attribute set in the key that has this taxon as its target.  This taxon will be at the next level compared to the level of the base node for this key.  In this node is the final attribute choice that led to this target.  Also in this node is the taxon name and number relative to the taxon in the key's base node; and there is a unique taxon id assigned by FNA. There is a rather special case where the target is an intermediate taxon that only has one child, so the intermediate name has to be stored with this type of target node as well as the taxon name of the child (this should not be called a subkey singleton because it is not a key and is not a singleton node, as used below). [Either (1) target nodes are one level above base & intermediate nodes only associated with couplets or (2) add additional nodes that show all immediate children of intermediate nodes, so can show hierarchy.]
  • Attribute Set - One attribute set for a destination taxon that has multiple attribute sets (sequences through the key leading to this taxon; multiple-attribute-set targets were described in A Key as a Hierarchy). This node is similar to a target node, but since there are multiple target nodes for this destination taxon, there is also a special connector node that each of these target nodes point to. Appended to the taxon name is an attribute set number in order to distinguish the node.  There is no relative target number and taxon id since that is in the connector node.  Note that there can also be multiple attribute sets leading to subkeys, so instead of a taxon name, the name is an intermediate taxon name.
  • Connector - A connector node is the destination taxon node pointed to by each of the multiple-attribute-set nodes; this connector node points back to each of those nodes (so following a path to the base node requires knowing which of the attribute sets was selected). For an example of a connector node see Effect of Single and Multiple Parents on Next-Level Keys in A Key as a Hierarchy. Like a target node, this node has the taxon name, relative target number and taxon id.  In addition, this node has the merge point for all of the attribute set paths; that is, all attribute sets have common attributes between the merge point node and the base node for the key; in some cases, the merge point and the base node are the same (there are no common attributes).  If the attribute set nodes are for an intermediate taxon, this connector node will be for that intermediate taxon also.
  • Singleton - The node representing a singleton node; this results when a parent node only has one child that is at the next level, so no key is involved.  A root node, a target node or a connector node can be a singleton parent node.  A singleton node has the taxon name, relative target number and taxon id.
  • Segregate - A target in a key that is where a taxon used to exist before it was moved to a new location in the hierarchy. A link gives that new location, so that keying of a specimen can be resumed from there.
The following summarizes the data contained in each type of node:

Columns: Name, Level, Row Label, Attribute(s), Taxon Id, Target #, Merge Point, Intermediate Name
Rows: Root, Couplet, Target, Attribute Set, Connector, Singleton

See Node Groups & Relationships to see how nodes are used in a database to model the FNA keys.
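As a rough illustration of the node types and data fields described above, a node could be modeled as a single record with optional fields. This is a sketch only: the Python class and field names are my own assumptions for illustration, and they are not the ClassEntry and tag structures used in the actual database.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class NodeType(Enum):
    ROOT = "root"
    COUPLET = "couplet"
    TARGET = "target"
    ATTRIBUTE_SET = "attribute set"
    CONNECTOR = "connector"
    SINGLETON = "singleton"
    SEGREGATE = "segregate"


@dataclass
class Node:
    """One node; fields that a given node type does not use stay empty."""
    node_type: NodeType
    name: Optional[str] = None               # taxon name, if any
    level: Optional[int] = None              # family roots are level 1
    row_label: Optional[str] = None
    attributes: List[str] = field(default_factory=list)
    taxon_id: Optional[int] = None           # not used by attribute set nodes
    target_number: Optional[int] = None      # not used by attribute set nodes
    merge_point: Optional[int] = None        # connector nodes only
    intermediate_name: Optional[str] = None  # subkey or intermediate rank
    parents: List[int] = field(default_factory=list)
    children: List[int] = field(default_factory=list)


# A family root: level 1, no parents, children filled in as the key is loaded.
root = Node(NodeType.ROOT, name="Caryophyllaceae", level=1)
print(root.node_type.value, root.level, root.children)
```

Keeping every field optional mirrors the summary table, where each node type fills in only some of the columns.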

Tag List Creation - The following are the general steps for creating a tag list and the tagListIndex that is used in a class entry:

  1. Create a new Tag object.
  2. Add that object to the tags array list, keeping track of the tag index.
  3. Create a new TagList object, using the tag index, and keep track of the tag list index.
  4. Get a reference to that TagList object.
  5. For each additional tag needed, create a new Tag object.
  6. Add that object to the tags array list, keeping track of the tag index.
  7. Add that tag index to the TagList object.
  8. Insert the tag list index obtained in step 3 in the class entry.
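The eight steps above can be sketched as follows. The names Tag, TagList and ClassEntry come from this post, but their fields, the two module-level lists and the helper functions are assumptions made for illustration; the real classes may look quite different.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Tag:
    value: str
    key: Optional[str] = None  # set when the tag is a key-value pair


@dataclass
class TagList:
    tag_indexes: List[int] = field(default_factory=list)


@dataclass
class ClassEntry:
    tag_list_index: int = -1  # -1 means "no tag list yet"


tags: List[Tag] = []           # the "tags array list"
tag_lists: List[TagList] = []  # all tag lists, addressed by index


def add_tag(tag: Tag) -> int:
    """Steps 2 and 6: store the tag and return its index."""
    tags.append(tag)
    return len(tags) - 1


def create_tag_list(entry: ClassEntry, new_tags: List[Tag]) -> int:
    if not new_tags:
        raise ValueError("a tag list needs at least one tag")
    first_index = add_tag(new_tags[0])          # steps 1-2
    tag_lists.append(TagList([first_index]))    # step 3
    tag_list_index = len(tag_lists) - 1
    tag_list = tag_lists[tag_list_index]        # step 4
    for tag in new_tags[1:]:                    # steps 5-7
        tag_list.tag_indexes.append(add_tag(tag))
    entry.tag_list_index = tag_list_index       # step 8
    return tag_list_index


entry = ClassEntry()
create_tag_list(entry, [Tag("Caryophyllaceae", key="name"), Tag("1", key="level")])
print(entry.tag_list_index, tag_lists[0].tag_indexes)
```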


A Key as a Hierarchy

Tuesday, September 17, 2013

This section shows that the structure of the FNA keys is a hierarchy, at least conceptually; the structure of the database used to store those keys is explained later in Node Structure and Types and Node Groups & Relationships.

Targets with a Single Parent - A couplet is a parent-child relationship with two children:

As mentioned in Couplets, Targets and Rows, each attribute choice leads to either another couplet or to a target. 

If all targets in a key have a single parent, then there is a unique path from the root of the key (a family, genus, species or subspecies) to each target, which is usually one level down from the root.  That is, from each target one can unambiguously traverse back up to the root of the hierarchy.

Targets with Multiple Parents - Unfortunately, as seen in the Attribute Sets section, some targets in FNA keys can have more than one attribute set, which results in multiple parents for those targets.

For example, the paths to the target Boechera repanda shown in Couplets, Targets and Rows can be redrawn as:

That is, depending on which path is chosen, a different attribute set results.  When the target is terminal, as is the case for B. repanda, visualizing the multiple paths to the root is relatively easy.

Effect of Single and Multiple Parents on Next-Level Keys - How keys are connected to the key at the next level is different for targets with single and multiple parents; that is, with targets with single and multiple attribute sets. What follows is an example of each case.

Silene is a target in the Caryophyllaceae key with a single attribute set:
The Silene key has the following connections to the Caryophyllaceae key:
The row numbers in this diagram are relative.  Renumbering of rows in subkeys is discussed in the next section, Key Types and Subkeys. Row numbers in subkey diagrams in this section are shown before renumbering.

What is important is that the parent for the initial couplet in the Silene key is in the Caryophyllaceae key, so the logic to set up this parent-child relationship is different from what it would be if they were in the same key. Row 37 is the Silene target, but it also has a next couplet that is the first couplet in the Silene subkey. For the single parent case there is no zero row representing the root for the Silene key; that is, the row zero shown in the key diagram in Couplets, Targets and Rows is not needed in this case.

S. scouleri is a target in the Silene key with two attribute sets:
By creating a special node that acts as a target for rows 118 and 143 in the Silene key and as a couplet for rows 1 and 2 in the S. scouleri key, all other elements in the Silene and S. scouleri keys can be treated as normal parent-child relationships:
The Node Structure and Types post discusses how key hierarchies, including these special multiple-attribute-set targets, are represented as data structures.
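To illustrate why a special node with multiple parents needs extra information when traversing back to the root, here is a small sketch. Rows 118 and 143 are the two rows mentioned above; every other node label and every link above those two rows is hypothetical, chosen only to make the example runnable.

```python
from typing import Dict, List

# parents[node] lists the parent(s) of each node. Only the special
# (connector) node has more than one parent. The links above rows 118
# and 143 are hypothetical.
parents: Dict[str, List[str]] = {
    "connector": ["row 118", "row 143"],
    "row 118": ["couplet A"],
    "row 143": ["couplet B"],
    "couplet A": ["root"],
    "couplet B": ["couplet A"],
    "root": [],
}


def path_to_root(node: str, attribute_set: int) -> List[str]:
    """Walk up to the root; attribute_set (1-based) resolves ambiguous steps."""
    path = [node]
    while parents[node]:
        options = parents[node]
        node = options[attribute_set - 1] if len(options) > 1 else options[0]
        path.append(node)
    return path


print(path_to_root("connector", 1))
print(path_to_root("connector", 2))
```

Every node other than the special node has exactly one parent, so the selected attribute set is the only extra piece of information the traversal needs.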

Key Types and Subkeys

Thursday, September 5, 2013
The number of keys in FNA is greater than the 1853 keyed taxa mentioned in Taxon List & Counts by Type.  One reason is five taxa have alternate keys:
  • Asteraceae:   Synoptic and Artificial Keys
  • Brassicaceae:  Flower-Based and Fruit-Based Keys
  • Lauraceae:  Flower-Based and Fruit-Based Keys
  • Portulacaceae Portulaca:  Flower-Based and Fruit-Based Keys
  • Salicaceae Populus:  Flower-Based, Fruit-Based and Leaf-Based Keys
Adding these, there are 1859 keys.  The other reason is 43 of these keys have subkeys.  Here is the count of FNA keys by type:

Keys with no subkeys       1816
Keys with subkeys            43
Subkeys                     347
Keys that are not used        5
Total keys                 2211

My list of taxa with multiple keys, with their alternate keys, master key and subkeys, is here.

From this list, you see two types of keys with subkeys:
  • Targets in these keys have intermediate levels, each with its own subkey.
  • The keys are subdivided into subkeys, each of which is given a number or letter; although there is no scientific name, each of these subkeys corresponds to an intermediate level.
Some taxon pages for these keys have a List of Keys, which contains the master key and subkeys. More commonly, keys for intermediate levels are included in the list of lower taxa. This method of listing allows for intermediate levels with only one child, so these are not subkeys.

Three of the keys that are not used are on the pages' List of Keys, but that list points to pages that have blank keys.  The other two are keys to intermediate levels that are not used in the keys for their respective taxa.

The Couplets, Targets and Rows post discusses use of couplet and row numbers in keys.  The numbering of couplets and rows in subkeys also starts with 1.  But to use a taxon key that has subkeys, the master key and all subkeys must be concatenated into one key.  The couplets and rows in the first subkey must be renumbered to continue where numbering left off in the master key, and likewise as each subkey is added.  The row numbers shown in the Boechera repanda paths diagram in Couplets, Targets and Rows are those after this renumbering.
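The renumbering can be sketched as a running offset. This is a minimal illustration, assuming that after concatenation the rows still follow the rule from Couplets, Targets and Rows (rows 2n - 1 and 2n for couplet n); the function names and the couplet counts in the example are hypothetical.

```python
from typing import List, Tuple


def couplet_offsets(couplet_counts: List[int]) -> List[int]:
    """Offset to add to each key's couplet numbers.

    couplet_counts[0] is the master key, followed by the subkeys in the
    order in which they are concatenated.
    """
    offsets, total = [], 0
    for count in couplet_counts:
        offsets.append(total)
        total += count
    return offsets


def renumber(couplet: int, choice: int, offset: int) -> Tuple[int, int]:
    """Return (new couplet number, new row number) for choice 1 or 2."""
    if choice not in (1, 2):
        raise ValueError("choice must be 1 or 2")
    new_couplet = couplet + offset
    new_row = 2 * new_couplet - (1 if choice == 1 else 0)
    return new_couplet, new_row


# Hypothetical key: master key with 5 couplets, subkeys with 10 and 7 couplets.
offsets = couplet_offsets([5, 10, 7])
print(offsets)
print(renumber(1, 1, offsets[1]))  # first row of the first subkey
```

In this example the master key uses rows 1 through 10, so the first subkey starts at couplet 6 and row 11.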

Taxon Locations

Friday, August 30, 2013
I've posted two spreadsheets that show state or province postal codes for all FNA terminal taxa.  Due to a Google Drive spreadsheet size limitation, these data were split:  Alberta through North Dakota and Nebraska through Yukon Territory.  Taxon counts by location are in the last row.

Terminal taxa  in my taxon list (see Taxon List & Counts by Type) have locations shown on their FNA taxon page.  The 11,200 terminal taxa that have their own taxon page have the following location information:

Location directly shown on taxon page 11,148
Location indirectly available 50
Location information not available 2

Indirect location information is most commonly found from a distribution map.  One of the taxa (Iris germanica) with no location information says it "may persist after cultivation"; the other (Corispermum nitidum) says it was "supposedly introduced but doubtful".  Location information for the few taxa without their own page has not been added yet.

The 65 North American locations used in the FNA are the 49 continental states and the District of Columbia in the USA, the 13 provinces and territories of Canada, Greenland, which is an autonomous country within the Kingdom of Denmark, and the islands of St. Pierre and Miquelon, which are a self-governing territory of France.

In the future, if a user of the FNA keys selects the location(s) that they are interested in, then the keys can be customized.  Then the keys would only show those couplets and terminal taxa relevant to their location(s), which greatly simplifies keying a specimen. Also the keys can be simplified because there are location-specific couplets in the keys; these couplets may eliminate all except one attribute set leading to the terminal taxa.

Couplets, Targets and Rows

Friday, August 23, 2013
In each FNA key for a given taxon, couplets are numbered starting with 1.  A couplet gives a choice of two alternate attributes:  the first choice is labeled with the couplet number and the second is labeled "+".  Following each choice, a couplet number indicates where to go next in the key.  However, instead of a next couplet, one or both choices may lead to a destination target.

A diagram showing the hierarchy of couplet numbers is a simple way to show the overall structure of a key.  For example, the following diagram shows the structure of the Caryophyllaceae Drymaria key using couplet numbers and destination targets:
Each choice in a couplet has one or more attributes that are added to the target's attribute set.

When creating a target's attribute set, it is easier to use the row numbers as an index into the key, instead of using couplet numbers. The keys are designed such that the row number can be obtained from the couplet number:
Couplet number    Row number for current couplet
                  Using first choice    Using second choice
n                 2n - 1                2n
In other words, the row number for the second choice is double the couplet number, and the row number for the first choice is one less than that.  See the last diagram in this section as an example of how row numbers (in circles) are related to couplet numbers; note that in that diagram, the row number of the first choice in a couplet is put to the left of the couplet, and the second choice is put to the right.
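The table above translates directly into code, and the mapping can also be inverted to recover the couplet and choice from a row number. The function names are my own; only the 2n - 1 and 2n rule comes from the keys.

```python
from typing import Tuple


def row_number(couplet: int, choice: int) -> int:
    """Row number for choice 1 or 2 of couplet n: 2n - 1 or 2n."""
    if couplet < 1:
        raise ValueError("couplet numbers start at 1")
    if choice not in (1, 2):
        raise ValueError("choice must be 1 or 2")
    return 2 * couplet - 1 if choice == 1 else 2 * couplet


def couplet_and_choice(row: int) -> Tuple[int, int]:
    """Inverse mapping: odd rows are first choices, even rows are second choices."""
    if row < 1:
        raise ValueError("row numbers start at 1")
    return (row + 1) // 2, 1 if row % 2 == 1 else 2


print(row_number(57, 1), row_number(57, 2))
print(couplet_and_choice(118), couplet_and_choice(143))
```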

This is a diagram of the Caryophyllaceae Drymaria key using row numbers:
A row number 0 (representing the root of the Drymaria key) has been added as a starting point.  From this diagram, you can create the path to any row in the key or to any target; in particular, you can create the attribute set for any target.  Also it is useful to create a mapping of any row to its parent row.
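The row-to-parent mapping and the path back to row 0 can be sketched as follows. The three-couplet key, its attribute strings and its taxon names are hypothetical placeholders, not data from the Drymaria key, and the sketch assumes every row has a single parent (no couplet is reached by more than one path).

```python
from typing import Dict, List

# Hypothetical key indexed by row number. Each row has its attributes and
# either the next couplet number or a destination target.
key: Dict[int, dict] = {
    1: {"attributes": ["attr 1a"], "next": 2},           # couplet 1, first choice
    2: {"attributes": ["attr 1b"], "next": 3},           # couplet 1, second choice
    3: {"attributes": ["attr 2a"], "target": "Taxon A"},
    4: {"attributes": ["attr 2b"], "target": "Taxon B"},
    5: {"attributes": ["attr 3a"], "target": "Taxon C"},
    6: {"attributes": ["attr 3b"], "target": "Taxon D"},
}


def build_parent_rows(key: Dict[int, dict]) -> Dict[int, int]:
    """Map each row to its parent row; rows of couplet 1 hang off row 0."""
    parent = {1: 0, 2: 0}
    for row, entry in key.items():
        nxt = entry.get("next")
        if nxt is not None:
            parent[2 * nxt - 1] = row  # first choice of the next couplet
            parent[2 * nxt] = row      # second choice of the next couplet
    return parent


def attribute_set(key: Dict[int, dict], parent: Dict[int, int], row: int) -> List[str]:
    """Collect attributes from the target row up to row 0, most specific first."""
    attrs: List[str] = []
    while row != 0:
        attrs.extend(key[row]["attributes"])
        row = parent[row]
    return attrs


parent = build_parent_rows(key)
print(parent)
print(attribute_set(key, parent, 6))
```

Walking upward from the target lists the last-chosen attributes first, which is the reversed order used in the Attribute Sets post.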

Apart from these uses, a row diagram of a key has disadvantages: it is harder to create and understand, so a couplet diagram is used most often.  A diagram that has both may be a better choice when row numbers are needed.  For example, the following diagram for the Brassicaceae Boechera key can be used to create the paths to Boechera repanda, Attribute Sets 1 and 2:

With the row numbers, you can create the lists of the common attributes and specific attributes for sets 1 and 2, which are shown in the Attribute Sets post.  How the subkeys in the Boechera key are combined is discussed at the end of the Key Types and Subkeys post; in particular, the Group 1 subkey is used in this diagram.

Attribute Sets

Tuesday, August 20, 2013
As one makes binary choices of attributes (also called characteristics) in a key, an attribute set for the destination target is created.  Actually, it is more useful to reverse the set order, so that the attributes chosen last are listed first, because those are most specific to the target.

One output of my analysis will be spreadsheets containing attribute sets for each target in all FNA keys. For a start, I've posted spreadsheets for family Caryophyllaceae, for genus Drymaria, for genus Silene and for Brassicaceae Boechera. Examples of attribute sets from these spreadsheets are below.

An example of a simple case is the attribute set for Boechera burkii:
  • Racemes usually unbranched; cauline leaves 18-28; ovules 64-80 per ovary; seeds 1.2-1.4 mm wide
  • Biennials, without caudices; stems (2-)3-10 dm; cauline leaves 18-80
  • Basal leaf blade surfaces glabrous or with simple trichomes only
  • Cauline leaf blades not auriculate
A diverse target whose name is appended with "(in part)" in the key has more than one attribute set. In column 2 of the four spreadsheets above, I name these "Attribute Set n" where n=1,2,…; that is, there are multiple attribute sets. Using Boechera repanda as an example, Attribute Set 1 is:
  • Basal leaf blades 10-25(-50) mm wide; petals 3.5-6 mm.
  • Fruit valves pubescent
And Attribute Set 2 is:
  • Basal leaf blades 7-25(-50) mm wide, margins usually repand to dentate, rarely entire
  • Fruits 2-5.5 mm wide, divaricate-ascending to erect, ± appressed to rachises
  • Stems proximally with simple and/or branched trichomes usually less than 0.5 mm; ovules 8-52 per ovary
  • Fruits not secund
  • Fruits erect, ascending, or horizontal
  • Fruit valves glabrous
If a target has multiple attribute sets, those sets may share the same general attributes and differ only in the more specific attributes. Therefore, I list these general Common Attributes for Boechera repanda separately in the spreadsheet:
  • Plants usually sparsely to densely pubescent proximally (sometimes throughout)
  • Styles 0.05-2 mm
  • Basal leaf blade surfaces with at least some branched trichomes
  • Cauline leaf blades not auriculate
Toward the bottom of the Couplets, Targets and Rows post, a diagram shows how row numbers and a path through the Boechera repanda key are used to create these attribute set lists.
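Separating common from specific attributes can be sketched as finding the shared tail of the full attribute lists. The sketch assumes that each full list is ordered most specific first and consists of the set's specific attributes followed by the common attributes; the function name is my own, and the data are the Boechera repanda lists shown above.

```python
from typing import List, Tuple


def split_common(sets: List[List[str]]) -> Tuple[List[str], List[List[str]]]:
    """Split full attribute lists into the shared tail and the specific heads."""
    shortest = min(len(s) for s in sets)
    n = 0  # number of trailing attributes shared by every set
    while n < shortest and all(s[-1 - n] == sets[0][-1 - n] for s in sets):
        n += 1
    common = sets[0][len(sets[0]) - n:]
    specific = [s[:len(s) - n] for s in sets]
    return common, specific


COMMON = [
    "Plants usually sparsely to densely pubescent proximally (sometimes throughout)",
    "Styles 0.05-2 mm",
    "Basal leaf blade surfaces with at least some branched trichomes",
    "Cauline leaf blades not auriculate",
]
SET_1 = [
    "Basal leaf blades 10-25(-50) mm wide; petals 3.5-6 mm.",
    "Fruit valves pubescent",
] + COMMON
SET_2 = [
    "Basal leaf blades 7-25(-50) mm wide, margins usually repand to dentate, rarely entire",
    "Fruits 2-5.5 mm wide, divaricate-ascending to erect, ± appressed to rachises",
    "Stems proximally with simple and/or branched trichomes usually less than 0.5 mm; ovules 8-52 per ovary",
    "Fruits not secund",
    "Fruits erect, ascending, or horizontal",
    "Fruit valves glabrous",
] + COMMON

common, specific = split_common([SET_1, SET_2])
print(len(common), [len(s) for s in specific])
```

The two paths first differ at "Fruit valves pubescent" versus "Fruit valves glabrous", so everything after that point in each list is common.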

In Taxon List & Counts by Type, the usual taxon levels (family, genus, species, subspecies and varieties) were mentioned.  For a given taxon, there will be an attribute set for each target level: genus, species and possibly subspecies and/or varieties.  At each of these levels, there could be multiple attribute sets.

As mentioned in Key Types and Subkeys, some taxa have keys with subkeys; each of these subkeys, which correspond to targets at intermediate levels, adds an attribute set or multiple attribute sets.  As shown in Multiple-Attribute-Set Groups, a couplet can have multiple attribute sets even if there is no subkey, although this is rare.

Overview

Monday, August 12, 2013
The Flora of North America North of Mexico is being created by over 800 authors in an online version and in 30 printed volumes; this is usually called just Flora of North America (FNA).  The FNA contains dichotomous keys, which allow biologists and serious amateurs to identify nearly all known plants in North America.

Using the online keys, my objective is to analyze the structure of those keys, looking for commonality. Over time, I will add details of this analysis.

Data created by my analysis is too large to be contained in this blog, so this blog has links to Google Docs spreadsheets.  A Google account or a gmail account is required to access these spreadsheets; if you don't have an account, you can go here to create one.


Contents

      Overview
      Taxon List and Counts by Type
      Taxon Locations
      Attribute Sets
      Couplets, Targets and Rows
      A Key as a Hierarchy
      Key Types and Subkeys
      Multiple-Attribute-Set Groups
      Node Structure and Types
      Node Groups and Relationships

Taxon List & Counts by Type

Sunday, August 11, 2013
My list of all taxa created by analyzing the online version of the FNA is here.  The types shown in the third and fourth columns are explained below.

Given that the plant family has already been identified, the FNA keys allow identification at more and more specific taxon levels, ultimately leading to a terminal taxon.  The numbers of taxa at the different levels are as follows:

Level                 Type within Level    Number of Taxa
family                family                          182
genus                 genus                          1505
genus subclasses      species                        9562
                      hybrid                           42
species subclasses    subsp.                          841
                      var.                           1845
                      hybrid                            1
subsp. subclasses     var.                             13
Total in FNA keys                                  13,991

Non-Terminal Taxa - Taxa that are non-terminal (families, genera and some species) are of two types (see the third column of my taxon list):

keyed               A key is used to differentiate subclasses    1853
singleton parent    There is only one subclass (no key)           923

Keyed taxa will be discussed further in Key Types and Subkeys.

Terminal Taxa - Terminal taxa are also indicated in the third column of my taxon list.  In a few cases, destination targets in the FNA keys are not listed in the site's taxa lists.  These targets are distinguished from other terminal targets:

terminating                A target that has a unique taxon_id                                   11,200
terminatingOnAssocTaxon    A target that is only described on the page of an associated taxon        10
terminatingOnBadTaxon      A target with no taxon page or with a bad taxon_id                         5

Not Reachable Taxa - Independent of the above types, some taxa may not be reachable because some key is missing or incomplete (see the fourth column of my taxon list):

Reachable        Keys or singletons can be used to determine taxon    13,790
Not reachable    Taxon not reachable given the family                    201

Only two taxa that are not reachable are not targets in any key, so this is mainly caused by a missing or incomplete intermediate key.

Types of Reachable Terminal Taxa - Taking into account reachability, the FNA keys can be used to differentiate the following types of destination targets:

hybrid                               43
species                            8338
subsp.                              797
var.                               1852
Total Reachable Terminal Taxa    11,030