Flora of North America Key Structure: September 2013

Node Structure and Types

Thursday, September 19, 2013

Nodes and Tags - The data in the FNA keys can be modeled using nodes, each of which can have

Parents and children
Tags that store values associated with the nodes

Node Structure - The following shows the structure of the data within each node:

The last three items in a node (the NodeEntry in the nodeList) are pointers to lists that contain a subset of other nodes. The parentList allows tracing each path back to the root node for a family. The childList allows tracing each path through each couplet in the key to all targets in the key; see A Key as a Hierarchy. The targetList also allows tracing each path to all nodes, but not through couplets; it also reaches nodes in the rare cases where they are used only for naming (see Naming Hierarchy) or are not in any key but are on the taxa list for a parent taxon.
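As a rough illustration of how the three lists could be used, here is a minimal Python sketch. The `Node` class, its field names, and the toy family are assumptions made for illustration only; they are not the actual NodeEntry layout, which stores more data than is shown here.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Hypothetical stand-in for a NodeEntry; only the three lists are modeled."""
    name: str
    parent_list: List[int] = field(default_factory=list)  # indexes of parent nodes
    child_list: List[int] = field(default_factory=list)   # indexes of the couplet choices
    target_list: List[int] = field(default_factory=list)  # indexes of targets, bypassing couplets


def path_to_root(node_list: List[Node], index: int) -> List[str]:
    """Follow parent_list upward, assuming every node has at most one parent."""
    path = [node_list[index].name]
    while node_list[index].parent_list:
        index = node_list[index].parent_list[0]
        path.append(node_list[index].name)
    return path


# Toy family (invented names): the root's two choices are a couplet and Genus A;
# the couplet's two choices are Genus B and Genus C.
node_list = [
    Node("Family", child_list=[1, 2], target_list=[2, 3, 4]),
    Node("couplet", parent_list=[0], child_list=[3, 4]),
    Node("Genus A", parent_list=[0]),
    Node("Genus B", parent_list=[1]),
    Node("Genus C", parent_list=[1]),
]

print(path_to_root(node_list, 4))  # path from Genus C back up to the family root
```

In this sketch the root's target_list reaches all three genera directly, while its child_list reaches Genus B and Genus C only by way of the couplet.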

[Add new node types: Naming Only - Used to provide names when the keying hierarchy does not follow the naming hierarchy. In most cases, taxa that are naming only are singleton parents, but they can be a key base node, such as Askellia in the Flora of the Pacific Northwest (see Subkeys and Concatenated Keys).]

[need consistent usage of "node type" and "taxon type". Taxon Type is called Taxon Id Type in the table below, and is found by findNodeType.sh (node type was my old name for taxon type). The list of Node Types is below.]

[Modify to add Target Subset Index (pointer to a class subset for all keys that has the key's target list; these are only used in the key's base node or, if there are alternate keys, in the alternate keys' base node, which are shown in the Node Groups & Relationships figures), and a tag for taxa that are base classes for keys showing special types like targetsInKey (see 5/27/2015).] [Target Subset Index (tsi) creates a "naming hierarchy", which is usually the same as the "keying hierarchy".]

Node Types - The node type is specified in the tagListType field of each ClassEntry. The node type descriptions below need to be read in conjunction with understanding how nodes are related, which is described in Node Groups & Relationships and in the Dual Role of Nodes section below.

Root - This is the base for each family, which contains the family name. Most are base nodes for each family key, but a few families have only one species, so no key is involved and the root is a singleton parent. Besides the family name, the node contains the unique taxon id assigned by FNA. FNA also numbers each family (currently from 1 to 128), so this number is also contained in the root node (as the Target # in the table below). Each family root node is level 1 in the taxon hierarchy. If a root node is the base node for a family key, its children are rows 1 and 2 in that key, so the node has the dual role of being the first couplet in the key. This is not the case if there are alternate keys for the family; instead the root node has the base node for each of these alternate keys as its children, and it has the dual role of offering the user a choice among the names of the alternate keys.
Couplet - Couplet nodes document the decision points in the keys (see A Key as a Hierarchy). The node has pointers to each of its two choices, but the attributes of those choices are not contained in this node. Instead, this node contains the attribute(s) for the choice that led to this couplet. For each couplet choice, an attribute shows the locations (see Taxon Locations) in which taxa with paths through this choice are found; this also applies to terminal taxa. The level of this node is the same as that of the base node for this key. As mentioned in Key Types and Subkeys, some keys have subkeys, each of which is given a name with a number or letter or with an intermediate taxon rank; for subkey couplet nodes, this name is stored with the couplet node (other types of couplet nodes do not have a name). An intermediate rank can also be associated with a choice that points to another couplet rather than to a subkey. Besides pointing to another couplet node, a couplet node can point to two types of target nodes, which are described next.
Target - This is the destination taxon, which was arrived at by a unique sequence of choices in the key; that is, there is a single attribute set in the key that has this taxon as its target. This taxon will be at the next level compared to the level of the base node for this key. In this node is the final attribute choice that led to this target. Also in this node is the taxon name and number relative to the taxon in the key's base node; and there is a unique taxon id assigned by FNA. If a target node is the base node for a key, the node's children are row 1 and 2 in that key, so the node has the dual role of being the first couplet in the key. This is not the case if there are alternate keys at this level; instead the target node has the base node for each of these alternate keys as its children, and it has the dual role of offering a choice of the names of the alternate keys to the user. There is a rather special case where the target is an intermediate taxon that only has one child, so the intermediate name has to be stored with this type of target node as well as the taxon name of the child (this should not be called a subkey singleton because it is not a key and is not a singleton node, as used below). [To Do: Either (1) target nodes are one level above base & intermediate nodes only associated with couplets or (2) add additional nodes that show all immediate children of intermediate nodes, so can show hierarchy.]
Attribute Set - This node corresponds to one attribute set for a destination taxon that has multiple attribute sets (an attribute set is a sequence through the key leading to a target taxon; multiple-attribute-set targets were described in A Key as a Hierarchy). This node is similar to a target node, but since there are multiple paths to this destination taxon, there is also a special connector node that each of these attribute set nodes points to. An attribute set number is appended to the taxon name to make the node name unique. There is no relative target number or taxon id, since these are in the connector node. Note that there can also be multiple attribute sets leading to subkeys; in this case the name is an intermediate taxon name instead of a taxon name.
Multiple-Attribute-Set Connector - This node is the destination taxon node pointed to by each of the multiple-attribute-set nodes; this connector node points back to each of those nodes (so following a path to the base node requires knowing which of the attribute sets was selected). For an example of a connector node see Effect of Single and Multiple Parents on Next-Level Keys in A Key as a Hierarchy. Like a target node, this node has the taxon name, relative target number and taxon id. In addition, this node has the merge point for all of the attribute set paths; that is, all attribute sets have common attributes between the merge point node and the base node for the key; in some cases, the merge point and the base node are the same (there are no common attributes). If the attribute set nodes are for an intermediate taxon, this connector node will be for that intermediate taxon also. [Probably want to split this into Multiple-Attribute-Set Target Connector and Multiple-Attribute-Set Group Connector (or External and Internal Multiple-Attribute-Set Connectors) because in the table below Multiple-Attribute-Set Group Connector does not need Taxon Id and Target #.]
Alternate-Key Connector - This node is needed whenever there are alternate keys (in addition to the alternate key node, which is discussed next). In this case, a connector node is needed for each target in the alternate keys so that, as couplet choices traverse down through a target, the alternate key that was used can be saved; when traversing back up through that target, the same alternate key is then used to create the attribute list for the terminal target. If there are multiple-attribute-set nodes in an alternate key, a multiple-attribute-set connector node is needed for each set, and each of those connector nodes is connected by an alternate-key connector node.
Alternate Key - An alternate key node connects a root or target node to all available alternate keys for the family or target. Currently only eleven alternate key nodes are needed, which are listed in Key Types and Subkeys. Each alternate key node is the base node for the corresponding key. Each is named so that the parent can offer the user a choice of which alternate key to use.
Singleton - This node represents a singleton node, which results when a parent node only has one child that is at the next level, so no key is involved. A root node, a target node or a connector node can be a singleton parent node. A singleton node contains the taxon name, relative target number and taxon id.
Missing-Key Target - If a taxon has children, but the key to distinguish the children from each other is missing, the children can still be listed under that taxon. For children that are not terminal, lower keys may exist, so a hierarchy can be shown with a gap for the missing key. A missing-key target is also used when a taxon is keyed at a level that is not normal for that taxon; e.g., there is no need for a subsp. key for Piperia elegans because all these subspecies are in the key for species at the Piperia level, but a missing-key target node is needed for Piperia elegans to show the hierarchy of names from Piperia to elegans and then to its two subspecies (see Naming Hierarchy) [so a better name for this node type may be "Target Not-In-A-Key"].
Segregate - A target in a key that is where a taxon used to exist before FNA moved it to a new location in the hierarchy. A link gives that new location, so that keying of a specimen can be resumed from there.

[In the table below "Multiple-Path" is used instead of "Multiple-Attribute-Set" - which is better?]
The following summarizes the data contained in each type of node:

	Node Type	Name	Rank	Locations	Row Label	Characters	Taxon Id	Taxon Id Type	Target #	Merge Point	Intermediate Name
Root	✔	✔	✔				✔		✔
Couplet	✔	✔	✔	✔	✔	✔					✔
Target	✔	✔	✔	✔	✔	✔	✔	✔	✔		✔
Multiple-Path Target	✔	✔	✔	✔	✔	✔
Multiple-Path-Target Connector	✔	✔	✔				✔	✔	✔	✔
Multiple-Path-Group Connector	✔	✔	✔							✔	✔
Alternate-Key Connector	✔	✔	✔				✔		✔
Alternate Key	✔	✔	✔
Singleton	✔	✔	✔				✔		✔
Missing-Key Target	✔	✔	✔				✔		✔
Segregate	✔	✔	✔
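For readers who want to query the table, it can be transcribed into a lookup structure. The Python sketch below is one possible transcription; the names `COMMON`, `KEY_ROW`, `NODE_FIELDS` and `types_with` are invented here, and the column assignments assume that the check marks line up with the headers as tab-separated cells.

```python
# Columns that every node type has, and the columns tied to a row in a key.
COMMON = {"Node Type", "Name", "Rank"}
KEY_ROW = {"Locations", "Row Label", "Characters"}

# Transcription of the table above: node type -> set of data fields it contains.
NODE_FIELDS = {
    "Root": COMMON | {"Taxon Id", "Target #"},
    "Couplet": COMMON | KEY_ROW | {"Intermediate Name"},
    "Target": COMMON | KEY_ROW
    | {"Taxon Id", "Taxon Id Type", "Target #", "Intermediate Name"},
    "Multiple-Path Target": COMMON | KEY_ROW,
    "Multiple-Path-Target Connector": COMMON
    | {"Taxon Id", "Taxon Id Type", "Target #", "Merge Point"},
    "Multiple-Path-Group Connector": COMMON | {"Merge Point", "Intermediate Name"},
    "Alternate-Key Connector": COMMON | {"Taxon Id", "Target #"},
    "Alternate Key": COMMON,
    "Singleton": COMMON | {"Taxon Id", "Target #"},
    "Missing-Key Target": COMMON | {"Taxon Id", "Target #"},
    "Segregate": COMMON,
}


def types_with(field_name: str) -> list:
    """Return the node types (in table order) that contain the given field."""
    return [node_type for node_type, fields in NODE_FIELDS.items()
            if field_name in fields]


print(types_with("Merge Point"))
```

A lookup like this makes it easy to check, for example, that only the two multiple-path connector types carry a merge point.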

See Node Groups & Relationships for a description of how nodes are combined to model the FNA keys.

[Add 11/4/2017 diagrams of the major node types: couplet, target, AS, connector.]

Dual Role of Nodes - The node type reflects the role of a node relative to its parent or to other nodes in a key. Root, target, connector and singleton nodes also have another role relative to nodes at the next level; this role is one of the following:

Couplet
Singleton Parent
Key Base
Offer alternate key choices
Terminal

These are not separate nodes, but nodes at the previous level acting in their other role. In A Key as a Hierarchy, an example shows a row in a key can have dual roles in that key and the key at the next higher level. In Node Groups & Relationships the more general term Base Node is used for Singleton Parent or Key Base.

Tag List Creation - The following are general steps in creation of a tag list and tagListIndex that is used in a class entry:

1. Create a new Tag object.
2. Add that object to the tags array list, keeping track of the tag index.
3. Create a new TagList object using the tag index, keeping track of the tag list index.
4. Get a reference to that TagList object.
5. For each additional tag needed, create a new Tag object.
6. Add that object to the tags array list, keeping track of the tag index.
7. Add that tag index to the TagList object.
8. Insert the tag list index obtained in step 3 in the class entry.
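The steps above can be sketched in Python as follows. This is only an illustration under stated assumptions: the `Tag`, `TagList` and `ClassEntry` classes, their fields, the module-level `tags` and `tag_lists` lists, and the example tag names and values are all hypothetical stand-ins, not the real definitions.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Tag:
    name: str   # hypothetical fields; the real Tag contents are not shown here
    value: str


@dataclass
class TagList:
    tag_indexes: List[int] = field(default_factory=list)


@dataclass
class ClassEntry:
    tag_list_index: int = -1  # -1 means "no tag list yet" in this sketch


tags: List[Tag] = []           # the shared tags array list
tag_lists: List[TagList] = []  # the shared list of TagList objects


def add_tag(tag: Tag) -> int:
    """Append a tag to the tags list and return its index."""
    tags.append(tag)
    return len(tags) - 1


def create_tag_list(entry: ClassEntry, new_tags: List[Tag]) -> int:
    """Build one tag list from new_tags (at least one tag) and attach it to entry."""
    first, *rest = new_tags
    tag_index = add_tag(first)                  # create the first tag and record its index
    tag_lists.append(TagList([tag_index]))      # create the TagList from that tag index
    tag_list_index = len(tag_lists) - 1         # keep track of the tag list index
    tag_list = tag_lists[tag_list_index]        # get a reference to the TagList
    for tag in rest:                            # each additional tag
        tag_list.tag_indexes.append(add_tag(tag))
    entry.tag_list_index = tag_list_index       # store the tag list index in the class entry
    return tag_list_index


entry = ClassEntry()
create_tag_list(entry, [Tag("name", "Silene"), Tag("rank", "genus")])
print(entry.tag_list_index, tag_lists[entry.tag_list_index].tag_indexes)
```

Storing indexes rather than object references mirrors the index bookkeeping that the steps describe; whether the real code does the same is not confirmed here.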

Key Structure

Tuesday, September 17, 2013

This section shows conceptually the structure of the FNA keys; the structure of the database used to store those keys is explained later in Node Types & Relationships. In particular, this section shows how all the keys at the different ranks fit together to create one key for the whole family.

Targets with a Single Parent - A couplet is a parent-child relationship with two children:

As mentioned in Couplets, Targets and Rows, each couplet choice leads to either another couplet or to a target.

If all targets in a key have a single parent, then there is a unique path from the root of the key (a family, genus, species or subspecies) to each target, which is usually one rank down from the root. That is, from each target one can unambiguously traverse back up to the root of the hierarchy.

Targets with Multiple Parents - However, as seen in the Characters Sets and Paths section, some targets in keys can have more than one characters set, which results in multiple parents for those targets. The diagram in that section is part of the Boechera key, which can be redrawn to show B. repanda with two parent rows (35 and 73):

That is, depending on which parent is chosen, a different path and characters set results.
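To make the difference between single and multiple parents concrete, the following Python sketch enumerates every path from a target back to the root. The `paths_to_root` function and the `parents` mapping are invented for illustration. Rows 35 and 73 are the two parent rows of B. repanda mentioned above, but the rows between them and the root are omitted, so linking both rows directly to "Boechera" is a deliberate simplification and not the real key.

```python
from typing import Dict, List


def paths_to_root(parents: Dict[str, List[str]], node: str) -> List[List[str]]:
    """Return every path from node up to a node that has no parents."""
    if not parents.get(node):
        return [[node]]
    return [[node] + path
            for parent in parents[node]
            for path in paths_to_root(parents, parent)]


# Simplified toy mapping: child -> list of parents (intermediate rows omitted).
parents = {
    "B. repanda": ["row 35", "row 73"],
    "row 35": ["Boechera"],
    "row 73": ["Boechera"],
}

for path in paths_to_root(parents, "B. repanda"):
    print(" -> ".join(path))
```

With a single parent per node the function returns exactly one path; here it returns two, one for each characters set.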

Connection of a Key to a Key at the Next Higher Rank - How keys are connected to the key at the next rank is different for targets with single and multiple parents. An example of each case is shown below.

Silene is a genus target in the Caryophyllaceae key with a single characters set:

The Silene key has the following connections to the Caryophyllaceae key:

The row numbers in this diagram are relative to the key for the rank; in the Key Types and Subkeys section, renumbering of rows in concatenated keys is discussed, but row numbers in the above diagram are before that renumbering. Comparing the Silene key to the Drymaria key using row numbers in the Couplets, Targets and Rows section, row 37 is the root of the Silene key, the equivalent of what is called row 0 in the Drymaria key.

What is important is that the parent for the initial couplet in the Silene key is in the Caryophyllaceae key, so the logic to set up this parent-child relationship differs from the logic used when both are in the same key. Row 37 is the Silene target, but it also has a next couplet that is the first couplet in the Silene key. See more on the dual role of nodes in the Node Structure and Types section.

S. scouleri is a target in the Silene key with two characters sets:

By creating a special node that acts as a target for rows 118 and 143 in the Silene key and as a couplet for rows 1 and 2 in the S. scouleri key, all other elements in the Silene and S. scouleri keys can be treated as normal parent-child relationships:

The Node Structure and Types section discusses how key structures, including these special multiple-characters-set targets, are represented as data structures. In particular, a childList is used for the choices of a couplet and a parentList is used for the parent, or, if there are multiple-characters-sets, for the parents.

Subkeys and Concatenated Keys

Thursday, September 5, 2013

The number of keys in FNA is greater than the 2145 keyed taxa from the table in Taxon List & Counts by Type. One reason is that five taxa have alternate keys:

Asteraceae: Synoptic and Artificial Keys
Brassicaceae: Flower-Based and Fruit-Based Keys
Lauraceae: Flower-Based and Fruit-Based Keys
Portulacaceae Portulaca: Flower-Based and Fruit-Based Keys
Salicaceae Populus: Flower-Based, Fruit-Based and Leaf-Based Keys

Because the alternate keys add new attributes and new paths through the keys, each key must be handled separately (see Node Structure and Types for the mechanism to choose which alternate key to use and see Node Groups and Relationships for the effect on node numbering).

Adding the six additional alternate keys, there are 2151 master keys. Of these, 51 have subkeys, so 2100 do not. The keyed taxa that have subkeys tend to have many of them: 427 subkeys in total.

[9/20/17 - The great majority of subkeys subdivide a taxon. But the subkey may be for a different taxon at the same level. This type of subkey has to be treated differently depending on whether a multiple-attribute-set connector is involved (see Effect of Single and Multiple Parents on Next-Rank Keys in A Key as a Hierarchy). There are examples of each in the Flora of the Pacific Northwest. A subkey without a multiple-attribute-set connector is Asteraceae Askellia, which is a subkey of Asteraceae Crepis (see 10/8/17 notes and diagram on Effect of Artificial Keys page). A subkey with a multiple-attribute-set connector is Brassicaceae Arabis, which is a subkey of Brassicaceae Boechera (see 8/13/17 notes).]

Summarizing FNA keys by type:

Keys with no subkeys	2100
Keys with subkeys	51
Subkeys	427
Total keys	2578
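The counts in this summary can be checked with a few lines of arithmetic. The Python sketch below uses only the numbers given in this section; reading the "six" alternate keys as the eleven listed keys minus one key already counted for each of the five taxa is an interpretation, not something stated explicitly.

```python
# Number of alternate keys listed for each of the five taxa.
alternate_keys = {
    "Asteraceae": 2,
    "Brassicaceae": 2,
    "Lauraceae": 2,
    "Portulacaceae Portulaca": 2,
    "Salicaceae Populus": 3,
}

keyed_taxa = 2145
# Each taxon already accounts for one key, so only the extras are added.
extra_alternate_keys = sum(alternate_keys.values()) - len(alternate_keys)  # 11 - 5
master_keys = keyed_taxa + extra_alternate_keys

keys_with_subkeys = 51
keys_without_subkeys = master_keys - keys_with_subkeys
subkeys = 427
total_keys = keys_without_subkeys + keys_with_subkeys + subkeys

print(extra_alternate_keys, master_keys, keys_without_subkeys, total_keys)
```

The eleven listed alternate keys also match the eleven alternate key nodes mentioned in Node Structure and Types.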

My list of taxa with multiple keys, with their alternate keys, master key and subkeys, is here. You can see from this list that the names of the subkeys are either an intermediate rank that has its own subkey or a number or letter.

Some taxon pages have a List of Keys, which contains the master key and subkeys. More commonly, keys for intermediate ranks are included in the list of lower taxa. Some of the intermediate ranks listed by this method are singleton parents (they have only one child), so these are not subkeys.

Concatenated Keys - Numbering of couplets and rows in keys and subkeys starts with 1; numbering of couplets and rows in keys was discussed in Couplets, Targets and Rows. But to create a single key for a key that has subkeys, the master key and all subkeys must be concatenated into one key. The couplets and rows in the first subkey must be renumbered to continue where numbering left off in the master key, and likewise as each subkey is added.

As an example, look at the couplet and row numbers shown in the partial paths diagram for Brassicaceae Boechera shown in Couplets, Targets and Rows. This was created from the master key and from the Group 1 subkey; the couplets and rows for the subkey were renumbered to continue on from the numbering in the master key.
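The renumbering amounts to adding a running offset to each subkey's row numbers. The Python sketch below shows the idea for rows only (couplet numbers could be handled the same way with their own offset). The `concatenate` function is hypothetical, and the row counts in the example are invented; they are not the actual Boechera counts.

```python
from typing import List, Tuple


def concatenate(keys: List[Tuple[str, int]]) -> List[Tuple[str, int, int]]:
    """Given (name, row_count) for the master key followed by its subkeys,
    return (name, first_row, last_row) for each after renumbering."""
    result = []
    offset = 0  # number of rows already used by earlier keys
    for name, row_count in keys:
        # A row numbered r within its own key becomes offset + r.
        result.append((name, offset + 1, offset + row_count))
        offset += row_count
    return result


# Invented row counts, for illustration only.
ranges = concatenate([("master key", 10), ("Group 1", 6), ("Group 2", 4)])
for name, first_row, last_row in ranges:
    print(f"{name}: rows {first_row}-{last_row}")
```

In this toy case, row 1 of the Group 1 subkey becomes row 11 of the concatenated key, because the master key already uses rows 1 through 10.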

If keys followed a strict hierarchy, from family to genus, to species, and to subspecies, then numbering in keys could start with 1 for each family. But since that is not the case, keys must be concatenated and renumbered again, in order to form one key starting with 1. In the resulting key, targets no longer need to follow the traditional hierarchy.

Printed keys do not need to use concatenated keys.
