<body>

Taxon Locations

Friday, August 30, 2013
I've posted two spreadsheets that show state or province postal codes for all FNA terminal taxa.  Due to a Google Drive spreadsheet size limitation, these data were split:  Alberta through North Dakota and Nebraska through Yukon Territory.  Taxon counts by location are in the last row.

Terminal taxa in my taxon list (see Taxon List & Counts by Type) have locations shown on their FNA taxon pages. The 11,200 terminal taxa that have their own taxon page have the following location information:

Location directly shown on taxon page    11,148
Location indirectly available                50
Location information not available            2

Indirect location information is most commonly found from a distribution map. Of the two taxa with no location information, one (Iris germanica) is said to "may persist after cultivation"; the other (Corispermum nitidum) is said to be "supposedly introduced but doubtful". Location information for the few taxa without their own pages has not been added yet.

The 65 North American locations used in the FNA are the 49 continental states and the District of Columbia in the USA, the 13 provinces and territories of Canada, Greenland (an autonomous country within the Kingdom of Denmark), and the islands of St. Pierre and Miquelon (a self-governing territory of France).

In the future, if a user of the FNA keys selects the location(s) that they are interested in, then the keys can be customized. The keys would then show only those couplets and terminal taxa relevant to the selected location(s), which greatly simplifies keying a specimen. The keys can also be simplified because they contain location-specific couplets; these couplets may eliminate all except one attribute set leading to the terminal taxa.
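The pruning idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the method used for the FNA key database: the `Taxon` and `Couplet` classes, the `localize` function, and the sample taxa and location codes are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Set, Union


@dataclass
class Taxon:
    """A terminal taxon with the postal codes of the locations where it occurs."""
    name: str
    locations: Set[str]


@dataclass
class Couplet:
    """A couplet with two leads; each lead goes to another couplet or a taxon."""
    number: int
    first: "Node"
    second: "Node"


Node = Union[Taxon, Couplet]


def localize(node: Node, wanted: Set[str]) -> Optional[Node]:
    """Return the part of the key that is relevant to the wanted locations."""
    if isinstance(node, Taxon):
        # Keep a taxon only if it occurs in at least one wanted location.
        return node if node.locations & wanted else None
    first = localize(node.first, wanted)
    second = localize(node.second, wanted)
    if first is None:
        # At most the second lead survives, so this couplet is skipped.
        return second
    if second is None:
        return first
    return Couplet(node.number, first, second)


# Hypothetical three-taxon key; names and locations are made up.
key = Couplet(1, Taxon("A", {"WA", "OR"}),
              Couplet(2, Taxon("B", {"TX"}), Taxon("C", {"WA"})))
local = localize(key, {"WA"})
```

In this made-up key, selecting WA removes taxon B, so couplet 2 no longer offers a real choice and its surviving lead is attached directly to couplet 1.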

See Localized Keys and Predetermined Characteristics. [Add reference location attribute in table on Node Structure and Types post.]

Couplets, Targets and Rows

Friday, August 23, 2013
Before using FNA keys to identify a specimen, the family must be known. If the family is not known, use, for example, the angiosperm and gymnosperm keys to families, which are multi-access keys created as part of the Pathkey project.

In each FNA key, couplets are numbered starting with 1. A couplet gives a choice of two alternatives, each of which describes one or more characteristics (called "characters") that are to be matched with the specimen that is being identified. The first choice in the key is labeled with the couplet number and the second is labeled "+". Following each choice, where to go next in the key is indicated by a couplet number. However, instead of a next couplet, one or both choices may lead to a destination target, which is a taxon with rank normally one lower than the rank of the key.

A diagram showing the hierarchy of couplet numbers is a simple way to show the overall structure of a key.  For example, the following diagram shows the structure of the Caryophyllaceae Drymaria key to species using couplet numbers and destination targets:
As a choice is made at each couplet, the characters set (see Characters Sets and Paths) for the target grows.

When creating a target's characters set, it is easier to use the key's row numbers as an index into the key, instead of using couplet numbers. The keys are laid out such that the row number can be obtained from the couplet number:
Couplet number    Row number for current couplet
                  Using first choice    Using second choice
n                 2n - 1                2n
In other words, the row number of the second choice is double the couplet number, and the row number of the first choice is one less than this. This assumes that each couplet lead occupies only one row; that is, all lines for the characters of a lead are put into one row. The algorithms for traversing the keys depend on this couplet-to-row relationship; see Key Types and Subkeys and Relation of Key Structure to Taxonomic Hierarchy for how special cases are handled to maintain this relationship.
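As a small sketch, the couplet-to-row relationship and its inverse can be written as follows. The function names are illustrative only and are not taken from the FNA key database.

```python
from typing import Tuple


def rows_for_couplet(n: int) -> Tuple[int, int]:
    """Return (row of first choice, row of second choice) for couplet n."""
    if n < 1:
        raise ValueError("couplet numbers start at 1")
    return 2 * n - 1, 2 * n


def couplet_for_row(row: int) -> int:
    """Invert the mapping: rows 2n - 1 and 2n both belong to couplet n."""
    if row < 1:
        raise ValueError("rows of couplet leads start at 1")
    return (row + 1) // 2


def is_first_choice(row: int) -> bool:
    """Odd rows hold the first choice of a couplet, even rows the second."""
    return row % 2 == 1
```

For example, `rows_for_couplet(17)` gives rows 33 and 34, and `couplet_for_row` maps both of those rows back to couplet 17.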

This is another version of the diagram for the Caryophyllaceae Drymaria key using row numbers instead of couplet numbers:
A row number 0 (representing the root of the Drymaria key) has been added as a starting point.

When row numbers are needed, a third version that has both couplet and row numbers is easier to use. For an example of such a diagram, see the one in Characters Sets and Paths for Brassicaceae Boechera. Notice in that diagram where the row numbers are twice the couplet numbers. Also note that the row number of the first choice in a couplet is put to the left of the couplet, and that of the second choice is put to the right.

Couplet diagrams are easier to create and understand, but rows are used to help create the characters-set list for a target, as described in Characters Sets and Paths, and are needed to understand how keys at different ranks are connected, as described in Key Structure.

As discussed in Key Types and Subkeys, when keys for the multiple genera of a family are concatenated to form the key to species for the family, rows are renumbered, so that the row number applies to the key for the family. That is, for the family key, couplet numbers are less useful because they only apply to the key for a given rank, but row numbers can be used to uniquely identify any couplet or target in the key at any rank in the family. Also row numbers are used in constructing nodes, as discussed in Node Structure and Types.

Characters Sets and Paths

Tuesday, August 20, 2013
As one makes binary choices of characters in a key, one creates a set of characters for the destination target and a path through the key. In listing the Characters Set (CS), it is more useful to reverse the order, so that the characters chosen last are listed first, because those are most specific to the target.

For example, the CS for Boechera burkii is:
  • Racemes usually unbranched; cauline leaves 18-28; ovules 64-80 per ovary; seeds 1.2-1.4 mm wide
  • Biennials, without caudices; stems (2-)3-10 dm; cauline leaves 18-80
  • Basal leaf blade surfaces glabrous or with simple trichomes only
  • Cauline leaf blades not auriculate
However, the CS can be more complex. One reason is that there can be more than one CS for a target when there are diverse characters leading to it. In the FNA keys, targets whose name is appended with "(in part)" in the key have more than one CS. Keys in this paper instead append CSn to the target, where n=1,2,…; that is, there are multiple characters sets, which are numbered.

Using Boechera repanda as an example, CS1 is:
  • Basal leaf blades 10-25(-50) mm wide; petals 3.5-6 mm.
  • Fruit valves pubescent
And CS2 is:
  • Basal leaf blades 7-25(-50) mm wide, margins usually repand to dentate, rarely entire
  • Fruits 2-5.5 mm wide, divaricate-ascending to erect, ± appressed to rachises
  • Stems proximally with simple and/or branched trichomes usually less than 0.5 mm; ovules 8-52 per ovary
  • Fruits not secund
  • Fruits erect, ascending, or horizontal
  • Fruit valves glabrous
If a target has multiple characters sets, then the sets may share the same general characters, with only the more specific characters differing. For Boechera repanda, the characters that CS1 and CS2 have in common are:
  • Plants usually sparsely to densely pubescent proximally (sometimes throughout)
  • Styles 0.05-2 mm
  • Basal leaf blade surfaces with at least some branched trichomes
  • Cauline leaf blades not auriculate
From this point on, as you read this paper, make sure you can reproduce the diagrams from the source FNA keys. This will ensure that you understand each concept as it is presented.

The following diagram for the Brassicaceae Boechera key shows how these characters-set lists relate to the type of diagram discussed in Couplets, Targets and Rows.
This diagram is easier to understand because it shows both couplet and row numbers and because it shows only the paths of interest through the key. From this, the list of characters for B. repanda CS1 can be found in rows 35 and 33, those for CS2 are in rows 73, 72, 70, 66, 48 and 34, and those in common for both CSs are in rows 26, 22, 10 and 1.
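The step from a path of row numbers to a characters-set list can be sketched as below. The names `ROW_TEXT` and `characters_set` are hypothetical, and the pairing of each row number with its character text is an assumption: it matches the rows listed above, in descending order, with the bullets of the lists earlier in this post.

```python
# Character text by row number for part of the Boechera key. The pairing of
# rows with text is assumed from the order of the lists in this post.
ROW_TEXT = {
    1: "Cauline leaf blades not auriculate",
    10: "Basal leaf blade surfaces with at least some branched trichomes",
    22: "Styles 0.05-2 mm",
    26: "Plants usually sparsely to densely pubescent proximally (sometimes throughout)",
    33: "Fruit valves pubescent",
    34: "Fruit valves glabrous",
    35: "Basal leaf blades 10-25(-50) mm wide; petals 3.5-6 mm.",
}


def characters_set(path_rows):
    """List the characters for a path, last choice first (most specific first)."""
    return [ROW_TEXT[row] for row in reversed(path_rows)]


# Path to B. repanda CS1, in the order the choices are made in the key.
cs1_path = [1, 10, 22, 26, 33, 35]
for line in characters_set(cs1_path):
    print("•", line)
```

The output lists the two CS1 characters first, followed by the four characters that CS1 and CS2 have in common.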

The paths for CS1 and CS2 merge at couplet 17; the choices for couplet 17 lead to either CS1 or CS2 of B. repanda. If there are multiple characters sets for a taxon, there will be one or more couplets that will be merge points where the taxon has the characters of both choices of the merge point couplet.
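A merge point can be found mechanically by comparing two paths row by row. The sketch below assumes each path is stored as a list of row numbers in the order the choices are made, and it relies on the couplet-to-row relationship from Couplets, Targets and Rows; the function name `merge_couplet` is hypothetical.

```python
def merge_couplet(path_a, path_b):
    """Return the couplet where two paths to the same taxon take different leads.

    Each path is a list of row numbers in the order the choices are made.
    Returns None if the paths never differ over their shared length.
    """
    for row_a, row_b in zip(path_a, path_b):
        if row_a != row_b:
            # Rows 2n - 1 and 2n both belong to couplet n.
            couplet_a = (row_a + 1) // 2
            couplet_b = (row_b + 1) // 2
            if couplet_a != couplet_b:
                raise ValueError("paths differ on rows of different couplets")
            return couplet_a
    return None


# Row numbers for B. repanda taken from the diagram discussion above.
cs1_path = [1, 10, 22, 26, 33, 35]
cs2_path = [1, 10, 22, 26, 34, 48, 66, 70, 72, 73]
print(merge_couplet(cs1_path, cs2_path))  # 17
```

The two paths share rows 1, 10, 22 and 26, then take rows 33 and 34, which are the two leads of couplet 17.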

There are two other reasons that the characters set for the path through a key can be more complex. As discussed in Key Types and Subkeys, some taxa have keys with subkeys (as does Boechera); each of these subkeys, which may correspond to intermediate-rank targets, adds a characters set or multiple characters sets. Also, as shown in Multiple-Characters-Set Groups, a couplet can have multiple characters sets even if there is no subkey, although this is rare.

Using the above methods, it is possible to list every characters set for a family and to follow every possible path through the key.  For large families, with keys at each rank and possibly subkeys and alternate keys, that list can be long. However, as shown in Localized Keys and Predetermined Characteristics, following  every path allows one to create localized keys from the FNA key.

Overview

Monday, August 12, 2013
The Flora of North America North of Mexico is being created by over 800 authors in an online version and in 30 printed volumes; this is usually called just Flora of North America (FNA). The FNA contains dichotomous identification keys, which allow biologists and serious amateurs to identify the known plants in North America.

This paper analyzes the structure of the FNA keys, pointing out patterns that help visualize and create diagrams of the keys, especially in cases where the key structure is complex. The patterns documented and the diagramming methods generally apply to all dichotomous keys.

While analyzing the online FNA keys, consistency checks were created to assure that there are no structural errors in the FNA keys. The number of errors found was generally small, but the number of keys is large, so elimination of these errors is essential for algorithmic processing of the keys. The result was a database containing the FNA keys with the structural errors corrected. This database does not change the intended content of the keys; that is, the wording of couplets and resulting taxon names are changed very little, but there were many changes because of missing or incorrect connections between the couplets and the taxa. The online FNA was created before the printed version, and many of the errors in the online version were corrected in the printed version.

Taxa in the FNA keys don't occur uniformly across the 65 locations (see Taxon Locations), so keying could be simplified if there were a key for each location (or set of locations). An advantage of using the FNA key database is that the FNA keys can be enhanced with localization information. How this is done and how localized keys can be created is described in the section on Localized Keys.

The 2nd Edition of the Flora of the Pacific Northwest (FPNW) is to be published in 2018. Keys in FPNW have structures similar to those in FNA, but some also have additional structural features; these are mostly described in the section Effect of Artificial Keys.

Data created by my analysis are too large to be contained in this blog, so this blog has links to Google Docs spreadsheets. A Google account or a Gmail account is required to access these spreadsheets; if you don't have an account, you can go here to create one.


Contents

      Overview
      Couplets, Targets and Rows
      Characters Sets and Paths
      Multiple-Characters-Set Groups
      Subkeys and Concatenated Keys
      Key Structure
      Effect of Artificial Keys
      Taxon List and Counts by Type
      Node Structure and Types
      Node Groups and Relationships
      Taxon Locations
      Localized Keys and Predetermined Characteristics

The first four sections below present concepts needed to describe the basic Key Structure. The Effect of Artificial Keys section extends that. Most of the keys follow Taxonomic Rank, where classification of an unknown species proceeds sequentially from family to genus, then to species and possibly subspecies; that is, they mostly have a synoptic key structure. However, some of the more complex keys mix in an artificial key structure, which makes identification more convenient and reliable (this is sometimes called a diagnostic key structure). The Effect of Artificial Keys section has several examples of how the taxonomic hierarchy can be incorporated into visualizing the key structure of these keys.

Taxon List & Counts by Type

Sunday, August 11, 2013
The current list of taxa in the online version of the FNA keys is here. As new volumes of the FNA are published and put online, this list will be updated. For most taxa, a unique taxon id is shown in the first column (a few exceptions are enumerated below under Terminal Taxa). The types shown in the third column of the list are explained below.

Given that the plant family has already been identified, the FNA keys allow more and more specific identification at each taxon rank, ultimately leading to a terminal taxon. The numbers of taxa at the different ranks are as follows:

Rank                  Type within Rank     Number of Taxa
family                family                          234
genus                 genus                         1,785
genus subclasses      species                      10,939
                      hybrid                           82
species subclasses    subsp.                          903
                      var.                          2,105
                      var.× or nothovar                3*
subsp. subclasses     var.                             15
Total in FNA keys                                  16,066
* See:
Asteraceae Petasites frigidus var.×vitifolius
Rosaceae Crataegus ×sicca nothovar. sicca
Rosaceae Crataegus ×sicca nothovar. glabrifolia 

Non-Terminal Taxa - Taxa that are non-terminal (families, genera and some species) are of three types (see the third column of my taxon list):

keyed                                   A key is used to differentiate subclasses    2,145
not keyed   singleton parent            There is only one subclass                   1,080
            species for naming* only    Species not in a key, but subspecies is          4
Total Non-Terminal Taxa                                                              3,229
* See section on Naming Hierarchy

Keyed taxa will be discussed further in Key Types and Subkeys.

Terminal Taxa - Terminal taxa are also indicated in the third column of my taxon list.  In a few cases, destination targets in the FNA keys are not listed in the site's taxa list; these targets are distinguished from the other terminal taxa.

terminating                A target that has a unique taxon id                                                            12,791
terminatingOnAssocTaxon    Rather than have its own taxon id, the target is described on the page of an associated taxon      41
terminatingOnBadTaxon      A target with no taxon page or with a bad taxon id                                                  5
Total Terminal Taxa                                                                                                       12,837

[In order to clarify taxa that are classified as terminatingOnAssocTaxon or terminatingOnBadTaxon, I need to indicate whether or not these taxa are synonyms for another taxon that is also a target.]

The Node Structure and Types section shows how the taxon id type (one of the three types shown in this table) is stored in the node for each taxon.

The FNA keys can be used to differentiate the following types of destination targets:

hybrid                    81
species                9,737
subsp.                   896
var.                   2,123
Total Terminal Taxa   12,837
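The totals in the tables above can be cross-checked with a few lines of arithmetic. This is a sketch using the counts copied from this post; the dictionary names and the labels that distinguish the two "var." rows are my own.

```python
# Counts copied from the tables in this post.
by_rank = {
    "family": 234, "genus": 1785, "species": 10939, "hybrid": 82,
    "subsp.": 903, "var. (of species)": 2105, "var.× or nothovar": 3,
    "var. (of subsp.)": 15,
}
non_terminal = {"keyed": 2145, "singleton parent": 1080,
                "species for naming only": 4}
terminal_by_id = {"terminating": 12791, "terminatingOnAssocTaxon": 41,
                  "terminatingOnBadTaxon": 5}
terminal_by_type = {"hybrid": 81, "species": 9737, "subsp.": 896, "var.": 2123}

total = sum(by_rank.values())
assert total == 16066
assert sum(non_terminal.values()) == 3229
assert sum(terminal_by_id.values()) == 12837
assert sum(terminal_by_type.values()) == 12837
# Every taxon in the keys is either non-terminal or terminal.
assert sum(non_terminal.values()) + sum(terminal_by_id.values()) == total
```

All five assertions pass, so the four tables are consistent with one another.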

Checklist - A different view of all FNA taxa is in this Checklist. It was created slightly later than the above tables, so it has 16,822 taxa, of which 13,410 are terminal. Taxa terminating on associated or bad taxa are not included, but the Checklist also lists 26,985 synonyms.

The taxon hierarchy is shown by indentation: starting with family, each lower level is indented by two characters. After each taxon, any synonyms are shown in square brackets, with multiple synonyms separated by semicolons. Because of the possible synonyms, the line for any taxon can become too long for the page width of 96 characters, so it is continued on the next line, indented by one more character than the current level.
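A reader who wants to process the Checklist could parse this layout roughly as follows. This is a sketch based only on the description above: it assumes that continuation lines are the only lines with an odd indent, that a line break falls between words, and that square brackets appear only around synonyms. The function name and the sample names are made up.

```python
def parse_checklist(lines):
    """Parse checklist lines into (level, taxon, synonyms) tuples."""
    # Join continuation lines, which have an odd indent (2 * level + 1).
    joined = []
    for raw in lines:
        if not raw.strip():
            continue  # skip blank lines
        indent = len(raw) - len(raw.lstrip(" "))
        if indent % 2 == 1 and joined:
            joined[-1] += " " + raw.strip()
        else:
            joined.append(raw.rstrip())
    entries = []
    for line in joined:
        # Two characters of indent per level, with family at level 0.
        level = (len(line) - len(line.lstrip(" "))) // 2
        text = line.strip()
        synonyms = []
        if text.endswith("]") and "[" in text:
            text, _, inside = text[:-1].partition("[")
            synonyms = [s.strip() for s in inside.split(";") if s.strip()]
        entries.append((level, text.strip(), synonyms))
    return entries


# Made-up sample in the described layout; these are not real taxa.
sample = [
    "Familyaceae",
    "  Genus",
    "    Genus species [Oldgenus species; Genus",
    "     othername]",
]
entries = parse_checklist(sample)
```

Here the fourth sample line has an indent of five characters, so it is treated as a continuation of the species line at level 2.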