Evaluating Phylogenetic Analyses

Author and Date              Size of matrix (taxa*characters)   Percent miscoded   old CI   new CI
Gauthier, 1986               18*84 = 1512                       32                 0.86     0.70
Cracraft, 1986               15*73 = 1095                       16                 0.80     0.71
Chatterjee, 1991             8*30 = 240                         46                 0.97     0.86
Sanz and Bonaparte, 1992     4*19 = 76                          30                 1.00     1.00
Sanz and Buscalioni, 1992    8*14 = 112                         20                 0.70     0.74
Chiappe, 1993                8*10 = 80                          5                  0.83     0.83
--Sanz et al., 1995          9*10 = 90                          4                  0.83     0.83
Perez-Moreno et al., 1993    11*52 = 572                        25                 0.68     0.59
Chiappe and Calvo, 1994      8*73 = 584                         15                 0.86     0.84
Holtz, 1994                  20*126 = 2520                      36                 0.51     0.42
--Charig and Milner, 1997    21*118 = 2478                                         0.51
Perez-Moreno et al., 1994    7*22 = 154                         16                 0.81     0.75
Russell and Dong, 1994       11*59 = 649                        25                 0.55     0.53
Hou et al., 1996             8*32 = 256                         43                 0.85     0.96
Sereno et al., 1996          15*63 = 945                        26                 0.81     0.56
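
As a rough illustration of two of the table's columns, the sketch below computes a matrix size as taxa times characters, and a consistency index. This page does not define CI, so the code assumes the standard ensemble definition (summed minimum possible steps divided by summed observed steps over all characters). The function names and the step counts in the usage note are hypothetical.

```python
from typing import Sequence


def matrix_size(taxa: int, characters: int) -> int:
    """Number of cells in a data matrix, as in the 'Size of matrix' column."""
    return taxa * characters


def consistency_index(min_steps: Sequence[int], observed_steps: Sequence[int]) -> float:
    """Ensemble CI (assumed definition): sum of minimum steps / sum of observed steps.

    Each sequence holds one entry per character. A CI of 1.0 means no
    character changes more often on the tree than it strictly has to.
    """
    total_observed = sum(observed_steps)
    if total_observed == 0:
        raise ValueError("observed_steps must sum to a positive number")
    return sum(min_steps) / total_observed
```

For example, `matrix_size(18, 84)` gives the 1512 cells listed for Gauthier, 1986, and three invented characters with minimum steps `[1, 1, 2]` and observed steps `[1, 2, 2]` would give a CI of 4/5.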

This portion of the website is devoted to past phylogenetic analyses of theropods. Each page covers one analysis and any published modifications it has been through, and notes the general importance of the paper along with its citation. The paper's phylogeny is then shown, using its own taxonomy.

Any taxonomic issues are described next. These may be cases where the author uses a different name for a taxon than I do (e.g. Elmisauridae in Gauthier, 1986 vs. Caenagnathidae), or where an author applies a name to a different group than I do (e.g. Perez-Moreno et al.'s 1993 Tetanurae).

This is also the section that describes exactly which genera and/or species are used for each OTU (Operational Taxonomic Unit; basically each taxon which is coded for an analysis) of the author's analysis. For instance, Gauthier (1986) only included Allosaurus, Acrocanthosaurus and tyrannosaurids in his Carnosauria. Thus when I evaluate his codings for Carnosauria, I don't consider other taxa now thought to be carnosaurs which were presumably known to Gauthier in 1986 (e.g. Yangchuanosaurus, Carcharodontosaurus), or carnosaurs described since (e.g. Sinraptor, Neovenator). However, I do consider specimens which have since been assigned to new taxa if they were known to the author, such as the Conchoraptor holotype, which Gauthier used to code his Caenagnathidae even though it was assigned to Oviraptor sp. at the time. Misassigned taxa are generally not considered when evaluating codings, though I will indicate when incorrect codings are due to a particular misassigned taxon. In rare instances, though, a taxon is integral to an author's concept of an OTU even though it is technically misassigned (like tyrannosaurids in Carnosauria for Gauthier), and it is then retained.

When coding an OTU that consists of multiple species, the condition in the two basalmost species which can be coded is used, following current ideas on their internal phylogeny. Sometimes this phylogeny is different from what the author had in mind (e.g. most older analyses use Lesothosaurus as the basalmost relatively complete ornithischian, but I follow Butler in recognizing heterodontosaurids as being more basal), but authors generally do not state what their ideas are, so this solution is used for simplicity. Often it is necessary to code an OTU as being polymorphic (having more than one state).
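
The rule for multi-species OTUs can be sketched roughly as follows. This is only an illustration under stated assumptions: species are supplied already ordered from most basal to most derived, "?" marks a species that cannot be coded, two disagreeing species yield a polymorphic coding written as "0/1", and an OTU with a single codable species takes that species' state. The function name and the notation are hypothetical.

```python
from typing import Sequence

UNKNOWN = "?"


def code_otu(states_basal_first: Sequence[str]) -> str:
    """Code one character for an OTU from its two basalmost codable species.

    `states_basal_first` holds one state per species, ordered from most basal
    to most derived under the assumed internal phylogeny.
    """
    # Skip species that cannot be coded for this character.
    codable = [s for s in states_basal_first if s != UNKNOWN]
    if not codable:
        return UNKNOWN
    # Only the two basalmost codable species decide the OTU's coding.
    distinct = sorted(set(codable[:2]))
    if len(distinct) == 1:
        return distinct[0]
    # Assumption: disagreement between the two is coded as polymorphism.
    return "/".join(distinct)
```

With states `["?", "1", "1", "0"]` the first species is skipped and the OTU is coded "1", while `["0", "?", "1", "1"]` gives the polymorphic coding "0/1".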

Coding issues take up the majority of each entry. These are divided by character and indicate when a character or coding is problematic. Characters can be flawed for several reasons. They might be correlated with another included character. Correlation means that if one character is coded as a certain state, the other must also be coded as a certain state. An obvious example is a pair of opposite characters (premaxilla toothed vs. premaxilla toothless), but more common are cases where two or more characters code for different ranges of the same variable, for instance arm less than half of leg length vs. arm less than two-thirds of leg length. Any taxon with the first character must have the second character. The ideal solution would be to make a multistate character, but in order to leave each analysis as intact as possible, a different method is followed here. In the above example, any taxon scored as having an arm two-thirds of leg length or longer is simply not considered for the character 'arm less than half of leg length'. This means the long-armed taxa are not coded twice for having long arms, which would unfairly weight that character. Another way characters can be problematic is if they code for more than one variable at a time, making them composite characters. Ideally, they would be broken up into multiple characters, but the less disruptive solution here is to code taxa which have one part of a character but not another part as being polymorphic. Characters which are completely correlated or repeated are deleted entirely by recoding all taxa as unknown (which keeps the character numbers the same as in the papers).
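
The three fixes described above could be sketched like this. It is a minimal illustration, not the procedure actually used: it assumes binary characters coded "0"/"1", assumes "not considered" is represented as "?", writes polymorphism as "0/1", and uses hypothetical function names and a matrix stored as a dictionary of taxon name to list of codings.

```python
from typing import Dict, List

UNKNOWN = "?"


def decorrelate(narrow: str, broad: str) -> str:
    """Coding to use for the narrower of two range characters.

    narrow: coding for e.g. 'arm less than half of leg length'
    broad:  coding for e.g. 'arm less than two-thirds of leg length'
    A taxon lacking the broad state (arm two-thirds of leg length or longer)
    is not considered for the narrow character, so it is not counted twice.
    """
    if broad == "0":
        return UNKNOWN  # assumption: 'not considered' is written as unknown
    return narrow


def code_composite(parts: List[bool]) -> str:
    """Code a composite character from whether each of its parts is present."""
    if all(parts):
        return "1"
    if not any(parts):
        return "0"
    return "0/1"  # some parts present, others absent: polymorphic


def delete_character(matrix: Dict[str, List[str]], index: int) -> Dict[str, List[str]]:
    """Recode every taxon as unknown for one character, keeping its column."""
    return {
        taxon: row[:index] + [UNKNOWN] + row[index + 1:]
        for taxon, row in matrix.items()
    }
```

Because `delete_character` keeps the column in place, later characters retain the numbers they have in the original paper.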

The most common coding issue is the miscoded character. Often this is the author's error, but things are not always so simple and it isn't always the author's fault. Sometimes codings are done from inaccurate literature or inaccurately restored or interpreted fossils. Ideally, studying fossils firsthand could solve this, but there are often multiple possible interpretations of fossils, due to damage, disarticulation, deformation, age and other factors. Other miscodings are merely due to a lack of material at the time of publication, which is not the author's fault at all. While it is not exactly fair to present these as miscodings, the alternative of only coding that which was knowable at the time of publication is far too complicated. It depends not just on when the descriptive literature was published, but also on when specimens were found (not always reported), when the specimens were prepared (almost never reported), which specimens are codable for each character (often unknowable without examining them directly), and other variables. As an objective partial measure of this effect, taxon names are bolded when a coding correction was from one definite state to another (0>1, 1>0, etc.), or from a definite state to an unknown state (0>?, 1>?, etc.), but not bolded when a previously unknown state was merely updated with new information (?>0, ?>1, etc.). References are given to support each correction. It should be noted though that I myself am not immune to any of these sources of error, so some of my corrections are no doubt wrong, and some codings I don't correct will certainly be shown to need correction in the future.
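
The bolding rule can be expressed compactly. The sketch below assumes corrections are written in the "old>new" notation used above, with "?" for unknown; the function names are hypothetical and polymorphic codings are not handled.

```python
from typing import Tuple

UNKNOWN = "?"


def parse_correction(text: str) -> Tuple[str, str]:
    """Split a correction such as '0>1' into (old state, new state)."""
    old, new = text.split(">")
    return old.strip(), new.strip()


def is_bolded(old: str, new: str) -> bool:
    """True when a definite original coding was changed (0>1, 1>?, etc.).

    Updates from an unknown state (?>0, ?>1) are not bolded, since they only
    add information that may not have been available to the author.
    """
    return old != UNKNOWN and old != new
```

So `is_bolded(*parse_correction("1>?"))` is true, while `is_bolded(*parse_correction("?>0"))` is false.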

The General Analysis Conclusions section goes over the negative and positive points of the analysis, notes the percentage of miscodings, and shows the resulting tree when all codings are corrected. This tree uses my taxonomy.

The Phylogenetic Conclusions section includes a table that compares the length of trees when different arrangements are tested. Positive numbers mean that many more steps need to be added to produce that arrangement, while negative numbers mean that the tree already has that arrangement and that many steps have to be added to break it. The shorter the tree, the more likely that arrangement is correct. However, small differences in tree length do not mean much, since added characters and/or taxa can negate them. The various arrangements are tested on both the original matrix and my recoded version. I then describe in qualitative terms just how well or poorly supported various arrangements are.
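
One way to read the sign convention is as the difference between two constrained tree lengths. The sketch below assumes that, for each arrangement, the length of the shortest tree with the arrangement and of the shortest tree without it have already been obtained from a parsimony search; the function names, the arrangement label and the tree lengths in the usage note are all hypothetical.

```python
def step_difference(length_with: int, length_without: int) -> int:
    """Signed number of extra steps for one tested arrangement.

    Positive: the shortest tree lacks the arrangement, and this many more
    steps are needed to produce it.
    Negative: the shortest tree has the arrangement, and this many more
    steps are needed to break it.
    """
    return length_with - length_without


def describe(name: str, length_with: int, length_without: int) -> str:
    """One readable table row for an arrangement."""
    diff = step_difference(length_with, length_without)
    if diff > 0:
        return f"{name}: +{diff} (needs {diff} more steps)"
    if diff < 0:
        return f"{name}: {diff} (present; {-diff} more steps to break)"
    return f"{name}: 0 (equally parsimonious either way)"
```

With invented lengths, `describe("arrangement A", 105, 100)` returns "arrangement A: +5 (needs 5 more steps)", and swapping the two lengths gives -5.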

Experiments With Controversial Taxa asks what would happen if the included taxa were different. Maybe there's a taxon the author included that further study has shown doesn't belong to the studied group, like Protoavis in Chatterjee's analyses. Maybe the author didn't include an important taxon, or an interesting taxon was discovered after publication. What if Gauthier (1986) had included then-known segnosaurs in his theropod analysis, or what if Holtz (1994) had known about Sinovenator, for instance? This can also test criticisms of the analysis, like when Charig and Milner (1990) claimed Gauthier's study couldn't handle Baryonyx.