Up to FastTree
Comparing tree topologies is an essential and time-consuming step in phylogenetic analysis. CompareTree.pl and CompareToBootstrap.pl are up to 100 times faster than the standard tools for comparing large trees. For example, to compute the global bootstrap for 100 replicates on a tree of 52,000 sequences, phylip's consense took 118 hours and 9.4 GB of RAM, while CompareToBootstrap.pl took just 80 minutes and 720 MB of RAM. To compare two trees of 52,000 sequences took CompareTree.pl just 21 seconds, while phylip's treedist took 3 minutes to compare two trees of just 5,000 sequences.
The key trick is to hash the splits of a tree (the two lists of nodes on either side of an internal edge) in just O(N) time by setting the hash value for an internal node to the sum of its childrens' hash values. We rely on the tree itself to store the list of descendents, so that the hash requires only O(N) space. Thus, CompareTree.pl and CompareToBootstrap.pl require just O(N) space and have a best-case running time of just O(N). In contrast, most tools require at least O(N2) time and space to compare two trees.
CompareTree.pl compares the splits in one tree to those in another. In particular, it reports the fraction of splits in the 1st tree that are also found in the second tree. This is related to the "symmetric" or Robinson-Foulds distance by fraction = 1 - d/(2*(Nleaves-3)). CompareTree also reports the splits that line up and compares their branch lengths and support values.
CompareToBootstrap.pl counts how often the splits in a main tree are found in the resampled trees. The bootstrap value for an internal node is the fraction of times that this split (that is, the leaves beneath this node versus all other leaves) is maintained in the resampled trees. CompareToBoostrap.pl outputs a new tree with the same topology and branch lengths as the original tree, but with these new bootstrap values. (Note: this is rather different from what consense does -- consense computes a new majority-consensus topology and returns the number of trees that support each split as the branch length for that internal node.)
The tree-comparison tools were developed by Morgan N. Price in Adam Arkin's group at Lawrence Berkeley National Lab. If you have questions, please email firstname.lastname@example.org.