Up to FastTree

## Fast Tree-Comparison Tools

Download perl scripts and README file:
Comparing tree topologies is an essential and time-consuming step
in phylogenetic analysis.
**CompareTree.pl** and **CompareToBootstrap.pl**
are up to 100 times faster than the standard tools for comparing large trees.
For example, to compute the global bootstrap
for 100 replicates on a tree of 52,000 sequences, phylip's
consense
took 118 hours and 9.4 GB
of RAM, while CompareToBootstrap.pl took just 80 minutes and 720 MB of RAM.
To compare two trees of
52,000 sequences took CompareTree.pl just 21 seconds, while phylip's
treedist
took 3 minutes to compare two trees of just 5,000 sequences.

The key trick is to hash the splits of a tree (the two lists of
nodes on either side of an internal edge) in just O(N) time by setting
the hash value for an internal node to the sum of its childrens' hash
values. We rely on the tree itself to store the list of
descendents, so that the hash requires only O(N) space. Thus,
CompareTree.pl and CompareToBootstrap.pl require just O(N) space and
have a best-case running time of just O(N). In contrast, most tools
require at least O(N^{2}) time and space to compare two
trees.

**CompareTree.pl** compares the splits in one tree to those in
another. In particular, it reports the fraction of splits in the 1st
tree that are also found in the second tree. This is related to the
"symmetric" or Robinson-Foulds distance by *fraction = 1 -
d/(2*(N*_{leaves}-3)). CompareTree also reports the splits that line up and
compares their branch lengths and support values.

**CompareToBootstrap.pl** counts how often the splits in a main tree are found in the resampled trees.
The bootstrap value for an internal node is the fraction of times that this split
(that is, the leaves beneath this node versus all other leaves) is
maintained in the resampled trees. CompareToBoostrap.pl outputs a new tree with the same topology and branch lengths as the original tree, but with these new bootstrap values. (Note: this is rather different from what
consense
does -- consense computes a new majority-consensus topology and returns the number of trees that support each split as the branch length for that internal node.)

### Contact Us

The tree-comparison tools were developed by Morgan N. Price in Adam Arkin's group at Lawrence Berkeley National Lab. If you have questions, please email fasttree@microbesonline.org.