This tarball (in release/FastBLAST-NR-May15-2008.tar.gz) includes
the result of running the first stage of FastBLAST on the
non-redundant protein sequences in Genbank ("NR") as of May 15, 2008.

Warning: the tarball requires 27 GB of disk space and
unpacking it requires another 79 GB of disk space. Formatting the
BLAST database so that you can use FastBLAST on it will require a few
more GB. So you'll need about 110 GB of free disk space to install.
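
Before downloading, it may be worth checking that the target disk has
enough room. The following is a minimal Python sketch, not part of
FastBLAST. It assumes the 110 GB figure above and treats 1 GB as 10**9
bytes; the function names are made up for illustration.

```python
import shutil

# Figure from the text above: 27 GB tarball + 79 GB unpacked + a few GB
# for the formatted BLAST database.
REQUIRED_GB = 110


def has_enough_space(free_bytes, required_gb=REQUIRED_GB):
    """Return True if free_bytes covers required_gb (assuming 1 GB = 10**9 bytes)."""
    return free_bytes >= required_gb * 10**9


def check_install_dir(path="."):
    """Return (enough, free_gb) for the filesystem that holds path."""
    free = shutil.disk_usage(path).free
    return has_enough_space(free), free / 10**9


if __name__ == "__main__":
    enough, free_gb = check_install_dir(".")
    print("free: %.1f GB, enough to install: %s" % (free_gb, enough))
```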

Once you have downloaded and expanded the tarball, you will need to
run formatdb before you can use topHomologs.pl to search for the top
homologs of a given gene:

	$FASTHMM_DIR/bin/formatdb -o T -p T -i nr.faa
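
If you script the install, the same step can be wrapped as below. This
is only a sketch: the flags are copied from the command above, while the
helper names and the assumption that nr.faa sits directly in the working
directory are illustrative.

```python
import os
import subprocess


def formatdb_command(fasthmm_dir, fasta="nr.faa"):
    """Build the formatdb command shown above as an argument list."""
    return [os.path.join(fasthmm_dir, "bin", "formatdb"),
            "-o", "T", "-p", "T", "-i", fasta]


def run_formatdb(fasthmm_dir, workdir):
    """Run formatdb in workdir (assumed to contain nr.faa); raise if it fails."""
    subprocess.run(formatdb_command(fasthmm_dir), cwd=workdir, check=True)


if __name__ == "__main__":
    # Only print the command here; call run_formatdb() once the tarball
    # has been unpacked.
    print(" ".join(formatdb_command(os.environ.get("FASTHMM_DIR", ""))))
```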

The files in the tarball are:

nr.faa -- a fasta file of the sequences. It contains gi numbers only,
as numbers (without the "gi|" prefix). The nr.faa.map file has the
mapping from the identifier in the original NR database from Genbank
to the identifier in nr.faa.

fb.all.align -- The alignments for each family.

fb.all.align.seek.db -- A BerkeleyDB index of the seek positions of
each family. The FastBLAST.pm Perl module in $FASTHMM_DIR/lib has
routines to use this index.

fb.all.domains.bygene -- The families for each gene. Most of the
families are from HMMs and have the same name as the HMM. COG families
have names like "gnl|CDD|30365". Ad-hoc families from FastBLAST have
names like "fb.2345028.1.33".

fb.all.domains.bygene.seek.db -- A BerkeleyDB index of the seek
positions of each gene. The FastBLAST.pm Perl module in
$FASTHMM_DIR/lib has routines to use this index.

fb.all.nseq -- The number of sequences. topHomologs.pl uses this to
determine how many homologs to return.
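
The two .seek.db files above store seek positions so that one family or
one gene can be read without scanning a multi-GB file. The Python sketch
below illustrates only that idea. It is not the FastBLAST.pm code: it
uses a plain dict instead of BerkeleyDB, and it assumes a made-up layout
in which each line starts with a key followed by a tab and all lines for
one key are contiguous. The real file layouts are not described here, so
use the routines in FastBLAST.pm for actual access.

```python
import os
import tempfile


def build_seek_index(path):
    """Map each key (first tab-separated field) to the byte offset of its first line."""
    index = {}
    with open(path, "rb") as handle:
        while True:
            offset = handle.tell()
            line = handle.readline()
            if not line:
                break
            key = line.rstrip(b"\n").split(b"\t", 1)[0].decode()
            # Keep only the first offset seen for each key.
            index.setdefault(key, offset)
    return index


def read_records(path, index, key):
    """Seek to the key's offset and read the consecutive lines that share that key."""
    records = []
    with open(path, "rb") as handle:
        handle.seek(index[key])
        for line in handle:
            fields = line.rstrip(b"\n").decode().split("\t")
            if fields[0] != key:
                break
            records.append(fields)
    return records


if __name__ == "__main__":
    # Toy data in the assumed (hypothetical) layout.
    with tempfile.NamedTemporaryFile("wb", delete=False) as tmp:
        tmp.write(b"famA\tseq1\nfamA\tseq2\nfamB\tseq3\n")
    toy_index = build_seek_index(tmp.name)
    print(read_records(tmp.name, toy_index, "famB"))
    os.unlink(tmp.name)
```

Storing offsets in an on-disk index means a lookup costs one index read
and one seek, independent of the size of the flat file.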