Using the NCBI BLAST tools

August 19, 2010

NCBI BLAST is probably the tool most widely used by biologists and bioinformaticians alike. There is no way around it, as its usage is taught in almost every biology-related university curriculum. In most cases, the web interface at NCBI (or, to a lesser extent, EBI) is used. Both organizations, however, also provide standalone tools that perform the same searches as their web counterparts as well as more specific types of queries. Here, the syntax for basic command-line searches is reviewed.

Simple searches

The obvious advantage of the command-line tool over the one on the web is that your FASTA query file can contain multiple entries. Thus, you don’t have to submit a request for every sequence but can do it all in one step.

[t]blast[npx] -query <query_file> -db <database> -evalue <e-value> -out <output_file> -outfmt <format>
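
As a rough illustration of a multi-entry query, the sketch below writes two made-up protein sequences to a hypothetical file queries.fasta, counts the entries, and prints the blastp command that would search them in one run. The file names, the database name mydb and the E-value cut-off are assumptions chosen for the example, not values from any real setup.

```shell
#!/bin/sh
# Write a query file with two entries (sequences are made up for illustration).
cat > queries.fasta <<'EOF'
>query1 hypothetical protein A
MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ
>query2 hypothetical protein B
MSDNGPQNQRNAPRITFGGPSDSTGSNQNGERS
EOF

# Every FASTA header starts with '>', so counting them counts the entries.
n_queries=$(grep -c '^>' queries.fasta)
echo "queries.fasta holds $n_queries sequences"

# Assemble the search command; all names below are placeholders.
blast_cmd="blastp -query queries.fasta -db mydb -evalue 1e-5 -out hits.tsv -outfmt 7"
echo "$blast_cmd"
# To actually run it (requires the BLAST tools and an existing database mydb):
# $blast_cmd
```

Printing the command instead of running it keeps the sketch usable on a machine without BLAST installed; uncommenting the last line performs the search.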

When this search is run, all results are saved to your output file. The most useful formats are 7 (tabular with comment lines explaining the individual columns), 10 (CSV, which can be imported into a spreadsheet application) and 5 (XML output, parsed by the Bio* libraries). The standard NCBI databases can be downloaded from the NCBI website and are passed to the db parameter by their base name, without file extension. If you want to create a custom database from a FASTA file, here is how:

makeblastdb -in <fasta_file> -out <database_name> -dbtype <nucl or prot> -parse_seqids

Creating a database is not strictly necessary, however, since the BLAST tools are also able to search FASTA files directly (although this approach is much slower). In that case, the db parameter is replaced with the subject parameter, which takes a FASTA file.

[t]blast[npx] -query <query_file> -subject <fasta_db> -evalue <e-value> -out <output_file> -outfmt <format>
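
The tabular output (format 7) lends itself to filtering with standard text tools. The following is a minimal sketch, assuming the default twelve columns and an invented two-hit result file called hits.tsv; it skips the comment lines and keeps only hits at or below an E-value of 1e-5.

```shell
#!/bin/sh
# Sample of format 7 output: comment lines start with '#', hit lines are
# tab-separated. All values are invented for illustration.
printf '%s\n' '# Query: query1' '# 2 hits found' > hits.tsv
printf 'query1\tsubjA\t98.50\t200\t3\t0\t1\t200\t5\t204\t1e-100\t380\n' >> hits.tsv
printf 'query1\tsubjB\t35.00\t80\t50\t2\t10\t89\t40\t119\t0.5\t30.1\n' >> hits.tsv

# Default columns: qseqid sseqid pident length mismatch gapopen
#                  qstart qend sstart send evalue bitscore
# Skip comments and keep hits whose E-value (column 11) is <= 1e-5;
# print query ID, subject ID and E-value of each surviving hit.
good_hits=$(awk -F '\t' '!/^#/ && ($11 + 0) <= 1e-5 { print $1 "\t" $2 "\t" $11 }' hits.tsv)
echo "$good_hits"
```

Adding zero to the field forces awk to compare it as a number rather than as a string, which matters for values in exponent notation.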

Constructing and using PSSMs

The obvious application of position-specific scoring matrices (PSSMs) is PSI-BLAST (position-specific iterative BLAST). Here, the hits obtained within an iteration are used to construct a matrix, which is in turn used to score the subsequent iteration. The syntax of psiblast is the following:

psiblast -query <query_file> -db <database> -evalue <e-value> -out <output_file> -outfmt <format> -num_iterations <n>

The only new parameter here is num_iterations, which specifies (surprise) the number of iterations performed. In each one of those, the scoring matrix is updated internally.

psiblast can, however, also be used to construct and save a PSSM from multiple input sequences, which can then be reused. The parameters are out_pssm for saving and in_pssm for supplying.

psiblast -query <query_file> -db <database> -evalue <e-value> -out_pssm <pssm_file> > /dev/null
psiblast -in_pssm <pssm_file> -db <database> -evalue <e-value> -out <output_file> -outfmt <format>

Note that we are not interested in the search output of the first step (hence the redirection of stdout to /dev/null), and that the PSSM replaces the sequence file input in the second. Among the standalone tools, in_pssm is accepted by psiblast and tblastn, but not by the plain blastp, blastn, blastx or tblastx programs. If you do not want to create the scoring matrices yourself, NCBI also offers a large set of them. They are, however, scaled by a factor of 100, which means that you have to divide all matrix values by that amount, as the tools are not able to do that on their own.

Protein classification using rpsblast

Sequences are not the only thing that can be used to construct a BLAST database; PSSMs can be as well. The method of searching them is called reverse PSI-BLAST (rpsblast), and it is used for protein domain classification. The commands to construct a database and search against it are shown below.

formatrpsdb -n <db_name> -i <list_file> -o T -f 9.82 -S 100.0
rpsblast -query <query_file> -db <rps_database> -evalue <e-value> -out <output_file> -outfmt <format>

Here, list_file is a text file containing a list of names (and paths, if needed) of the scoring matrix files. The -o parameter is used so that each PSSM is represented by its name instead of just “unnamed”. When using PSSMs that are scaled by a factor of 100, an extension threshold (-f parameter) of 9.82 is equivalent to 11 (the BLAST default) for unscaled matrices.
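
The list file can be generated rather than typed by hand. The sketch below assumes a hypothetical directory ./pssms holding one matrix file per domain with the extension .smp, and assumes that the list file takes one path per line; the empty stand-in files exist only so that the example runs.

```shell
#!/bin/sh
# Hypothetical directory with one scoring matrix file per domain.
pssm_dir=./pssms
mkdir -p "$pssm_dir"
# Empty stand-in files so the sketch runs; real matrices would go here.
: > "$pssm_dir/domainA.smp"
: > "$pssm_dir/domainB.smp"

# Write one path per line (assumed layout of the list file).
printf '%s\n' "$pssm_dir"/*.smp > pssm_list.txt
cat pssm_list.txt
```

The resulting pssm_list.txt would then be supplied as the list_file argument (-i) of the formatrpsdb call above.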
