Basic Local Alignment Search Tool: NCBI


NCBI has provided BLAST sequence analysis services for over a decade. The first question many users face is "Which BLAST program should I use?" To help users answer this question, we created this "BLAST Program Selection Guide."
This document first introduces the BLAST databases available from NCBI (in Section 2). The actual guide (Section 3) divides BLAST searches into several categories according to the nature and size of the input query and the primary goal of the search. Starting from the query sequence column on the left and cross-referencing to the right, a user will arrive at the specific BLAST program(s) best suited for that search.
This document is also available in PDF (163,516 bytes).
BLAST Database content
A BLAST search has four components: query, database, program, and search purpose/goal. To discuss effective BLAST program selection, we first need to know what databases are available and what sequences these databases contain. In this section, we will first take a look at the common BLAST databases. According to their content, they are grouped into nucleotide and protein databases.
MEGABLAST is the tool of choice to identify a nucleotide sequence
The best way to identify an unknown sequence is to see whether that sequence already exists in a public database. If the database sequence is well characterized, then one will have access to a wealth of biological information. MEGABLAST, discontiguous megablast, and blastn can all be used to accomplish this goal. However, MEGABLAST is specifically designed to efficiently find long alignments between very similar sequences and is thus the best tool for finding an identical match to your query sequence. In addition to the Expect value significance cut-off, MEGABLAST provides an adjustable percent identity cut-off for the alignment, which acts as a second filter on top of the significance threshold.
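As a rough illustration of how the two cut-offs combine, the sketch below filters a single gapped alignment by both Expect value and percent identity. It is not MEGABLAST's code: the function names and the default thresholds are hypothetical, and percent identity is assumed here to mean identical columns divided by alignment length, gaps included.

```python
def percent_identity(aligned_query: str, aligned_subject: str) -> float:
    """Percentage of alignment columns in which both rows carry the same base.

    Both arguments are rows of one gapped alignment, with '-' marking gaps.
    Assumption: identity is measured over the full alignment length.
    """
    if len(aligned_query) != len(aligned_subject):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_query:
        return 0.0
    matches = sum(
        1
        for q, s in zip(aligned_query, aligned_subject)
        if q == s and q != "-"
    )
    return 100.0 * matches / len(aligned_query)


def passes_cutoffs(
    aligned_query: str,
    aligned_subject: str,
    evalue: float,
    max_evalue: float = 10.0,     # illustrative default, not an NCBI setting
    min_identity: float = 95.0,   # illustrative default, not an NCBI setting
) -> bool:
    """Keep a hit only if it satisfies both the Expect and identity cut-offs."""
    identity = percent_identity(aligned_query, aligned_subject)
    return evalue <= max_evalue and identity >= min_identity
```

A hit with a very small Expect value can still be rejected when its identity falls below `min_identity`, which is the extra filtering the paragraph above describes.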

The web MEGABLAST and discontiguous megablast pages can also accept batch queries; they are the only web BLAST pages with this capability. Please refer to the "Batch Search" section for details.

Discontiguous MEGABLAST is better at finding nucleotide sequences similar, but not identical, to your nucleotide query
The BLAST nucleotide algorithm finds similar sequences by breaking the query into short subsequences called words. The program first identifies the exact matches to the query words (word hits). It then extends these word hits in multiple steps to generate the final gapped alignments.
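The word-hit step can be sketched in a few lines. The code below is a simplified illustration of exact-word seeding only, not NCBI's implementation; the function names are hypothetical, and the extension of hits into gapped alignments is left out.

```python
from collections import defaultdict


def index_words(sequence: str, word_size: int) -> dict:
    """Map every word of length word_size to the positions where it starts."""
    index = defaultdict(list)
    for start in range(len(sequence) - word_size + 1):
        index[sequence[start:start + word_size]].append(start)
    return index


def find_word_hits(query: str, subject: str, word_size: int) -> list:
    """Return (query_pos, subject_pos) pairs where an exact word is shared."""
    query_index = index_words(query, word_size)
    hits = []
    for s_pos in range(len(subject) - word_size + 1):
        word = subject[s_pos:s_pos + word_size]
        # .get avoids creating empty entries in the defaultdict
        for q_pos in query_index.get(word, ()):
            hits.append((q_pos, s_pos))
    return hits


# Toy sequences, made up for this example
hits = find_word_hits("ACGTAC", "TTACGTTT", word_size=4)
```

Each returned pair marks a seed from which an extension step would start.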

One of the important parameters governing the sensitivity of BLAST searches is the length of the initial words, or word size as it is called. The most important reason that blastn is more sensitive than MEGABLAST is that it uses a shorter default word size (11). Because of this, blastn is better than MEGABLAST at finding alignments to related nucleotide sequences from other organisms. The word size is adjustable in blastn and can be reduced from the default value to a minimum of 7 to increase search sensitivity. 
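To see why a shorter word size raises sensitivity, consider two related sequences whose mismatches are spaced so that no long exact match survives. The sketch below uses made-up sequences and a hypothetical helper, `shared_words`; it shows only the seeding step, not a full search.

```python
def shared_words(query: str, subject: str, word_size: int) -> set:
    """Return the set of exact words of length word_size found in both sequences."""
    query_words = {
        query[i:i + word_size] for i in range(len(query) - word_size + 1)
    }
    subject_words = {
        subject[i:i + word_size] for i in range(len(subject) - word_size + 1)
    }
    return query_words & subject_words


# Made-up 22-base sequences that differ at two positions (0-based 7 and 15),
# so the longest exact match between them is 7 bases.
query = "ACGTTGCAAGCTTAGGCATCGA"
subject = "ACGTTGCTAGCTTAGCCATCGA"

seeds_at_11 = shared_words(query, subject, word_size=11)  # empty: no seed
seeds_at_7 = shared_words(query, subject, word_size=7)    # contains "ACGTTGC"
```

With a word size of 11 the search has nothing to extend, so the relationship is missed; at a word size of 7 a seed exists and an alignment can be built from it.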

A more sensitive search can be achieved by using the newly introduced discontiguous megablast page. This page uses an algorithm with the same name, which is similar to that reported by Ma et al. Rather than requiring exact word matches as seeds for alignment extension, discontiguous megablast uses non-contiguous words within a longer template window. In coding mode, third-base wobble is taken into consideration by focusing on finding matches at the first and second codon positions while ignoring mismatches at the third position. At the same word size, a discontiguous MEGABLAST search is more sensitive and efficient than a standard blastn search. For this reason, it is now the recommended tool for this type of search. Alternative non-coding patterns can also be specified if desired. Additional details on discontiguous megablast are available at:
www.ncbi.nlm.nih.gov/Web/Newsltr/FallWinter02/blastlab.html 
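The idea of a non-contiguous word can be sketched as follows. The template is a string of 1s (positions that must match) and 0s (positions that are ignored). The pattern used here, 11 required positions within a 16-base window with every third base ignored, is meant only as an illustration of coding mode; the function names are hypothetical, and the linked article should be consulted for the templates NCBI actually uses.

```python
# Illustrative coding-style template: "1101101101101101"
# (11 required positions in a 16-base window; every third base is ignored).
CODING_TEMPLATE = "110" * 5 + "1"


def masked_word(sequence: str, start: int, template: str) -> str:
    """Keep only the bases that fall under a '1' in the template."""
    window = sequence[start:start + len(template)]
    return "".join(base for base, flag in zip(window, template) if flag == "1")


def find_template_hits(query: str, subject: str, template: str = CODING_TEMPLATE) -> list:
    """Return (query_pos, subject_pos) pairs whose templated words agree."""
    span = len(template)
    index = {}
    for q_pos in range(len(query) - span + 1):
        index.setdefault(masked_word(query, q_pos, template), []).append(q_pos)
    hits = []
    for s_pos in range(len(subject) - span + 1):
        for q_pos in index.get(masked_word(subject, s_pos, template), ()):
            hits.append((q_pos, s_pos))
    return hits


# Made-up sequences: the subject differs from the query at every third base.
query = "ATGGCCAAGCTTGGAT"
subject = "ATAGCTAAACTCGGGT"
template_hits = find_template_hits(query, subject)
```

These two sequences share no contiguous 11-base word, yet the templated comparison still yields a seed because all of the mismatches fall on ignored positions.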