Minimal tiling path BAC contig assembler

Description

BAC contig assembler is a computational tool for assembling a contig of minimally overlapping BAC clones across a genomic sequence of interest in the mouse and human genomes. The resulting minimal tiling path BAC contig covers up to 99% of the region, with an average 1.3x genomic representation and a density of 7-8 BAC clones per 1 Mb of sequence.
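The density and representation figures above are consistent with the 150-200 kb insert size quoted in the Fig. 1 legend. The short check below is only a back-of-the-envelope sketch: the function name is hypothetical, and the mid-range values (7.5 clones per Mb, 175 kb inserts) are assumed for illustration rather than reported by the tool.

```python
def expected_representation(clones_per_mb, mean_insert_kb):
    """Fold coverage = total cloned bases per Mb of sequence (1 Mb = 1000 kb)."""
    return clones_per_mb * mean_insert_kb / 1000.0


# Assumed mid-range values: 7.5 clones per Mb, 175 kb mean insert size.
fold = expected_representation(7.5, 175)
print(f"{fold:.2f}x")
```

With these assumed values the estimate is close to the 1.3x average quoted above.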

BAC contig assembly is based on a WU-BLAST search of the genomic sequence against the TIGR BAC End Sequences Database. Because both end sequences of a BAC are available, the clone can be mapped onto the genomic sequence with the help of the BLAST algorithm. Searching the TIGR database retrieves a redundant set of BAC clones mapped to the sequence (Fig. 1A). The BAC contig assembler algorithm uses this redundant set to assemble a minimal tiling path BAC contig, which comprises a non-redundant subset of minimally overlapping clones (Fig. 1B).
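The selection step can be illustrated with a standard greedy interval cover. This is a minimal sketch, not the actual logic of BAC_assembler.pl, which is not described here. The `Clone` type, the function name, and the example coordinates are all hypothetical.

```python
from typing import NamedTuple


class Clone(NamedTuple):
    name: str
    start: int  # leftmost mapped coordinate on the query sequence
    end: int    # rightmost mapped coordinate on the query sequence


def minimal_tiling_path(clones, region_start, region_end):
    """Greedy interval cover: at each step keep the clone reaching furthest right."""
    ordered = sorted(clones, key=lambda c: c.start)
    path = []
    covered_to = region_start
    i = 0
    while covered_to < region_end:
        best = None
        # Scan every clone that starts at or before the covered frontier.
        while i < len(ordered) and ordered[i].start <= covered_to:
            if best is None or ordered[i].end > best.end:
                best = ordered[i]
            i += 1
        if best is None or best.end <= covered_to:
            # Nothing extends the path: either we are done or there is a gap.
            if i >= len(ordered):
                break
            covered_to = ordered[i].start  # jump across the gap
            continue
        path.append(best)
        covered_to = best.end
    return path


# Hypothetical redundant set of mapped clones.
redundant = [
    Clone("A", 0, 180_000),
    Clone("B", 50_000, 200_000),
    Clone("C", 150_000, 330_000),
    Clone("D", 170_000, 300_000),
    Clone("E", 320_000, 500_000),
]
print([c.name for c in minimal_tiling_path(redundant, 0, 500_000)])
```

The greedy choice keeps the number of clones small and the overlaps short. Where no clone reaches the frontier, the sketch skips ahead and leaves a gap.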

Fig1A.

Fig1B.

Fig. 1. BAC contig assembly strategy.
A) Mapping of all BAC clones from the TIGR database onto the genomic region of interest. B) Minimal tiling path BAC contig assembled with the BAC_assembler.pl Perl program.
BAC clones are mapped with WU-BLAST and are represented by thin horizontal bars joining short thick bars, which mark the positions of the BAC end sequences in the query sequence.
Taking into account the average BAC insert size of 150-200 kb, we use only those BAC clones whose end sequences are mapped within 300 kb of each other, which allows for short gaps of uncertain size within the genomic assembly.
The X axis represents nucleotide positions in the genomic sequence.
Visualization is implemented with the Genome Cryptographer software.
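The 300 kb rule in the legend can be sketched as a simple pairing filter over end-sequence hits. The input layout (clone name, hit start, hit end) and the requirement of exactly two hits per clone are assumptions made for illustration. The real program's handling of multiple hits or hit orientation is not described here.

```python
from collections import defaultdict

MAX_SPAN = 300_000  # maximum distance between the two mapped ends, from the legend


def pair_end_hits(hits, max_span=MAX_SPAN):
    """Return {clone: (start, end)} for clones whose two end hits lie within max_span.

    hits: iterable of (clone_name, hit_start, hit_end) on the query sequence.
    """
    by_clone = defaultdict(list)
    for clone, start, end in hits:
        by_clone[clone].append((min(start, end), max(start, end)))

    placed = {}
    for clone, ends in by_clone.items():
        if len(ends) != 2:  # assumption: both ends must map, each exactly once
            continue
        left = min(s for s, _ in ends)
        right = max(e for _, e in ends)
        if right - left <= max_span:
            placed[clone] = (left, right)
    return placed


# Hypothetical hits: bacB spans too far, bacC has only one mapped end.
hits = [
    ("bacA", 1_000, 1_600), ("bacA", 180_000, 180_500),
    ("bacB", 5_000, 5_500), ("bacB", 400_000, 400_600),
    ("bacC", 100, 700),
]
print(pair_end_hits(hits))
```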


Organisms and Libraries


The BAC contig assembler program allows construction of minimal tiling path contigs across regions of the mouse and human genome assemblies generated by the UCSC Genome Project.

The following BAC libraries may be used for contig construction:


Technical Notes

Fig. 2. Effect of the assembly on contig construction. Thick bars represent BAC end sequences mapped onto the query sequence. In some cases the length of homology is far greater than that of the BAC end sequence, which might be due to the presence of repeated stretches of unique DNA in the local region of the genomic assembly. These clones are not considered for contig assembly.
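One way to express this exclusion is to compare the span of the hit on the query sequence with the length of the end sequence itself. The sketch below is illustrative only. The function name and the `max_ratio` threshold of 1.5 are hypothetical, since no cutoff used by the program is stated here.

```python
def is_plausible_end_hit(hit_start, hit_end, end_seq_length, max_ratio=1.5):
    """True if the homology span is not much longer than the end sequence.

    max_ratio is a hypothetical tolerance, not a value taken from the program.
    """
    span = abs(hit_end - hit_start) + 1  # inclusive coordinates
    return span <= max_ratio * end_seq_length


# A 500 bp end sequence matching 500 bp of the query is kept;
# the same end sequence with homology spread over 4 kb is rejected.
print(is_plausible_end_hit(1_000, 1_499, 500))
print(is_plausible_end_hit(1_000, 4_999, 500))
```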
 
 



 

BAC contig assembler FAQ


Is it possible to assemble a library-specific contig?

The BAC contig assembler program allows one to construct both library-specific and mixed contigs. See Section 3 on the submission page.

How can I fill in gaps in a contig?

If a sequence is absent from a library, the library-specific contig will contain gaps. Such gaps may be filled in with clones from other libraries. Click Yes in Section 7 on the submission page to have the redundant set of BACs mapped to the sequence of interest. Use the "All BAC clones mapped to sequence_name" and "List of all BAC clones mapped to sequence_name" files to insert the necessary clones into the minimal tiling path contig.
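The manual gap-filling step can also be sketched in code, assuming the clone coordinates have already been read from the two files. The data layout, the function names, and the clone names are hypothetical, and the format of the files is not described here.

```python
def find_gaps(path, region_start, region_end):
    """Return uncovered (start, end) intervals left by the tiling path."""
    gaps = []
    frontier = region_start
    for start, end in sorted(path):
        if start > frontier:
            gaps.append((frontier, start))
        frontier = max(frontier, end)
    if frontier < region_end:
        gaps.append((frontier, region_end))
    return gaps


def gap_candidates(gap, all_clones):
    """Names of clones from the redundant set that overlap the given gap."""
    gap_start, gap_end = gap
    return [name for name, (start, end) in all_clones.items()
            if start < gap_end and end > gap_start]


# Hypothetical library-specific path and redundant set from other libraries.
path = [(0, 180_000), (250_000, 400_000)]
all_clones = {
    "libB_1": (170_000, 260_000),
    "libB_2": (300_000, 380_000),
    "libC_9": (390_000, 450_000),
}
for gap in find_gaps(path, 0, 450_000):
    print(gap, gap_candidates(gap, all_clones))
```

A clone that overlaps a gap only partially still appears as a candidate, so the final choice remains a manual decision.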

Why has no contig been assembled across my genomic sequence?

There may be several reasons why no clones span the region of interest: (i) the sequence has a high repeat content; (ii) no sequence is available in the given range of coordinates (you may check your sequence with the help of the UCSC Genome Browser; note that coordinates are freeze-specific!); (iii) the assembly is of low quality in this genomic region; (iv) the region is recalcitrant to cloning and is therefore absent from genomic libraries.