Data released on August 28, 2017

Supporting data for "De Novo PacBio long-read and phased avian genome assemblies correct and add to reference genes generated with intermediate and short reads"

Korlach, J; Gedman, G; Kingan, S, B; Chin, C, S; Howard, J, T; Audet, J, N; Cantin, L; Jarvis, E, D (2017): Supporting data for "De Novo PacBio long-read and phased avian genome assemblies correct and add to reference genes generated with intermediate and short reads". GigaScience Database. http://dx.doi.org/10.5524/100311

Reference-quality genomes provide a resource for studying gene structure, function, and evolution. However, genes of interest are often not completely or accurately assembled, leading to unknown errors in analyses or additional cloning efforts for the correct sequences. A promising solution is long-read sequencing. Here we tested PacBio-based long-read sequencing and diploid assembly for potential improvements to the Sanger-based intermediate-read zebra finch reference and Illumina-based short-read Anna’s hummingbird reference, two vocal learning avian species widely studied in neuroscience and genomics. With DNA of the same individuals used to generate the reference genomes, we generated diploid assemblies with the FALCON-Unzip assembler, resulting in contigs with no gaps in the megabase range, representing 150-fold and 200-fold improvements over the current zebra finch and hummingbird references, respectively. These long-read and phased assemblies corrected and resolved what we discovered to be numerous misassemblies in the references, including missing sequences in gaps, erroneous sequences flanking gaps, base call errors in difficult-to-sequence regions, complex repeat structure errors, and allelic differences between the two haplotypes. These improvements were validated by single long genome and transcriptome reads, and resulted for the first time in completely resolved protein-coding genes widely studied in neuroscience and specialized in vocal learning species. These findings demonstrate the impact of long reads, sequencing of previously difficult-to-sequence regions, and phasing of haplotypes on generating high-quality assemblies necessary for understanding gene structure, function, and evolution.

Related manuscripts:

doi:10.1093/gigascience/gix085

Accessions (data included in GigaDB):

BioProject: PRJNA368994
BioProject: PRJNA289277

Keywords:

De novo genome assembly, long reads, SMRT Sequencing, brain, language

Genomic, Transcriptomic

Samples:

| Sample ID | Taxonomic ID | Common Name | Genbank Name | Scientific Name | Sample Attributes |
|---|---|---|---|---|---|
| ChIP-Seq/RA_3 | 59729 | Zebrafinch | | | Description: Chromatin extracted from RA of Zebra f...; Alternative names: S-JL093- left leg band #; Analyte type: Chromatin; ... |
| SRS1954332 | 59729 | Zebrafinch | | | Description: DNA extracted from /muscle of adult/ m...; Alternative names: black17 - leg band #; Analyte type: DNA; ... |
| RNA-Seq/RA_rep1 | 59729 | Zebrafinch | | | Description: RNA extracted from RA of Zebra Finch B...; Alternative names: E-JL181 and V-JL246; Analyte type: RNA; ... |
| RNA-Seq/RA_rep2 | 59729 | Zebrafinch | | | Description: RNA extracted from RA of Zebra Finch B...; Alternative names: R-JL134 and R-JL310; Analyte type: RNA; ... |
| RNA-Seq/RA_rep3 | 59729 | Zebrafinch | | | Description: RNA extracted from RA of Zebra Finch B...; Alternative names: V-JL97 and P-JL1290; Analyte type: RNA; ... |
| RNA-Seq/RA_rep4 | 59729 | Zebrafinch | | | Description: RNA extracted from RA of Zebra Finch B...; Alternative names: P-JL1246 and G-JL210; Analyte type: RNA; ... |
| RNA-Seq/RA_rep5 | 59729 | Zebrafinch | | | Description: RNA extracted from RA of Zebra Finch B...; Alternative names: no_band; Analyte type: RNA; ... |
| ChIP-Seq/RA_1 | 59729 | Zebrafinch | | | Description: Chromatin extracted from RA of Zebra f...; Alternative names: S-JL091- left leg band #; Analyte type: Chromatin; ... |
| ChIP-Seq/RA_2 | 59729 | Zebrafinch | | | Description: Chromatin extracted from RA of Zebra f...; Alternative names: S-JL092- left leg band #; Analyte type: Chromatin; ... |
| SAMN02265252 | 9244 | Annas hummingbird | Annas hummingbird | Calypte anna | Description: DNA extracted from blood and liver of ...; Alternative names: BGI_N300; Analyte type: DNA; ... |
Displaying 1-10 of 10 Sample(s).

Files:

| File Name | Sample ID | File Type | File Format | Size | Release Date |
|---|---|---|---|---|---|
| | | Tabular Data | TAR | 512.21 KB | 2017-08-23 |
| | | Sequence assembly | FASTA | 591.47 MB | 2017-08-23 |
| | | Tabular Data | TAR | 14.02 KB | 2017-08-23 |
| | | transcriptome sequence | UNKNOWN | 1.44 GB | 2017-08-23 |
| | | transcriptome sequence | UNKNOWN | 320.71 MB | 2017-08-23 |
| | | transcriptome sequence | UNKNOWN | 1.79 GB | 2017-08-23 |
| | | transcriptome sequence | UNKNOWN | 2.45 GB | 2017-08-23 |
| | | transcriptome sequence | UNKNOWN | 628.67 MB | 2017-08-23 |
| | | transcriptome sequence | UNKNOWN | 1.53 GB | 2017-08-23 |
| | | transcriptome sequence | UNKNOWN | 104.08 MB | 2017-08-23 |
Displaying 1-10 of 13 File(s).
