
Data released on August 16, 2017

Supporting data for "Long-read sequencing of the coffee bean transcriptome reveals the diversity of full length transcripts"

Cheng, B; Furtado, A; Henry, R (2017): Supporting data for "Long-read sequencing of the coffee bean transcriptome reveals the diversity of full length transcripts" GigaScience Database.

Polyploidization contributes to the complexity of gene expression, resulting in numerous related but different transcripts. This study explored the transcriptome diversity and complexity of the tetraploid Arabica coffee (Coffea arabica) bean. Long-read sequencing (LRS) by PacBio Isoform Sequencing (Iso-Seq) was used to obtain full-length transcripts without the difficulty and uncertainty of assembly required for reads from short-read technologies. The tetraploid transcriptome was annotated and compared with data from the sub-genome progenitors. Caffeine and sucrose genes were targeted for case analysis.
An isoform-level tetraploid coffee bean reference transcriptome with 95,995 distinct transcripts (average 3,236 bp) was obtained. A total of 88,715 sequences (92.42%) were annotated with BLASTx against NCBI non-redundant plant proteins, including 34,719 high-quality annotations. Further BLASTn searches against NCBI non-redundant nucleotide sequences, C. canephora coding sequences with UTRs, C. arabica ESTs and Rfam left 1,213 sequences without hits; these are potential novel genes in coffee. Longer UTRs were captured, especially 5' UTRs, facilitating the identification of upstream ORFs (uORFs). LRS also revealed more and longer transcript variants in key caffeine and sucrose metabolism genes from this polyploid genome. Long sequences (>10 kb) were poorly annotated.
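
The summary figures above (transcript count, average length, annotated fraction) can be recomputed from a downloaded FASTA file. The following is a minimal sketch, not the authors' pipeline: the helper names `fasta_lengths` and `summarize` are hypothetical, and the toy records stand in for a real transcriptome file from this dataset.

```python
from io import StringIO


def fasta_lengths(handle):
    """Yield (record_id, sequence_length) for each record in a FASTA stream."""
    name, length = None, 0
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record.
            if name is not None:
                yield name, length
            parts = line[1:].split()
            name = parts[0] if parts else ""
            length = 0
        elif name is not None:
            # Sequence lines may be wrapped; add up every line of the record.
            length += len(line)
    if name is not None:
        yield name, length


def summarize(handle):
    """Return (number of records, mean sequence length in bp)."""
    lengths = [n for _, n in fasta_lengths(handle)]
    count = len(lengths)
    mean = sum(lengths) / count if count else 0.0
    return count, mean


# Toy input standing in for a transcriptome FASTA (hypothetical records).
demo = StringIO(">t1\nACGT\nAC\n>t2\nACGTACGT\n")
count, mean = summarize(demo)
print(count, mean)

# Annotated fraction, using the counts reported in the abstract.
annotated_fraction = 88_715 / 95_995
print(f"{annotated_fraction:.2%}")
```

For a real file, replace the `StringIO` object with `open(path)` on one of the FASTA downloads; the count and mean should then be comparable to the reported 95,995 transcripts and 3,236 bp average, assuming the file holds the final non-redundant transcript set.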


Read the peer-reviewed publication(s):

Cheng, B., Furtado, A., & Henry, R. J. (2017). Long-read sequencing of the coffee bean transcriptome reveals the diversity of full-length transcripts. GigaScience, 6(11), 1–13. doi:10.1093/gigascience/gix086


Accessions (data included in GigaDB):

BioProject: PRJEB19262


Keywords: coffee, transcriptome, full-length cDNA, long sequences, isoform, polyploid, UTR



  • Funding body - Australian Research Council
  • Award ID - LP130100376
  • Comment - Understanding Coffee Quality
  • Awardee - R Henry

  • Funding body - Chinese Scholarship Council
  • Location - China
  • Comment - Study abroad
  • Awardee - Bing Cheng

Samples:

Sample ID: Coffea arabica var K7
Taxonomic ID: 13443
Common Name: arabica coffee
Genbank Name: coffee
Scientific Name: Coffea arabica
Sample Attributes:
  Description: A long read transcriptome of developin...
  Infra specific name: variety:var K7
  Analyte type: RNA

Displaying 1-1 of 1 Sample(s).

Files:

Data Type               File Format   Size        Release Date
Coding Sequence         FASTA         2.3 MB      2017-08-09
Coding Sequence         FASTA         34.69 KB    2017-08-09
transcriptome sequence  FASTA         7.35 MB     2017-08-09
Text                    TEXT          18.88 MB    2017-08-09
GitHub archive          archive       1.93 KB     2017-08-09
transcriptome sequence  FASTA         93.37 MB    2017-08-09
transcriptome sequence  FASTA         95.29 MB    2017-08-09
Text                    TEXT          1.62 KB     2017-08-09
Text                    TEXT          405.62 KB   2017-08-09
Readme                  TEXT          3.78 KB     2017-08-09

Displaying 1-10 of 10 File(s).


