
Data released on April 24, 2018

Software and supporting data for "Fast-SG: An alignment-free algorithm for hybrid assembly"

Di Genova, A; Ruz, G, A; Sagot, M, F; Maass, A (2018): Software and supporting data for "Fast-SG: An alignment-free algorithm for hybrid assembly" GigaScience Database.

Long read sequencing technologies are the ultimate solution for genome repeats, allowing near reference level reconstructions of large genomes. However, long read de novo assembly pipelines are computationally intensive and require a considerable amount of coverage, thereby hindering their broad application to the assembly of large genomes. Alternatively, hybrid assembly methods that combine short and long read sequencing technologies can reduce the time and cost required to produce de novo assemblies of large genomes. Here, we propose a new method, called Fast-SG, which uses a new ultra-fast alignment-free algorithm specifically designed for constructing a scaffolding graph using light-weight data structures. Fast-SG can construct the graph from either short or long reads. This allows the reuse of efficient algorithms designed for short read data and permits the definition of novel modular hybrid assembly pipelines. Using comprehensive standard datasets and benchmarks, we show how Fast-SG outperforms the state-of-the-art short read aligners when building the scaffolding graph, and can be used to extract linking information from either raw or error-corrected long reads. We also show how a hybrid assembly approach using Fast-SG with shallow long read coverage (5X) and moderate computational resources can produce long-range and accurate reconstructions of the genomes of Arabidopsis thaliana (Ler-0) and human (NA12878).
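To make the idea of an alignment-free scaffolding graph more concrete, the toy sketch below links contigs through exact k-mer lookups instead of read alignment. It is not the Fast-SG implementation and makes no claim about how Fast-SG works internally: the k-mer size, the function names (`index_unique_kmers`, `locate`, `scaffold_edges`) and the input layout are all assumptions chosen for illustration, and reverse complements, orientation and insert-size estimation are ignored.

```python
from collections import Counter

K = 5  # toy k-mer size (assumption); real tools use much larger values


def kmers(seq, k=K):
    """Yield every k-mer of seq, left to right."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def index_unique_kmers(contigs, k=K):
    """Map each k-mer that occurs exactly once across all contigs to its contig."""
    counts = Counter(km for seq in contigs.values() for km in kmers(seq, k))
    return {
        km: name
        for name, seq in contigs.items()
        for km in kmers(seq, k)
        if counts[km] == 1
    }


def locate(read, index, k=K):
    """Return the contig of the first unique k-mer found in read, or None."""
    for km in kmers(read, k):
        if km in index:
            return index[km]
    return None


def scaffold_edges(contigs, read_pairs, k=K):
    """Count read pairs whose two ends hit different contigs (hypothetical helper)."""
    index = index_unique_kmers(contigs, k)
    edges = Counter()
    for left, right in read_pairs:
        a, b = locate(left, index, k), locate(right, index, k)
        if a is not None and b is not None and a != b:
            edges[tuple(sorted((a, b)))] += 1
    return edges
```

Each key of the returned counter is a pair of contig names and each value is the number of read pairs supporting that link; a scaffolder would then filter and order these edges.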




Keywords: alignment-free scaffolding; hybrid-genome-assembly; hybrid assembly; genome scaffolding; scaffolding; illumina; nanopore; pacbio

Software, Genomic

Samples:



| Sample ID | Taxonomic ID | Common Name | Genbank Name | Scientific Name | Sample Attributes |
|---|---|---|---|---|---|
| A. thaliana (Ler-0) | 3702 | mouse-ear cress | thale cress | Arabidopsis thaliana | Description: Representative sample of Arabidopsis thaliana sequences used for demonstration of sequence assembly tools; Sequencing method: Illumina, PacBio; Relevant electronic resources: r/m54113_160913_184949.subreads.bam |
| E.coli K-12 | 83333 | | | Escherichia coli K-12 | Description: Representative sample of Escherichia coli K-12 sequences used for demonstration of sequence assembly tools; Sequencing method: Illumina, PacBio, ONT |
| Human Chromosome 14 | 9606 | Human | human | Homo sapiens | Description: Representative sample of Homo sapiens ...; Alternative accession-SRA File: ERR163027; Sequencing method: Illumina |
| NA12878 | 9606 | Human | human | Homo sapiens | Description: Representative sample of Homo sapiens sequences used for demonstration of sequence assembly tools; Sequencing method: ONT |
| P_falciparum | 36329 | | | Plasmodium falciparum 3D7 | Description: Representative sample of Plasmodium fa...; Alternative accession-SRA File: ERR034295, ERR16302...; Sequencing method: Illumina |
| R_sphaeroides | 1063 | | | Rhodobacter sphaeroides | Description: Representative sample of Rhodobacter s...; Alternative accession-SRA File: SRR034528; Sequencing method: Illumina |
| S_aureus | 1280 | | | Staphylococcus aureus | Description: Representative sample of Staphylococcu...; Alternative accession-SRA File: SRR022865; Sequencing method: Illumina |
| S. cerevisiae W303 | 580240 | | | Saccharomyces cerevisiae W303 | Description: Representative sample of Saccharomyces cerevisiae W303 sequences used for demonstration of sequence assembly tools; Sequencing method: PacBio |
Displaying 1-8 of 8 Sample(s).

Files: (FTP site)



| File Name | Sample ID | Data Type | File Format | Size | Release Date |
|---|---|---|---|---|---|
| | A. thaliana (Ler-0) | Mixed archive | TAR | 615.47 MB | 2018-04-24 |
| | E.coli K-12 | Mixed archive | TAR | 10.41 MB | 2018-04-24 |
| | | Software | TAR | 13.95 MB | 2018-03-30 |
| | Human Chromosome 14 | Mixed archive | TAR | 6.29 GB | 2018-04-24 |
| | | MD5sum | TEXT | 0.51 KB | 2018-04-24 |
| | NA12878 | Mixed archive | TAR | 751.11 MB | 2018-04-24 |
| | P_falciparum | Mixed archive | TAR | 1.57 GB | 2018-04-24 |
| | | readme.txt | TEXT | 2.7 KB | 2018-04-24 |
| | R_sphaeroides | Mixed archive | TAR | 307.36 MB | 2018-04-24 |
| | S_aureus | Mixed archive | TAR | 68.09 MB | 2018-04-24 |
Displaying 1-10 of 11 File(s).
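
Because the file list includes an MD5sum entry alongside the TAR archives, downloaded archives can be checked for corruption before unpacking. The sketch below shows one way to do that in Python. It assumes the checksum file follows the common `md5sum` layout (one `<hex digest>  <file name>` pair per line), which this page does not confirm, and the function names and paths are hypothetical.

```python
import hashlib
from pathlib import Path


def md5_of(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def parse_manifest(text):
    """Parse md5sum-style lines into {file name: digest} (assumed format)."""
    expected = {}
    for line in text.splitlines():
        parts = line.split(maxsplit=1)
        if len(parts) == 2:
            digest, name = parts
            # md5sum marks binary mode with a leading '*' on the file name
            expected[name.strip().lstrip("*")] = digest.lower()
    return expected


def verify(manifest_path, directory="."):
    """Return {file name: True/False}; missing files count as failures."""
    expected = parse_manifest(Path(manifest_path).read_text())
    results = {}
    for name, digest in expected.items():
        target = Path(directory) / name
        results[name] = target.is_file() and md5_of(target) == digest
    return results
```

Calling `verify` with the path of the downloaded checksum file and the download directory returns one boolean per listed file, so any `False` entry points at an archive that should be fetched again.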


