Supporting data for "Bio-Docklets: Virtualization Containers for Single-Step Execution of NGS Pipelines."

Dataset type: Software
Data released on June 26, 2017

Kim B; Ali T; Lijeron C; Afgan E; Krampis K (2017): Supporting data for "Bio-Docklets: Virtualization Containers for Single-Step Execution of NGS Pipelines." GigaScience Database. http://dx.doi.org/10.5524/100323

DOI: 10.5524/100323

Processing of Next-Generation Sequencing (NGS) data requires significant technical skills, involving installation, configuration, and execution of bioinformatics data pipelines, in addition to specialized post-analysis visualization and data mining software. In order to address some of these challenges, developers have leveraged virtualization containers, towards seamless deployment of pre-configured bioinformatics software and pipelines on any computational platform.
We present an approach for abstracting the complex data operations of multi-step bioinformatics pipelines for NGS data analysis. As examples, we have deployed two pipelines for RNAseq and CHIPseq, pre-configured within Docker virtualization containers we call Bio-Docklets. Each Bio-Docklet exposes a single data input and output endpoint, so that from a user perspective running a pipeline is as simple as running a single bioinformatics tool. This is achieved using a “meta-script” that automatically starts the Bio-Docklets and controls the pipeline execution through the BioBlend software library and the Galaxy Application Programming Interface (API). The pipeline output is post-processed by integration with the Visual Omics Explorer (VOE) framework, providing interactive data visualizations that users can access through a web browser.
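The meta-script idea described above can be illustrated with a minimal Python sketch. It is not the authors' script: the image tag, the port mapping (Galaxy assumed to listen on port 80 inside the container), the `/data` mount point, the API key, the workflow name, and the input step index are all hypothetical placeholders. The BioBlend calls used (`GalaxyInstance`, `histories.create_history`, `tools.upload_file`, `workflows.get_workflows`, `workflows.invoke_workflow`) are part of the BioBlend Galaxy client.

```python
import subprocess


def docker_run_command(image, host_port, data_dir):
    """Build the argv list that starts one container in the background."""
    return [
        "docker", "run", "-d",
        "-p", f"{host_port}:80",    # assumption: Galaxy serves on port 80 in the container
        "-v", f"{data_dir}:/data",  # assumption: hypothetical mount point for input data
        image,
    ]


def start_container(image, host_port, data_dir):
    """Start the container and return the container ID printed by Docker."""
    result = subprocess.run(
        docker_run_command(image, host_port, data_dir),
        check=True, capture_output=True, text=True,
    )
    return result.stdout.strip()


def run_pipeline(galaxy_url, api_key, workflow_name, input_path):
    """Upload one input file and invoke one named workflow through BioBlend."""
    # Imported here so the Docker helpers above work without BioBlend installed.
    from bioblend.galaxy import GalaxyInstance

    gi = GalaxyInstance(url=galaxy_url, key=api_key)
    history = gi.histories.create_history(name="bio-docklet-run")
    upload = gi.tools.upload_file(input_path, history["id"])
    dataset_id = upload["outputs"][0]["id"]

    workflows = gi.workflows.get_workflows(name=workflow_name)
    if not workflows:
        raise RuntimeError(f"No workflow named {workflow_name!r} found")

    # Assumption: the workflow's single data input is the step with index 0.
    return gi.workflows.invoke_workflow(
        workflows[0]["id"],
        inputs={"0": {"id": dataset_id, "src": "hda"}},
        history_id=history["id"],
    )


# Hypothetical usage (all values are placeholders):
# start_container("bcil/biodocklets:<tag>", 8080, "/path/to/ngs-data")
# run_pipeline("http://localhost:8080", "<api-key>", "<workflow-name>", "/path/to/reads.fastq")
```

In practice a script like this would also need to wait until Galaxy inside the container has finished starting before calling the API; that step is omitted here.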
Our goal is to enable easy access to NGS data analysis pipelines for non-bioinformatics experts, in any computing environment, whether a laboratory workstation, a university computer cluster, or a cloud service provider. Beyond end-users, Bio-Docklets also enable developers to programmatically deploy and run a large number of pipeline instances for concurrent analysis of multiple datasets.
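As a rough illustration of running several pipeline instances concurrently, the sketch below assigns each dataset its own host port and dispatches the runs through a thread pool. The `run_one` callable is hypothetical: it stands for whatever function starts one container and runs one pipeline on one dataset. The base port and worker count are likewise assumptions, not values from the dataset.

```python
from concurrent.futures import ThreadPoolExecutor


def plan_instances(datasets, base_port=8080):
    """Pair each dataset with its own host port so instances do not collide."""
    return [(path, base_port + i) for i, path in enumerate(datasets)]


def run_all(datasets, run_one, max_workers=4, base_port=8080):
    """Call run_one(path, port) for every dataset; return results in input order."""
    plan = plan_instances(datasets, base_port)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(run_one, path, port) for path, port in plan]
        # result() re-raises any exception from a failed run.
        return [future.result() for future in futures]
```

Threads are sufficient here because each worker would mostly wait on Docker and on the Galaxy server rather than compute locally.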

Additional details

Read the peer-reviewed publication(s):

(PubMed: 28854616)

Additional information:

https://tinyurl.com/run-BD

https://github.com/BCIL/BioDocklets/archive/master.zip

https://hub.docker.com/r/bcil/biodocklets/tags/

Accessions (data referenced by this study):

SRA: SRR1797219
SRA: SRR1797228
ENA: ERR411994





File Name | Sample ID | Data Type      | File Format | Size      | Release Date
          |           | GitHub archive | archive     | 636.73 KB | 2017-06-19
          |           | Readme         | TEXT        | 2.26 KB   | 2017-06-19
Date | Action
June 26, 2017 | Dataset publish
October 2, 2017 | Manuscript Link added: 10.1093/gigascience/gix048
November 9, 2022 | Manuscript Link updated: 10.1093/gigascience/gix048