Parallelization, Scalability, and Reproducibility in Next-Generation Sequencing Analysis

Name: Parallelization, Scalability, and Reproducibility in Next-Generation Sequencing Analysis
SKU: PDPD9RP6FM4
Price: CHF 10.30
Availability: In stock

Details

This PhD thesis provides novel solutions to major topics within the analysis of next-generation sequencing data, focusing on parallelization, scalability and reproducibility.

The analysis of next-generation sequencing (NGS) data is a major topic in bioinformatics: short reads obtained from DNA, the molecule encoding the genome of living organisms, are processed to provide insight into biological or medical questions. This thesis provides novel solutions to major topics within the analysis of NGS data, focusing on parallelization, scalability and reproducibility.

The read mapping problem is to find the origin of the short reads within a given reference genome. We contribute the q-group index, a novel data structure for read mapping with a particularly small memory footprint. The q-group index comes with massively parallel build and query algorithms targeted towards modern graphics processing units (GPUs). On top of this index, the read mapping software PEANUT is presented, which outperforms state-of-the-art read mappers in speed while maintaining their accuracy.

The variant calling problem is to infer (i.e., call) genetic variants of individuals compared to a reference genome using mapped reads. It is usually solved in a Bayesian way. In this work, we show how to integrate the filtering of variants into the calling with an algebraic approach, and we provide an intuitive solution for controlling the false discovery rate. The approach also addresses other challenges of variant calling, such as scaling with a growing set of biological samples.

Depending on the research question, the analysis of NGS data entails many other steps, typically involving diverse tools, data transformations and aggregation of results. These steps can be orchestrated by workflow management. We present the general-purpose workflow system Snakemake, which provides an easy-to-read domain-specific language for defining and documenting workflows. Snakemake provides an execution environment that allows a workflow to be scaled to the available resources, including parallelization across CPU cores or cluster nodes and restrictions on memory usage or on the number of available coprocessors such as GPUs.
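To make the read mapping problem described above concrete, here is a minimal Python sketch of a plain hash-based q-gram index with diagonal voting. It is not the q-group index from the thesis and it is not how PEANUT works: it runs on the CPU, handles only tiny inputs, and makes no attempt at a small memory footprint. The function names `build_qgram_index` and `map_read`, the toy reference, and the choice of q = 4 are assumptions made purely for illustration.

```python
from collections import Counter, defaultdict


def build_qgram_index(reference: str, q: int) -> dict[str, list[int]]:
    """Map every q-gram of the reference to the positions where it starts."""
    index = defaultdict(list)
    for pos in range(len(reference) - q + 1):
        index[reference[pos:pos + q]].append(pos)
    return index


def map_read(read: str, index: dict[str, list[int]], q: int):
    """Return the reference start position supported by the most q-grams.

    Each q-gram hit at reference position `pos` and read offset `offset`
    votes for the candidate read start `pos - offset`. Returns None if
    no q-gram of the read occurs in the reference.
    """
    votes = Counter()
    for offset in range(len(read) - q + 1):
        for pos in index.get(read[offset:offset + q], ()):
            votes[pos - offset] += 1
    if not votes:
        return None
    return votes.most_common(1)[0][0]


# Toy data (hypothetical): the read is an exact substring of the reference.
reference = "ACGTTGCAAGGCTTACGATCGGA"
index = build_qgram_index(reference, q=4)
print(map_read("GGCTTACG", index, q=4))  # prints 9
```

A real read mapper would additionally verify each candidate position with an alignment step and tolerate mismatches and indels; the voting step here only narrows down where the read may originate.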

About the author
Johannes Köster is a computer scientist with a focus on algorithm engineering and data analysis in bioinformatics. Currently, he works as a Postdoctoral Research Fellow in the groups of Shirley Liu (Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute, Harvard School of Public Health) and Myles Brown (Division of Molecular and Cellular Oncology, Department of Medical Oncology, Dana-Farber Cancer Institute).

Further information

General information
- GTIN 09783737537773
- Recommended age 18 to 18 years
- Genre Psychology
- Dimensions H 210 mm x W 148 mm x D 7 mm
- Year 2015
- EAN 9783737537773
- Format Paperback
- ISBN 978-3-7375-3777-3
- Title Parallelization, Scalability, and Reproducibility in Next-Generation Sequencing Analysis
- Author Johannes Köster
- Subtitle Doctoral dissertation
- Weight 179 g
- Publisher epubli
- Number of pages 132