Bayesian Analysis of Transposon Mutagenesis Data

Motivation: Next-generation sequencing affords an efficient analysis of transposon insertion libraries, which can be used to identify essential genes in bacteria. To analyze this high-resolution data, we present a formal Bayesian framework for estimating the posterior probability of essentiality for each gene, using the Extreme Value distribution to characterize the statistical significance of the longest region lacking insertions within a gene. We describe a sampling procedure based on the Metropolis-Hastings algorithm to calculate posterior probabilities of essentiality while simultaneously integrating over unknown internal parameters.

Results: Using a sequence dataset from a transposon library for M. tuberculosis, we show that this Bayesian approach predicts essential genes that correspond well with genes shown to be essential in previous studies. Furthermore, we show that by using the Extreme Value Distribution to characterize genomic regions lacking transposon insertions, this method is capable of identifying essential domains within genes. This approach can be used for analyzing transposon libraries in other organisms, and augmenting essentiality predictions with statistical confidence scores.

Bayesian Analysis of Gene Essentiality based on Sequencing of Transposon Insertion Libraries
Michael DeJesus; Yanjia J. Zhang; Christopher M. Sassetti; Eric J. Rubin; James C. Sacchettini; Thomas R. Ioerger
Bioinformatics 2013; doi: 10.1093/bioinformatics/btt043

Contact Information

If you have any questions, contact us at: ioerger@cs.tamu.edu.

Introduction

The software available here is a Python implementation of the Bayesian analysis method referenced above. It uses read information obtained from sequencing libraries of transposon mutants to determine the essentiality of genes. Within a Bayesian framework, essentiality is modeled through the Extreme Value (Gumbel) distribution, which characterizes the maximum run of non-insertions (i.e., the number of consecutive TA sites lacking insertions). Genes with significantly longer runs of non-insertion than statistically expected have a higher likelihood of being essential. A Metropolis-Hastings sampling procedure is used to sample from the conditional densities of essentiality for all genes, yielding posterior estimates of the probability that each gene is essential.
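To make the idea of a "maximum run of non-insertions" concrete, the following is a minimal sketch and not the code distributed on this page. It assumes a gene is represented as a list of read counts, one per TA site, and that each site lacks an insertion independently with the same probability. The function names and the input layout are hypothetical. The Gumbel location and scale used here come from the standard longest-run approximation for independent trials; the full method additionally samples over unknown parameters, which this sketch does not attempt.

```python
import math


def longest_noninsertion_run(read_counts):
    """Return the longest stretch of consecutive TA sites with zero reads."""
    longest = 0
    current = 0
    for count in read_counts:
        if count == 0:
            current += 1
            longest = max(longest, current)
        else:
            current = 0
    return longest


def gumbel_tail_probability(run_length, n_sites, p_noninsertion):
    """Approximate P(longest run >= run_length) with a Gumbel distribution.

    Assumption (illustrative only): sites are independent and each lacks an
    insertion with probability p_noninsertion. The location and scale follow
    the usual longest-run approximation:
        scale    = 1 / ln(1 / p)
        location = log base (1 / p) of (n * (1 - p))
    """
    if n_sites <= 0 or not 0.0 < p_noninsertion < 1.0:
        raise ValueError("need n_sites > 0 and 0 < p_noninsertion < 1")
    scale = 1.0 / math.log(1.0 / p_noninsertion)
    location = math.log(n_sites * (1.0 - p_noninsertion)) * scale
    z = (run_length - location) / scale
    # Gumbel survival function 1 - exp(-exp(-z)), via expm1 for accuracy
    # when the tail probability is very small.
    return -math.expm1(-math.exp(-z))


if __name__ == "__main__":
    # Hypothetical gene: read counts at 12 consecutive TA sites.
    counts = [14, 0, 0, 0, 0, 0, 0, 3, 0, 27, 0, 0]
    run = longest_noninsertion_run(counts)
    tail = gumbel_tail_probability(run, len(counts), p_noninsertion=0.5)
    print(f"longest run = {run}, approximate tail probability = {tail:.4f}")
```

A small tail probability indicates a run longer than would be expected by chance for a gene with that many TA sites, which is the kind of evidence the method weighs when estimating essentiality.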

Source Code

Source code is written in Python, and comes with a README document containing instructions.

Version History

Requirements:


Source code can be extracted by using the following command:

tar -xvzf mh_ess_1.00.tar.gz

Data

Example files are provided below to test the execution of the script and help verify that input files are in the appropriate format:

Copyright Information

The method and implementation provided on this website were created by Michael A. DeJesus and Thomas R. Ioerger and are licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported License.

If you wish to use this source code, please provide attribution by using the following citation:

Bayesian Analysis of Gene Essentiality based on Sequencing of Transposon Insertion Libraries
Michael DeJesus; Yanjia J. Zhang; Christopher M. Sassetti; Eric J. Rubin; James C. Sacchettini; Thomas R. Ioerger
Bioinformatics 2013; doi: 10.1093/bioinformatics/btt043



Creative Commons License
Attribution-NonCommercial 3.0 Unported
© Copyright 2012 - . Michael A. DeJesus & Thomas R. Ioerger.