Virology Analysis Tutorial - Polio with CirSeq

CirSeq (circular sequencing) is an experimental method developed to study genetically diverse populations of RNA viruses. Fragments of viral RNA are circularized and then copied so that each sequencing read contains tandem repeats (tri-repeats) of the same stretch of the viral genome.

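Because every repeat in a read derives from the same original RNA molecule, a true mutation appears in all of the repeats, whereas a next-generation sequencing error typically appears in only one. This makes sequencing errors distinguishable from the many mutations that arise as the virus replicates. If you would like to know more about the biological method used for circular sequencing, please read: doi:10.1038/nprot.2014.118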
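The idea of using repeats to separate sequencing errors from real mutations can be illustrated with a minimal sketch. This is not the platform's algorithm: the function name `consensus_of_repeats` and the `min_agree` parameter are hypothetical, the repeat copies are assumed to be already split and of equal length, and base quality scores are ignored.

```python
from collections import Counter

def consensus_of_repeats(repeats, min_agree=3):
    """Build a consensus from equal-length copies of the same fragment.

    A position is called only when at least `min_agree` copies agree;
    otherwise it is masked with 'N'. (Hypothetical helper for illustration.)
    """
    if len({len(r) for r in repeats}) != 1:
        raise ValueError("all repeat copies must have the same length")
    consensus = []
    for column in zip(*repeats):
        # Most frequent base at this position and how many copies carry it
        base, count = Counter(column).most_common(1)[0]
        consensus.append(base if count >= min_agree else "N")
    return "".join(consensus)

# Three copies of one fragment; the second copy carries a single sequencing error.
copies = ["ACGTTGCA", "ACGATGCA", "ACGTTGCA"]
print(consensus_of_repeats(copies, min_agree=2))  # majority vote: ACGTTGCA
print(consensus_of_repeats(copies, min_agree=3))  # strict agreement: ACGNTGCA
```

A mutation that was present in the original RNA molecule would appear in all three copies and therefore survive even the strict setting, while an isolated sequencing error is either outvoted or masked.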

Today, we will build a CirSeq pipeline using a polio data set, as used in the Nature publication.

This particular data set comes from a single viral clone (poliovirus, an RNA virus) grown in human cells (HeLa cells) at low multiplicity of infection for a series of seven viral passages.

The T-BioInfo Platform has a section designed specifically for CirSeq data sets. Our virology platform is divided into three major sections:

The first is the CirSeq section, which includes algorithms created by the Tauber institute, as indicated by the tau on each individual algorithm. The second section is designed for standard (non-CirSeq) NGS data and includes steps such as Error Correction, Mapping on the genome, and significance of the mutations. The third is the “in development” part of the platform, intended for integrating virology data sets with proteomics and small-molecule data so that researchers can take their work into various analysis platforms.

This future section will allow researchers to see the biological picture on many levels, such as RNA, protein-protein interactions, and post-translational modifications.

First, go to the T-BioInfo front page and select the virology platform, found in the last column on the right (Data Integration and Modeling). Next, select cirseq under format, then NGS data, SE, ModelGenomeGTF, Polio, Human, Contrasting Groups.

The pipeline we will create includes the following algorithms, given that we have previously verified that this is CirSeq data of high quality:

First, getRepeats, to get circularized repeats from long reads; then Symmetric/Nonsymmetric, to get consensus repeats and align them against the reference virus genome; and finally BinomBayes, to get the fitness of individual variants. This completes the pipeline. Select end, name the pipeline, select files via NGS Data, and then run the pipeline.
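To make the repeat-extraction step more concrete, here is a minimal sketch of how tandem repeats might be located in a long read. It is not the getRepeats algorithm, whose internals are not described here. The names `find_repeat_period` and `split_repeats` and their parameters are hypothetical, and the sketch assumes the read starts at a repeat boundary and contains no insertions or deletions.

```python
def find_repeat_period(read, min_period=4, copies=3, min_identity=0.85):
    """Return the smallest period at which the read matches itself when shifted.

    For each candidate period p, compare read[i] with read[i + p] over the
    first (copies - 1) * p positions and accept p if the fraction of matching
    positions reaches `min_identity`. Returns None if no period qualifies.
    """
    max_period = len(read) // copies
    for period in range(min_period, max_period + 1):
        span = period * (copies - 1)  # positions that have a partner one period later
        matches = sum(read[i] == read[i + period] for i in range(span))
        if matches / span >= min_identity:
            return period
    return None

def split_repeats(read, period, copies=3):
    """Cut the read into `copies` consecutive pieces of length `period`."""
    return [read[i * period:(i + 1) * period] for i in range(copies)]

# A 10-base fragment repeated three times; the middle copy has one error (A -> T).
read = "ACGTTGCAAG" + "ACGTTGCTAG" + "ACGTTGCAAG"
period = find_repeat_period(read)
print(period, split_repeats(read, period))
```

The three pieces returned by `split_repeats` are the kind of input a consensus step would then compare position by position before alignment to the reference genome.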

The output files will include a Bionom_fitnessBR txt file.

Additionally, to get counts of per-position mutations across the reference genome, you would need to use PerNucl.

To create this pipeline, first use slicingQ, which checks the percentage of circularized reads and produces a file with the consensus of the generated tri-repeats; then SeqAlignVirus, which partially aligns the consensus sequences to the virus genome; and then PerNucl.
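As a rough illustration of what per-position mutation counts look like, the sketch below tallies coverage and mismatches at each reference position. It is not the PerNucl implementation. The function `per_nucleotide_counts` and its input format (0-based start position plus an ungapped consensus sequence) are assumptions made for this example, and the short reference string is toy data.

```python
def per_nucleotide_counts(reference, alignments):
    """Count coverage and mismatching bases at every reference position.

    `alignments` is an iterable of (start, sequence) pairs, where `start` is
    the 0-based reference position of the first base and the sequence is
    aligned without gaps. Masked bases ('N') are skipped.
    """
    coverage = [0] * len(reference)
    mismatches = [dict() for _ in reference]
    for start, seq in alignments:
        for offset, base in enumerate(seq):
            pos = start + offset
            if pos >= len(reference) or base == "N":
                continue
            coverage[pos] += 1
            if base != reference[pos]:
                # Record which alternative base was seen and how often
                mismatches[pos][base] = mismatches[pos].get(base, 0) + 1
    return coverage, mismatches

reference = "ACGTACGTAC"  # toy reference, not the poliovirus genome
alignments = [(0, "ACGTA"), (2, "GTACG"), (3, "TTCGT"), (5, "CGNAC")]
coverage, mismatches = per_nucleotide_counts(reference, alignments)
for pos, (cov, mm) in enumerate(zip(coverage, mismatches)):
    print(pos, reference[pos], cov, mm)
```

Dividing a mismatch count by the coverage at that position gives a per-position mutation frequency, which is the kind of table that later analysis steps can work from.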

Once this analysis has completed, the researcher can use both machine learning, such as Principal Component Analysis, and visualization, such as Layercake, for further data analysis.