ChIP-Seq Data Analysis – Arabidopsis thaliana

In this tutorial we will be using the ChIP-Seq pipeline on the T-BioInfo platform.

We’ll be getting our data from a research paper, “Natural variation of H3K27me3 distribution between two Arabidopsis accessions and its association with flanking transposable elements.”(1) In this paper we need to find the data the authors used to generate their results.

Near the bottom of the paper we can find the Primary Accession. We’re going to take the E-MTAB-1043 accession and search for it on Google.com. The search results lead to ArrayExpress, where we can browse the available files.
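If you would rather skip the search engine, the accession can be turned into a direct link. The sketch below is only an illustration: the helper name `study_url` is made up, and the URL layout is an assumption about how the ArrayExpress collection is currently hosted, so check the link in a browser before relying on it.

```python
import re

# ArrayExpress accessions look like "E-MTAB-1043":
# "E-", a four-letter code, a dash, then a number.
ACCESSION_PATTERN = re.compile(r"^E-[A-Z]{4}-\d+$")

# ASSUMPTION: URL layout of the ArrayExpress collection; it may change over time.
STUDY_URL_TEMPLATE = "https://www.ebi.ac.uk/biostudies/arrayexpress/studies/{accession}"


def study_url(accession: str) -> str:
    """Return a direct study link for an ArrayExpress accession (hypothetical helper)."""
    cleaned = accession.strip().upper()
    if not ACCESSION_PATTERN.match(cleaned):
        raise ValueError(f"not an ArrayExpress accession: {accession!r}")
    return STUDY_URL_TEMPLATE.format(accession=cleaned)


if __name__ == "__main__":
    print(study_url("E-MTAB-1043"))
```

The pattern check is there mainly to catch copy-and-paste slips, such as a missing "E-" prefix, before you open the link.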

We need the samples. As you can see, there are several samples with various genotypes. We’ll download them as FastQ files and extract them, then return to T-BioInfo and select ChIP-Seq. Once the platform opens, we need to close the help pop-up.
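The extraction step can also be scripted if you have many samples. This is a minimal sketch, assuming the downloads are gzip-compressed files ending in `.fastq.gz` and sitting in a single folder. The function names are hypothetical and are not part of T-BioInfo or ArrayExpress.

```python
import gzip
import shutil
from pathlib import Path
from typing import Dict


def extract_fastq(gz_path: Path) -> Path:
    """Decompress one .fastq.gz file next to the original and return the new path."""
    out_path = gz_path.with_suffix("")  # "sample.fastq.gz" -> "sample.fastq"
    with gzip.open(gz_path, "rb") as src, open(out_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return out_path


def count_reads(fastq_path: Path) -> int:
    """Count reads while checking the basic 4-line FASTQ record layout."""
    reads = 0
    with open(fastq_path, "r") as handle:
        while True:
            header = handle.readline()
            if not header:          # end of file
                break
            if not header.strip():  # tolerate blank lines between records
                continue
            seq = handle.readline().rstrip("\n")
            plus = handle.readline()
            qual = handle.readline().rstrip("\n")
            if not header.startswith("@") or not plus.startswith("+"):
                raise ValueError(f"malformed record near read {reads + 1} in {fastq_path}")
            if len(seq) != len(qual):
                raise ValueError(f"sequence/quality length mismatch at read {reads + 1}")
            reads += 1
    return reads


def extract_all(folder: Path) -> Dict[str, int]:
    """Extract every .fastq.gz in a folder and report the read count per file."""
    return {
        fastq.name: count_reads(fastq)
        for fastq in (extract_fastq(gz) for gz in sorted(folder.glob("*.fastq.gz")))
    }
```

Calling `extract_all(Path("downloads"))`, with `downloads` standing in for wherever you saved the files, leaves the extracted FastQ files beside the originals and returns a read count for each. That gives a quick way to spot a truncated download before uploading.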

We’ll be using ModelGenomeGTF with the Arabidopsis thaliana genome we already have on file. Once again we’ll be using FastQ files. Let’s set up a pair of groups, 1 vs. 1.

First, we’ll select Data Input and save our information. We’ll be using Bsklb-GPU and BroadPeak. We’ll click Data Output and click Save once again.

Now we’ll need to upload our data. We’ll upload the FastQ files we downloaded and extracted earlier, then click Upload and finally Run Pipeline.

The platform will bring up your current Pipeline and show you its status. To check how it is doing, you can either refresh the page or click the ‘My Pipelines’ link at the top of the page. There is no need to watch, however, as the platform will email you when your Pipeline is complete.

Now that your Pipeline is complete, let’s review the results. We have NGS data, which are those FastQ files, as well as NimbleGen and Affymetrix data. A variety of variables differ between the authors’ data and our data, which means we can get a variety of results. This is significant for reasons that are still under investigation.
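One rough way to put a number on how much two sets of results differ is to measure how much of the genome their peaks share. The sketch below assumes each result has already been read into a list of (chromosome, start, end) intervals with half-open coordinates, as in BED-style peak files. The function names and example coordinates are invented for illustration, and this is not a description of what the platform or the paper's authors did.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Interval = Tuple[str, int, int]          # (chromosome, start, end), half-open
Merged = Dict[str, List[Tuple[int, int]]]


def merge_by_chrom(peaks: Iterable[Interval]) -> Merged:
    """Group peaks by chromosome and merge overlapping or touching intervals."""
    by_chrom = defaultdict(list)
    for chrom, start, end in peaks:
        by_chrom[chrom].append((start, end))
    merged: Merged = {}
    for chrom, spans in by_chrom.items():
        spans.sort()
        out = [spans[0]]
        for start, end in spans[1:]:
            last_start, last_end = out[-1]
            if start <= last_end:
                out[-1] = (last_start, max(last_end, end))
            else:
                out.append((start, end))
        merged[chrom] = out
    return merged


def covered_bp(merged: Merged) -> int:
    """Total number of base pairs covered by the merged intervals."""
    return sum(end - start for spans in merged.values() for start, end in spans)


def shared_bp(a: Merged, b: Merged) -> int:
    """Base pairs covered by both interval sets (two-pointer sweep per chromosome)."""
    total = 0
    for chrom in a.keys() & b.keys():
        xs, ys = a[chrom], b[chrom]
        i = j = 0
        while i < len(xs) and j < len(ys):
            lo = max(xs[i][0], ys[j][0])
            hi = min(xs[i][1], ys[j][1])
            if lo < hi:
                total += hi - lo
            # advance whichever interval ends first
            if xs[i][1] < ys[j][1]:
                i += 1
            else:
                j += 1
    return total


def jaccard(peaks_a: Iterable[Interval], peaks_b: Iterable[Interval]) -> float:
    """Shared base pairs divided by base pairs covered by either set."""
    a, b = merge_by_chrom(peaks_a), merge_by_chrom(peaks_b)
    inter = shared_bp(a, b)
    union = covered_bp(a) + covered_bp(b) - inter
    return inter / union if union else 0.0


if __name__ == "__main__":
    # Made-up coordinates, purely to show the call.
    ours = [("Chr1", 100, 200), ("Chr1", 150, 300), ("Chr2", 0, 50)]
    theirs = [("Chr1", 250, 400), ("Chr2", 100, 150)]
    print(jaccard(ours, theirs))
```

A value near 1 means the two peak sets cover almost the same regions, and a value near 0 means they barely overlap. It is a coarse summary, since it ignores peak strength, but it is enough to tell whether two runs broadly agree.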

We use Arabidopsis because it has a small genome, a short generation time, a large number of offspring and a small plant size. There is a large variety of data available as well. We can find methylated histone tail positions in Arabidopsis and a variety of other information.

I hope you found this tutorial helpful, and thank you for your time.

(1) Dong X, Reimer J, Göbel U, et al. Natural variation of H3K27me3 distribution between two Arabidopsis accessions and its association with flanking transposable elements. Genome Biology. 2012;13(12):R117. doi:10.1186/gb-2012-13-12-r117.