RNA-Seq Analysis of Macrophages in a Mouse Model of Lung Cancer

Project Title: Applying Factor Regression Analysis in a Mouse Model Study

The “omics” field is expanding rapidly, likely driven by the plummeting cost of sequencing and sequencers. However, processing, analyzing, and interpreting “omics” data can take months and require large teams. Pine Biotech’s T-Bioinfo platform enables faster and easier analysis, integration, and visualization of big data sets, which speeds up discovery.

Pine Biotech Educational Goals:

The interaction of macrophage populations with the tumor and the tumor microenvironment is a complex system that plays an important role in cancer progression. These cells are characterized by their ability to adapt and alter their phenotype in response to local environmental cues. Understanding how macrophages interact with lung tumors is increasingly important, as lung cancer remains one of the leading causes of death in men and women (Poczobutt et al., 2016).

  1. Use Factor Analysis to identify unique genes and isoforms affecting each macrophage population.
  2. Demonstrate the adaptability of the T-Bioinfo platform.

Authors' Study:

This study used an orthotopic model, in which the graft is placed at the site where the tumor would originate: lung cancer cells were placed directly into the left lung lobe of immunocompetent mice. The authors defined and isolated several distinct populations of macrophages/monocytes at different stages of tumor progression using multimarker cell sorting (flow cytometry). RNA-seq was then used to define the expression profile of each population and how it changed over time as the tumor grew. The first population, alveolar macrophages, did not change in number or expression. A second population, tumor-associated macrophages, increased dramatically with tumor growth and expressed genes such as chemokines. The third population was identified as tumor-associated monocytes and expressed a large number of genes involved in matrix remodeling, that is, the movement or restructuring of the extracellular matrix. The data underscore the complexity of monocytes and macrophages in the tumor microenvironment and suggest that distinct populations play specific roles in tumor progression.

Machine Learning:

Principal Component Analysis:

Principal Component Analysis is a statistical method that projects multidimensional data into a lower-dimensional space. Put simply, it is a technique used to emphasize variation and bring out strong patterns in a data set.
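As a minimal sketch of the idea, the snippet below computes principal components with NumPy by centering a small expression matrix and taking its singular value decomposition. The matrix values, the `pca` helper, and the two toy groups are hypothetical illustrations, not data or code from the study or from the T-Bioinfo platform.

```python
import numpy as np

def pca(X, n_components=2):
    """Project the rows of X (samples x features) onto the top principal components."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                      # center each feature (gene)
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]   # sample coordinates
    explained = (S ** 2) / np.sum(S ** 2)             # variance fraction per component
    return scores, Vt[:n_components], explained[:n_components]

# Toy data (hypothetical): 6 samples x 4 genes, two groups with opposite patterns
rng = np.random.default_rng(0)
group_a = rng.normal(loc=[5, 5, 1, 1], scale=0.1, size=(3, 4))
group_b = rng.normal(loc=[1, 1, 5, 5], scale=0.1, size=(3, 4))
expr = np.vstack([group_a, group_b])

scores, components, explained = pca(expr, n_components=2)
print(scores[:, 0])   # first-component scores separate the two groups by sign
print(explained)      # fraction of total variance captured by each component
```

In this toy case nearly all of the variance lies along the first component, which is why a two-dimensional plot of the scores is enough to see the group structure.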

Factor Analysis:

Factor Regression Analysis is a statistical method used to describe variability among observed, correlated variables in terms of a potentially lower number of unobserved variables called factors.
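To make the definition concrete, here is a small simulation of the latent-factor model it describes, assuming a single unobserved factor and four observed variables. It only illustrates the model (observed = loading x factor + noise) and checks that the covariance implied by the loadings matches the sample covariance; it is not a fitting procedure and not the T-Bioinfo implementation. All loadings and sample sizes are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n_samples = 5000

# Hypothetical loadings: how strongly each observed variable follows the factor
loadings = np.array([0.9, 0.8, 0.7, 0.6])
# Noise scaled so that every observed variable has unit variance
noise_sd = np.sqrt(1 - loadings ** 2)

factor = rng.normal(size=n_samples)                       # the unobserved factor
noise = rng.normal(size=(n_samples, 4)) * noise_sd        # variable-specific noise
observed = np.outer(factor, loadings) + noise             # the correlated observations

# Under the one-factor model, covariance = loadings * loadings^T + diag(noise variance)
implied_cov = np.outer(loadings, loadings) + np.diag(noise_sd ** 2)
sample_cov = np.cov(observed, rowvar=False)
max_gap = np.abs(sample_cov - implied_cov).max()
print(max_gap)   # small: one factor accounts for the correlations among four variables
```

The point of the sketch is that four correlated variables can be summarized by one factor plus independent noise, which is the sense in which factors are a "lower number of unobserved variables".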

Cell Type   Weeks     Cell Type   Weeks
Mac A       0         Mac B2      2
Mac A       0         Mac B2      2
Mac A       0         Mac B2      2
Mac B1      0         Mac B3      2
Mac B1      0         Mac B3      2
Mac B1      0         Mac B3      2
Mac B2      0         Mac B2      3
Mac B2      0         Mac B2      3
Mac B2      0         Mac B2      3
Mac A       2         Mac B3      3
Mac A       2         Mac B3      3
Mac A       2         Mac B3      3

To use Factor Regression Analysis, every combination of factor levels must be represented among the samples. In this project, some combinations of time point and cell type were not sampled. Because the full design is incomplete, we divided the project into 2 groups that share 1 set of replicates (Mac B2 at 2 weeks). The first group compares 0 weeks vs 2 weeks and Mac A vs Mac B2; the second compares 2 weeks vs 3 weeks and Mac B2 vs Mac B3. Factor Analysis allowed better identification of important factors than a simple differential expression comparison. The authors found they could distinguish macrophage populations over a time course and identified differences between macrophage types and between time points.
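The completeness requirement can be checked mechanically. The sketch below encodes the sample table as (cell type, week) pairs with replicate counts and tests whether a chosen set of cell types and weeks forms a complete design. The helper names `is_complete` and `cells` are hypothetical and only illustrate the reasoning behind the split.

```python
from itertools import product

# (cell type, week) -> number of replicates, taken from the sample table
samples = {
    ("Mac A", 0): 3, ("Mac B1", 0): 3, ("Mac B2", 0): 3, ("Mac A", 2): 3,
    ("Mac B2", 2): 3, ("Mac B3", 2): 3, ("Mac B2", 3): 3, ("Mac B3", 3): 3,
}

def is_complete(cell_types, weeks):
    """True if every cell type x week combination has at least one sample."""
    return all((c, w) in samples for c, w in product(cell_types, weeks))

def cells(group):
    """All (cell type, week) combinations covered by a group."""
    return set(product(*group))

group_1 = (("Mac A", "Mac B2"), (0, 2))    # 0 vs 2 weeks, Mac A vs Mac B2
group_2 = (("Mac B2", "Mac B3"), (2, 3))   # 2 vs 3 weeks, Mac B2 vs Mac B3

all_types = sorted({c for c, _ in samples})
all_weeks = sorted({w for _, w in samples})

print(is_complete(all_types, all_weeks))   # False: e.g. Mac A was not sampled at 3 weeks
print(is_complete(*group_1), is_complete(*group_2))   # True True
overlap = cells(group_1) & cells(group_2)
print(overlap)                             # {('Mac B2', 2)}
```

Note that the Mac B1 samples at 0 weeks fall outside both groups under this split, since Mac B1 appears at only one time point in the table.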

Educational Data Sets:

Running a full pipeline on unfiltered samples can take a long time and produce many additional results that are difficult to interpret, such as unannotated genes and transcripts. To simplify the project for educational use, we took the reads from all samples that aligned to a selection of significant and insignificant genes and extracted them into small FastQ files. On average, these files are 6.2% of the original size and take significantly less time to run (approx. 3 hours). Significant genes were selected from the original set through Factor Regression Analysis, and a selected number of insignificant genes were also included to show students the difference in significance.
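One way such a reduced FastQ file could be produced is sketched below. It is an assumption about the general approach, not the procedure Pine Biotech used. The read-to-gene assignments, gene names, and the `filter_fastq` helper are hypothetical, and the code assumes plain four-line FastQ records.

```python
import io

def filter_fastq(fastq_in, keep_ids, fastq_out):
    """Copy four-line FastQ records whose read ID is in keep_ids; return the count kept."""
    kept = 0
    while True:
        record = [fastq_in.readline() for _ in range(4)]
        if not record[0]:                       # end of file
            break
        read_id = record[0][1:].split()[0]      # drop the leading '@' and any description
        if read_id in keep_ids:
            fastq_out.writelines(record)
            kept += 1
    return kept

# Hypothetical alignment summary: which gene each read aligned to
read_to_gene = {"read1": "geneA", "read2": "geneB", "read3": "geneC"}
selected_genes = {"geneA", "geneC"}             # hypothetical selection of genes to keep
keep_ids = {r for r, g in read_to_gene.items() if g in selected_genes}

# In-memory stand-ins for the original and reduced FastQ files
demo_in = io.StringIO(
    "@read1\nACGT\n+\nIIII\n"
    "@read2\nTTTT\n+\nIIII\n"
    "@read3\nGGCC\n+\nIIII\n"
)
demo_out = io.StringIO()
n_kept = filter_fastq(demo_in, keep_ids, demo_out)
print(n_kept)   # 2
```

With real data, the in-memory streams would be replaced by opened files, and the read-to-gene mapping would come from the alignment step of the pipeline.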

Conclusions:

PPAR Signaling Pathway: The authors reported a total of 16 PPAR signaling genes. Our analysis identified all of the genes the authors reported, as well as (insert number) more.

Cytokine-Cytokine Receptor Pathway: Additionally, using factor analysis, our team is able to report a higher enrichment of genes from the B3 cell cluster for the cytokine receptor pathway.

Bibliography:

Poczobutt, J. M., De, S., Yadav, V. K., Nguyen, T. T., Li, H., Sippel, T. R., ... Nemenoff, R. A. (2016). Expression profiling of macrophages reveals multiple populations with distinct biological roles in an immunocompetent orthotopic model of lung cancer. Journal of Immunology, 196(6), 2847–2859. http://doi.org/10.4049/jimmunol.1502364

Referenced files:

It's recommended to use the SVL files due to their small size, as they simply reference data already on the Pine Biotech servers. However, full copies of the educational dataset are also available.