#### Deconvolution

Department of Environmental Health Sciences, Arnold School of Public Health, University of South Carolina

## Questions

### Q1: What is SCISSOR-TM?

Single Cell Inferred Site Specific Omics Resource for Tumor Microenvironments (SCISSOR-TM) is an open online resource that combines the large scale of TCGA bulk tumor multi-omics measurements with the high resolution of single-cell transcriptomic studies. It uses multiple existing deconvolution methods to infer cell type abundance in heterogeneous samples from cell type-specific expression profiles, and to investigate the correlation between site-specific cell types and different omics layers.

SCISSOR-TM provides five major analysis modules (an overview module, a survival module, a molecular-cell type correlation module, a genome-wide association module, and a deconvolution module) that let users explore the interactions between cell types and a wide spectrum of factors.

### Q2: How to use SCISSOR-TM

#### Overview module

#### 1) Parameter input

Cancer type: Select a cancer type of interest. Abbreviations follow the TCGA database.

Single-cell data: Select the single-cell dataset to use as the reference for deconvolution.

Deconvolution methods: Select the deconvolution method used to estimate cell type proportions. Deconvolution is an algorithmic process that estimates the cell type proportions in a bulk sample by aligning its expression to cell type-specific expression profiles. The basic idea is to solve for g in the convolution equation $$f \times g = h$$, where h is the bulk sequencing data, f is the cell type-specific expression profile derived from the single-cell sequencing data, and g is the vector of cell type proportions in the bulk sample. Using methods such as non-negative least squares or support vector regression, g can be estimated for every bulk sample, so cell type composition can be inferred from bulk data, which are easier to obtain and more widely available than scRNA-seq data. Currently only MuSiC and CIBERSORTx are available.
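The convolution equation can be illustrated with plain non-negative least squares. This is a minimal sketch, not the MuSiC or CIBERSORTx implementation (MuSiC uses a weighted NNLS and CIBERSORTx uses support vector regression). The function name `deconvolve_nnls` and the toy signature matrix are hypothetical, and SciPy's `scipy.optimize.nnls` is assumed to be available.

```python
import numpy as np
from scipy.optimize import nnls


def deconvolve_nnls(signature, bulk):
    """Solve f @ g = h for non-negative cell type proportions g.

    signature: (n_genes, n_cell_types) reference profile f, e.g. the mean
               expression of each cell type in the single-cell data
    bulk:      (n_genes,) bulk expression h of one sample
    """
    coef, _residual_norm = nnls(np.asarray(signature, float), np.asarray(bulk, float))
    total = coef.sum()
    # Rescale so the proportions sum to 1 (leave an all-zero fit unchanged)
    return coef / total if total > 0 else coef


# Toy example with hypothetical values: 4 genes x 3 cell types
f = np.array([[10.0, 0.0, 1.0],
              [0.0, 8.0, 1.0],
              [1.0, 1.0, 9.0],
              [5.0, 5.0, 0.0]])
true_g = np.array([0.5, 0.3, 0.2])
h = f @ true_g  # noise-free synthetic bulk sample
print(np.round(deconvolve_nnls(f, h), 3))  # approximately [0.5 0.3 0.2]
```

With real data, h is noisy and f is estimated, so the recovered proportions are only approximate; the non-negativity constraint keeps them biologically interpretable.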

Tumor purity estimator: Select the method used to estimate tumor purity; this estimate is used to adjust for tumor purity in the subsequent modules. SingleCell_Estimation estimates tumor purity from the single-cell data and is available only if the single-cell data include cancer cells.

#### 2) Result visualization

A heatmap shows the deconvolution result for each TCGA sample.

A boxplot shows the distribution of the estimated proportion of each cell type.

#### Survival module

#### 1) Parameter input

Gene: Optional in this module.

Mapped miRNA/mapped protein: If a gene is selected, its mapped miRNA and mapped protein become available to include in the model as quantitative variables.

Mapped mutation/mapped cnv: If a gene is selected, the mutation and copy number variation (CNV) status of that gene become available to include in the model.

Cell type: Cell types from the deconvolution result. Users can choose cell types to include in the model, and Kaplan-Meier curves are drawn for them.

Covariates: Clinical covariates available in the TCGA database, such as gender, age, and race. Tumor purity is estimated with the tumor purity estimator selected earlier.

Percentage of patients: A slider from 0 to 50% that defines how patients are divided into groups on quantitative variables. The default is 50%, which divides patients at the median. A smaller percentage compares more extreme groups (for example, 25% corresponds to quartiles), so the difference in the variable between groups may be larger.
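One way the slider could map to groups is sketched below, assuming that a setting of x% compares the top x% of patients against the bottom x% and leaves out the rest. The helper `split_by_percentage` is hypothetical and is not the site's code.

```python
def split_by_percentage(values, pct=50):
    """Label patients 'high' or 'low' for one quantitative variable.

    Hypothetical helper: the bottom pct% of patients are 'low', the top pct%
    are 'high', and anyone in between is left out (None).
    """
    if not 0 < pct <= 50:
        raise ValueError("pct must be in (0, 50]")
    n = len(values)
    k = int(n * pct / 100)  # patients per group
    order = sorted(range(n), key=lambda i: values[i])  # indices by ascending value
    groups = [None] * n
    for i in order[:k]:
        groups[i] = "low"
    for i in order[n - k:]:
        groups[i] = "high"
    return groups


expr = [5, 1, 4, 2, 3, 6, 8, 7]
print(split_by_percentage(expr, 50))  # median split: 4 'low', 4 'high'
print(split_by_percentage(expr, 25))  # quartiles: middle 4 patients excluded
```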

After all variables are set, press Survival Plot to update the result.

#### 2) Result visualization

The result of a multivariate Cox proportional hazards model is shown to evaluate the prognostic value of TME cell proportions, multi-omics data, and other clinical covariates: $$\ln\left(\frac{\lambda(t)}{\lambda_{0}(t)}\right) = \beta p + \gamma x + \delta z$$, where λ(t) is the hazard at time t and λ₀(t) is the baseline hazard. The response is survival, described by survival time and status; p is the estimated cell type proportion, and x is the omics data, including mRNA, miRNA, protein, mutation, and CNV. Estimated TME cell proportions, clinical features, and tumor purity can also be modeled as covariates z.
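To show what the model estimates, the sketch below fits the Cox model by Newton-Raphson on the partial likelihood, with p, x, and z stacked as columns of a covariate matrix; exp(β) is then the hazard ratio per unit increase in p. It is a minimal illustration assuming untied event times and NumPy; the function names are hypothetical, the simulated data are synthetic, and in practice a dedicated package (for example R's `survival::coxph` or Python's `lifelines`) would be used.

```python
import numpy as np


def _sort_desc(X, time, event):
    # Sort subjects by descending time so that a cumulative sum up to row i
    # covers the risk set {j : time_j >= time_i} (valid when times are untied)
    order = np.argsort(-np.asarray(time, float))
    return np.asarray(X, float)[order], np.asarray(event, bool)[order]


def cox_loglik(b, X, time, event):
    """Cox log partial likelihood for the linear predictor X @ b."""
    X, event = _sort_desc(X, time, event)
    eta = X @ b
    log_risk = np.log(np.cumsum(np.exp(eta)))
    return float((eta - log_risk)[event].sum())


def cox_fit(X, time, event, n_iter=50, tol=1e-10):
    """Newton-Raphson estimate of b in ln(lambda(t)/lambda0(t)) = X @ b."""
    X, event = _sort_desc(X, time, event)
    b = np.zeros(X.shape[1])
    for _ in range(n_iter):
        w = np.exp(X @ b)
        s0 = np.cumsum(w)                                   # sum of w over risk set
        s1 = np.cumsum(w[:, None] * X, axis=0)              # sum of w * x
        s2 = np.cumsum(w[:, None, None] * X[:, :, None] * X[:, None, :], axis=0)
        xbar = s1 / s0[:, None]
        grad = (X - xbar)[event].sum(axis=0)                # score vector
        info = (s2 / s0[:, None, None]
                - xbar[:, :, None] * xbar[:, None, :])[event].sum(axis=0)
        step = np.linalg.solve(info, grad)                  # Newton step
        b = b + step
        if np.max(np.abs(step)) < tol:
            break
    return b


# Synthetic data: hypothetical columns p (cell proportion) and z (covariate)
rng = np.random.default_rng(0)
n = 300
X = rng.normal(size=(n, 2))
t_event = rng.exponential(1.0 / np.exp(X @ np.array([0.7, -0.4])))
t_cens = rng.exponential(2.0, size=n)
time, event = np.minimum(t_event, t_cens), t_event <= t_cens
b = cox_fit(X, time, event)
print(b, np.exp(b))  # coefficients and hazard ratios
```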

Kaplan-Meier plots show survival differences between groups defined by each variable. Quantitative variables are split using the percentage set earlier (for example, at the median or at quartiles). The p-value of the log-rank test is also shown.
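The log-rank p-value compares observed and expected deaths in the two groups at each event time. The pure-Python sketch below assumes two groups coded 0/1 (for example, the low and high groups produced by the percentage slider); `logrank_test` is a hypothetical name, and the toy data are made up.

```python
import math


def logrank_test(time, event, group):
    """Two-group log-rank test.

    time: survival times; event: 1 = death observed, 0 = censored;
    group: 0/1 labels (e.g. low/high expression recoded).
    Returns (chi-square statistic with 1 df, p-value).
    """
    data = list(zip(time, event, group))
    o_minus_e = 0.0  # observed minus expected deaths in group 1
    var = 0.0
    for t in sorted({ti for ti, ei, _ in data if ei}):
        at_risk = [g for ti, _, g in data if ti >= t]
        n, n1 = len(at_risk), sum(1 for g in at_risk if g == 1)
        d = sum(1 for ti, ei, _ in data if ti == t and ei)
        d1 = sum(1 for ti, ei, g in data if ti == t and ei and g == 1)
        o_minus_e += d1 - d * n1 / n
        if n > 1:  # hypergeometric variance of d1
            var += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    stat = o_minus_e ** 2 / var
    # Upper tail of a chi-square with 1 df: P(Z^2 > stat) = erfc(sqrt(stat / 2))
    return stat, math.erfc(math.sqrt(stat / 2))


stat, p = logrank_test([1, 2, 3, 4, 5, 6], [1] * 6, [0, 0, 0, 1, 1, 1])
print(round(stat, 3), round(p, 4))
```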

#### Molecular-cell type correlation module

#### 1) Parameter input

Gene and cell types are required in this module.

#### 2) Result visualization

For quantitative omics (mRNA, miRNA, and protein), a scatter plot shows the correlation between each cell type and the omics data, with the correlation coefficient and p-value on the plot. We use the ppcor package to compute the correlation adjusted for tumor purity. To display the adjusted correlation in a scatter plot, we remove the effect of tumor purity from every sample: $$y' = y - \beta x$$, where y is the quantitative expression of mRNA, miRNA, or protein, and x is the tumor purity of each individual. β is estimated by multiple linear regression, $$y = \beta x + \gamma z + \epsilon$$, where x is the tumor purity and z is the proportion of one cell type. The resulting y' is the expression after adjusting for tumor purity, and the tumor purity-adjusted scatter plots are drawn from it.
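The adjustment can be sketched with NumPy as follows. This assumes an intercept in the regression and Pearson correlation; `purity_adjust` is a hypothetical helper, not the site's code. The partial correlation is computed from residuals after regressing y and the cell proportion on tumor purity, which gives the same Pearson partial correlation estimate as `ppcor::pcor.test`.

```python
import numpy as np


def purity_adjust(y, purity, cell_prop):
    """Return (y_adj, partial_r) for one omics feature.

    y_adj = y - beta * purity, with beta from y = a + beta*purity + gamma*cell_prop + e.
    partial_r = Pearson correlation of y and cell_prop controlling for purity.
    """
    y, x, z = (np.asarray(v, float) for v in (y, purity, cell_prop))
    design = np.column_stack([np.ones_like(x), x, z])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    y_adj = y - coef[1] * x  # remove the tumor purity effect

    def resid(v):
        # Residuals of v after regressing it on an intercept and purity
        A = np.column_stack([np.ones_like(x), x])
        c, *_ = np.linalg.lstsq(A, v, rcond=None)
        return v - A @ c

    partial_r = np.corrcoef(resid(y), resid(z))[0, 1]
    return y_adj, partial_r


# Synthetic example: expression driven by purity and by the cell proportion
rng = np.random.default_rng(1)
x = rng.uniform(0.3, 0.9, 50)  # tumor purity
z = rng.uniform(0.0, 0.4, 50)  # cell type proportion
y = 2 + 3 * x + 1.5 * z
y_adj, r = purity_adjust(y, x, z)
print(round(float(r), 6))
```

Plotting `y_adj` against the cell proportion gives the purity-adjusted scatter plot. The plain correlation between `y_adj` and the cell proportion is not, in general, identical to the partial correlation, so the reported coefficient comes from the partial correlation.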

For qualitative omics (mutation and CNV), we fit a multiple linear regression and report the adjusted effect size, confidence interval (CI), and p-value: $$p = \gamma x + \delta z + \epsilon$$, where the TME cell type proportion p is the response variable and x is the omics data. Mutation status is coded as 0 (wild-type) or 1 (mutated), and CNV status as -1 (deletion), 0 (normal), or 1 (duplication). z denotes tumor purity or clinical features included as covariates.
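The recoding and regression can be sketched with NumPy and SciPy. The coding dictionaries follow the text; `omics_effect` is a hypothetical helper that fits ordinary least squares with an intercept and returns γ with a t-based confidence interval and a two-sided p-value.

```python
import numpy as np
from scipy import stats

# Coding scheme described in the text
MUTATION_CODE = {"wild-type": 0, "mutated": 1}
CNV_CODE = {"deletion": -1, "normal": 0, "duplication": 1}


def omics_effect(prop, x, z, alpha=0.05):
    """OLS fit of prop = intercept + gamma*x + delta*z; z may have several columns.

    Returns (gamma, (ci_low, ci_high), p_value).
    """
    prop = np.asarray(prop, float)
    design = np.column_stack([np.ones(len(prop)), np.asarray(x, float), np.asarray(z, float)])
    n, k = design.shape
    coef, *_ = np.linalg.lstsq(design, prop, rcond=None)
    resid = prop - design @ coef
    sigma2 = resid @ resid / (n - k)  # residual variance
    se = np.sqrt(sigma2 * np.linalg.inv(design.T @ design)[1, 1])
    gamma = coef[1]
    t_crit = stats.t.ppf(1 - alpha / 2, df=n - k)
    p_value = 2 * stats.t.sf(abs(gamma / se), df=n - k)
    return gamma, (gamma - t_crit * se, gamma + t_crit * se), p_value


statuses = ["wild-type", "mutated", "mutated", "wild-type"]
print([MUTATION_CODE[s] for s in statuses])  # [0, 1, 1, 0]
```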

Box plots show the distribution of TME cell proportions by genomic aberration status (mutation and CNV). Using the adjusted effect sizes from the multiple linear regression, dot plots with confidence intervals are drawn, where each dot is the effect size of a mutation or CNV after adjusting for tumor purity.

#### Genome-wide association module

#### 1) Parameter input

In the Cell type tab, if users keep the default, All, a heatmap is shown with cell types on the x-axis and the 200 most correlated genes in the database on the y-axis (the 100 most positively correlated plus the 100 most negatively correlated genes). Each value is a correlation coefficient.

#### 2) Result visualization

Once a specific cell type is selected, a density plot shows the distribution of its correlations with all genes, along with the mean correlation.

The top 200 correlated genes are listed below the plot and can be downloaded.
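Selecting the genes for the heatmap and the density plot can be sketched as below, assuming Pearson correlation between each gene's expression and a cell type's estimated proportion across samples (the correlation method is not stated here). The function names and the synthetic data are hypothetical.

```python
import numpy as np


def gene_correlations(expr, cell_prop):
    """Pearson correlation of every gene (row of expr) with one cell type's proportion.

    expr: (n_genes, n_samples); cell_prop: (n_samples,).
    Genes with zero variance give NaN and should be filtered out first.
    """
    expr = np.asarray(expr, float)
    prop = np.asarray(cell_prop, float)
    ec = expr - expr.mean(axis=1, keepdims=True)
    pc = prop - prop.mean()
    return ec @ pc / (np.linalg.norm(ec, axis=1) * np.linalg.norm(pc))


def top_correlated(r, gene_names, n_each=100):
    """Return (most positive, most negative) genes, n_each of each."""
    order = np.argsort(r)  # ascending correlation
    pos = [(gene_names[i], float(r[i])) for i in order[::-1][:n_each]]
    neg = [(gene_names[i], float(r[i])) for i in order[:n_each]]
    return pos, neg


rng = np.random.default_rng(3)
prop = rng.uniform(0, 0.5, 30)
expr = rng.normal(size=(50, 30))
expr[0] = 2 * prop + 1  # perfectly positively correlated gene
expr[1] = -prop         # perfectly negatively correlated gene
genes = [f"gene{i}" for i in range(50)]
r = gene_correlations(expr, prop)
pos, neg = top_correlated(r, genes, n_each=5)
print(pos[0][0], neg[0][0], round(float(r.mean()), 3))  # the mean feeds the density plot
```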

#### Deconvolution module

#### Parameter input

We can also perform deconvolution on users' own data. Users can upload their bulk data as a .txt file in the same format as the example data, then choose the deconvolution reference cancer type, single-cell data, and deconvolution method. If users provide an email address, the result will be sent there once the background calculation is complete. If a problem occurs with the uploaded data, a warning will be sent instead; please contact us for help resolving it.