
Cell types not found

Status: Open. Opened by Zanvine-gitcode 4 years ago; 0 comments.

Hello,

First, thanks for this analysis tool. So far I have found it intuitive and helpful.

I stumbled across something that seems a bit odd and was hoping you could help.

I combined four or five scRNA-seq datasets with the cell types 'T-cell', 'Fibroblast', 'Macrophage', 'Endothelial', 'CAF' and 'Epithelial'; the transcriptomes are quite similar across the datasets. I built an ExpressionSet object for the scRNA-seq data and for each of two bulk tissue RNA-seq datasets, one of which I downloaded with the TCGAbiolinks package in R (I mention this because it is the odd one).

I ran the following:

    Est.1 <- music_prop(bulk.eset = bt_data, sc.eset = sc_data, clusters = "Cell-type", samples = "SampleID", verbose = TRUE)

and all cell types get the proportions I expected (first plot, attached as jitter_estproportions_wShih.pdf).

But when I run it on the second bulk dataset, I lose all of my T cells:

    Est.2 <- music_prop(bulk.eset = bt_data_2nd, sc.eset = sc_data, clusters = "Cell-type", samples = "SampleID", verbose = TRUE)

(plot attached as jitter_noTcells.pdf)

I checked the bt_data_2nd matrix, and T-cell markers are definitely present. If I remove one of the datasets from sc_data and rerun:

    Est.3 <- music_prop(bulk.eset = bt_data_2nd, sc.eset = sc_data.minus1, clusters = "Cell-type", samples = "SampleID", verbose = TRUE)

the NNLS estimate finds T cells, but the MuSiC estimate does not (plot attached as jitter_NNLS_tcells.pdf).
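One way to put numbers on the MuSiC vs NNLS difference is to compare the two estimate matrices directly. A minimal sketch, assuming (as in the MuSiC tutorial) that the list returned by music_prop() holds the MuSiC estimates in Est.prop.weighted and the NNLS estimates in Est.prop.allgene, both as sample-by-cell-type matrices. compare_ct_estimates() is a hypothetical helper, not part of MuSiC:

```r
# Hypothetical helper (not part of MuSiC): summarise one cell type's estimated
# proportions under MuSiC (weighted) and NNLS (all-gene).
# Assumes `est` is the list returned by music_prop(), with sample-by-cell-type
# matrices in est$Est.prop.weighted and est$Est.prop.allgene.
compare_ct_estimates <- function(est, cell_type) {
  music <- est$Est.prop.weighted
  nnls  <- est$Est.prop.allgene
  if (is.null(music) || is.null(nnls)) {
    stop("expected Est.prop.weighted and Est.prop.allgene in the estimate list")
  }
  if (!cell_type %in% colnames(music) || !cell_type %in% colnames(nnls)) {
    stop("cell type not found in the estimate matrices: ", cell_type)
  }
  c(
    MuSiC = mean(music[, cell_type]),             # mean MuSiC proportion
    NNLS = mean(nnls[, cell_type]),               # mean NNLS proportion
    MuSiC_nonzero = sum(music[, cell_type] > 0),  # samples with MuSiC > 0
    NNLS_nonzero = sum(nnls[, cell_type] > 0)     # samples with NNLS > 0
  )
}
```

With the objects above this would be called as compare_ct_estimates(Est.2, "T-cell") and compare_ct_estimates(Est.3, "T-cell").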

My scRNA-seq datasets are processed as Seurat objects, so I pulled the T-cell markers from each of them, and those markers are definitely present in bt_data_2nd (the TCGA bulk RNA-seq data). So I don't understand why the T-cell proportions come out as flat zeros.
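A minimal sketch of a marker check written in base R, so it does not depend on MuSiC internals. check_markers() is a hypothetical helper that, for each marker, reports whether it is a row name of the bulk matrix and of the single-cell matrix, whether it is duplicated among the bulk row names, and whether its total bulk expression is nonzero. It assumes both matrices use gene symbols as row names (e.g. exprs(bt_data_2nd) and exprs(sc_data) from Biobase):

```r
# Hypothetical helper (not part of MuSiC): per-marker presence checks.
# Assumes gene symbols are the row names of both matrices.
check_markers <- function(markers, bulk_mat, sc_mat) {
  in_bulk <- markers %in% rownames(bulk_mat)
  in_sc <- markers %in% rownames(sc_mat)
  # Row names that occur more than once in the bulk matrix
  dup_names <- rownames(bulk_mat)[duplicated(rownames(bulk_mat))]
  # Total bulk expression per marker (NA when the marker is absent);
  # indexing by name uses the first matching row if names are duplicated
  bulk_total <- rep(NA_real_, length(markers))
  bulk_total[in_bulk] <- rowSums(bulk_mat[markers[in_bulk], , drop = FALSE])
  data.frame(
    gene = markers,
    in_bulk = in_bulk,
    in_sc = in_sc,
    dup_in_bulk = markers %in% dup_names,
    bulk_nonzero = !is.na(bulk_total) & bulk_total > 0,
    stringsAsFactors = FALSE
  )
}
```

For example: check_markers(c("IL32", "PTPRC", "NKG7", "HCST"), exprs(bt_data_2nd), exprs(sc_data)).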

bt_data (the one in which all cell types appear after deconvolution):

    ExpressionSet (storageMode: lockedEnvironment)
    assayData: 13104 features, 548 samples
      element names: exprs
    protocolData: none
    phenoData
      sampleNames: TCGA.20.0987 TCGA.23.1031 ... TCGA.13.1819 (548 total)
      varLabels: EPCAM PTPRC ... VWF (7 total)
      varMetadata: labelDescription
    featureData: none
    experimentData: use 'experimentData(object)'
    Annotation:

bt_data_2nd (the one that apparently has no T cells):

    ExpressionSet (storageMode: lockedEnvironment)
    assayData: 56537 features, 229 samples
      element names: exprs
    protocolData: none
    phenoData
      sampleNames: TCGA-04-1331-01A-01R-1569-13 TCGA-04-1332-01A-01R-1564-13 ... TCGA-WR-A838-01A-12R-A406-31 (229 total)
      varLabels: Sample.ID Definition ... sampleNames (5 total)
      varMetadata: labelDescription
    featureData: none
    experimentData: use 'experimentData(object)'
    Annotation:

My scRNA-seq datasets:

    ExpressionSet (storageMode: lockedEnvironment)
    assayData: 22390 features, 38789 samples
      element names: exprs
    protocolData: none
    phenoData
      sampleNames: E27_Peri_AAACCCAAGACGCCAA E27_Peri_AAACCCAAGAGTCAGC ... Shih_ctcaatgtcggcaccttc (38789 total)
      varLabels: Cell-type SampleID
      varMetadata: labelDescription
    featureData: none
    experimentData: use 'experimentData(object)'
    Annotation:

To show that my second bulk RNA-seq dataset does include T-cell markers: I found the marker genes for every cell type with Seurat in each scRNA-seq dataset and intersected them across datasets. That left me with one vector per cell type containing the marker genes shared by all the scRNA-seq datasets.
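The intersection step can be sketched in base R as follows. It assumes a hypothetical list markers_by_dataset with one element per scRNA-seq dataset, each a named list mapping cell type to a character vector of marker genes (for a data frame returned by Seurat's FindAllMarkers(), split(df$gene, df$cluster) would give that shape). Only cell types present in every dataset are kept:

```r
# Sketch: markers shared by all datasets, per cell type.
# `markers_by_dataset` (hypothetical) is a list of named lists:
# dataset -> cell type -> character vector of marker genes.
shared_markers <- function(markers_by_dataset) {
  # Cell types that appear in every dataset
  cell_types <- Reduce(intersect, lapply(markers_by_dataset, names))
  setNames(
    lapply(cell_types, function(ct) {
      # Genes listed as markers for this cell type in every dataset,
      # in the order they appear in the first dataset
      Reduce(intersect, lapply(markers_by_dataset, function(ds) ds[[ct]]))
    }),
    cell_types
  )
}
```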

A quick look shows that these genes are present and have expression values (T-cell markers):

    Sample                         IL32  PTPRC  NKG7  HCST
    TCGA-61-1724-01A-01R-1568-13   3266    465   141   583
    TCGA-61-1736-01B-01R-1568-13  14180   1027   365   464
    TCGA-61-1738-01A-01R-1567-13   2443    755   180   784
    TCGA-61-1741-01A-02R-1567-13  18459   1139   317   248
    TCGA-61-1918-01A-01R-1568-13   3651   1164    76   187
    TCGA-61-1919-01A-01R-1568-13  14568   4614  2053   294
    TCGA-61-2101-01A-01R-1568-13   4881   2574   870   482
    TCGA-61-2102-01A-01R-1568-13   6006   1441   362   291
    TCGA-61-2109-01A-01R-1568-13   5623   1422   580   354
    TCGA-61-2110-01A-01R-1568-13   2866    725   717   769
    TCGA-61-2113-01A-01R-1568-13    572    147   176   124
    TCGA-VG-A8LO-01A-11R-A406-31   2747    320   497   489

The marker genes for the other cell types look much the same.

I'd appreciate any help or suggestions as to why I might be getting these results.

sessionInfo():

    R version 3.6.0 (2019-04-26)
    Platform: x86_64-pc-linux-gnu (64-bit)
    Running under: Scientific Linux 7.5 (Nitrogen)

    Matrix products: default
    BLAS:   /gpfs/igmmfs01/software/pkg/el7/apps/R/3.6.0/lib64/R/lib/libRblas.so
    LAPACK: /gpfs/igmmfs01/software/pkg/el7/apps/R/3.6.0/lib64/R/lib/libRlapack.so

    locale:
     [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8
     [4] LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
     [7] LC_PAPER=en_US.UTF-8 LC_NAME=C LC_ADDRESS=C
    [10] LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

    attached base packages:
    [1] stats4 parallel stats graphics grDevices utils datasets methods
    [9] base

    other attached packages:
    [1] reshape2_1.4.3 xbioc_0.1.17 AnnotationDbi_1.46.1 IRanges_2.18.2
    [5] S4Vectors_0.22.1 Seurat_3.1.0 MuSiC_0.1.1 ggplot2_3.2.1
    [9] nnls_1.4 Biobase_2.44.0 BiocGenerics_0.30.0

    loaded via a namespace (and not attached):
      [1] Rtsne_0.15 colorspace_1.4-1 ggridges_0.5.1 rstudioapi_0.10
      [5] leiden_0.3.1 listenv_0.7.0 npsurv_0.4-0 MatrixModels_0.4-1
      [9] bit64_0.9-7 ggrepel_0.8.1 codetools_0.2-16 splines_3.6.0
     [13] R.methodsS3_1.7.1 lsei_1.2-0 zeallot_0.1.0 jsonlite_1.6
     [17] mcmc_0.9-6 ica_1.0-2 cluster_2.1.0 png_0.1-7
     [21] R.oo_1.22.0 uwot_0.1.3 sctransform_0.2.0 BiocManager_1.30.4
     [25] compiler_3.6.0 httr_1.4.1 backports_1.1.4 assertthat_0.2.1
     [29] Matrix_1.2-17 lazyeval_0.2.2 htmltools_0.3.6 quantreg_5.51
     [33] tools_3.6.0 rsvd_1.0.2 igraph_1.2.4.1 coda_0.19-3
     [37] gtable_0.3.0 glue_1.3.1 RANN_2.6.1 dplyr_0.8.3
     [41] Rcpp_1.0.2 vctrs_0.2.0 gdata_2.18.0 ape_5.3
     [45] nlme_3.1-140 gbRd_0.4-11 lmtest_0.9-37 stringr_1.4.0
     [49] globals_0.12.4 lifecycle_0.1.0 irlba_2.3.3 gtools_3.8.1
     [53] future_1.14.0 MASS_7.3-51.4 zoo_1.8-6 scales_1.0.0
     [57] SparseM_1.77 RColorBrewer_1.1-2 yaml_2.2.0 memoise_1.1.0
     [61] reticulate_1.13 pbapply_1.4-1 gridExtra_2.3 pkgmaker_0.28
     [65] stringi_1.4.3 RSQLite_2.1.2 caTools_1.17.1.2 bibtex_0.4.2
     [69] Rdpack_0.11-0 SDMTools_1.1-221.1 rlang_0.4.0 pkgconfig_2.0.2
     [73] bitops_1.0-6 lattice_0.20-38 ROCR_1.0-7 purrr_0.3.2
     [77] labeling_0.3 htmlwidgets_1.3 bit_1.1-14 cowplot_1.0.0
     [81] tidyselect_0.2.5 RcppAnnoy_0.0.12 plyr_1.8.4 magrittr_1.5
     [85] R6_2.4.0 gplots_3.0.1.1 DBI_1.0.0 pillar_1.4.2
     [89] withr_2.1.2 fitdistrplus_1.0-14 survival_2.44-1.1 tibble_2.1.3
     [93] future.apply_1.3.0 tsne_0.1-3 crayon_1.3.4 KernSmooth_2.23-15
     [97] plotly_4.9.0 grid_3.6.0 data.table_1.12.2 blob_1.2.0
    [101] metap_1.1 digest_0.6.20 xtable_1.8-4 tidyr_1.0.0
    [105] MCMCpack_1.4-4 R.utils_2.9.0 RcppParallel_4.4.3 munsell_0.5.0
    [109] registry_0.5-1 viridisLite_0.3.0

Posted by Zanvine-gitcode, Sep 16 '19 16:09