BayesPrism
Principle of separation of types?
Hello, I'm currently using BayesPrism for deconvolution and I have a question.
I'm working with single-cell sequencing data that includes an equal amount of tumor cells and normal (non-tumor) cells. The bulk data also contains both tumor and normal cells. Suppose I've annotated 30 state subgroups (CD8+, Plasma cells, etc.) and then merged them into 8 type subgroups according to cell type (Lymphocytes, Stromal cells, etc.). However, I found that 10 of the state subgroups occur only in Tumor and 5 occur only in Normal. Viewed at the type level, some of these 10 and 5 subgroups belong to the same type, such as Lymphocytes, while others do not.
I performed deconvolution in two ways: 1. Merge state subgroups into type subgroups strictly according to cell type. 2. Label the state subgroups that occur only in Tumor or only in Normal with the type Tumor or Normal.
The single-cell data used in the BayesPrism paper did not include normal cells. After reading the BayesPrism paper, I became less convinced by the CIBERSORT approach. However, my knowledge is limited and I am not yet able to follow the underlying logic of BayesPrism. I'm not sure whether my analysis design is feasible, so I would like to ask for your opinion.
Both analyses show some collinearity (probably because my cell subgroup division is redundant). I'm inclined to go with the second method, since it would be easier to interpret and would support a broader downstream analysis.
By the way, the result of the first method is similar to CIBERSORT, but the result of the second method is quite different.
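For readers following along, the two labelling schemes described in the post can be sketched in base R as below. This is only an illustration on made-up toy metadata: the data frame `meta`, its columns `state`, `group` and `lineage`, and the vectors `type_scheme1` and `type_scheme2` are hypothetical names, not objects from the poster's analysis.

```r
# Toy per-cell metadata (made-up values, for illustration only).
meta <- data.frame(
  state   = c("LT1", "LT1", "LT3", "LT3", "LB3", "LB3", "EG1", "EG1"),
  group   = c("Normal", "Tumor", "Tumor", "Tumor",
              "Normal", "Normal", "Normal", "Tumor"),
  lineage = c(rep("Lymphocytes", 6), rep("Epithelial Cells", 2)),
  stringsAsFactors = FALSE
)

# Count cells per state in each sample group.
counts <- table(meta$state, meta$group)
tumor_only  <- rownames(counts)[counts[, "Normal"] == 0]
normal_only <- rownames(counts)[counts[, "Tumor"] == 0]

# Scheme 1: the type label follows the cell lineage of each state.
type_scheme1 <- meta$lineage

# Scheme 2: states seen in only one sample group are relabelled by that group.
type_scheme2 <- meta$lineage
type_scheme2[meta$state %in% tumor_only]  <- "Tumor"
type_scheme2[meta$state %in% normal_only] <- "Normal"

# Cross-tabulate to see which states ended up under which type.
print(table(type_scheme2, meta$state))
```

Under scheme 2 the "Tumor" and "Normal" types can mix states from unrelated lineages, which is worth keeping in mind when interpreting the resulting fractions.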
Hi. Sorry for the late reply. I am not sure I am quite following. Could you elaborate a bit on the relationship between cell states and tumor/normal status? For example, how can one cell state be found in both tumor and normal samples? It is also unclear to me how you were trying to construct the reference. Were you trying to construct the reference using scRNA datasets from both normal and tumor samples?
Thanks for your reply! My input data consists of:
- Single-cell RNA sequencing data: 40 samples, including 30,000 normal cells and 100,000 cancer cells.
- Bulk RNA sequencing data: Obtained from TCGA, including 350+ cancer samples and 40+ normal samples.
I utilized the LIGER package for semi-supervised data dimensionality reduction and the Seurat package's FindClusters function for clustering. This resulted in the identification of over 30 subclusters. Upon examining the composition of these subclusters in terms of Tumor and Normal, I discovered that more than half of the subclusters were exclusively present in either Tumor or Normal. Consequently, I merged the subclusters exclusive to Tumor or Normal into two types, despite the possibility of dissimilar expression profiles between the subclusters distributed in Normal or Tumor. I set the key as 'Tumor'.
My current approach involves two rounds of BayesPrism analysis. In the first round, I include both Tumor and Normal in the type definition. After deconvolution, I test whether the theta values of each type differ significantly between cancer and adjacent tissue in the bulk data. If they do, I proceed with the second round of deconvolution, using only the subclusters exclusive to Tumor or Normal, but with their types set according to the original cell types. I then analyze the type-level theta values and perform univariate Cox survival analysis to select the major subclusters associated with survival for further analysis.
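The first-round comparison of theta values between cancer and adjacent tissue could look roughly like the sketch below. The post does not say which test was used, so the Wilcoxon rank-sum test and the Benjamini-Hochberg correction here are assumptions, and the `theta` matrix contains made-up numbers rather than real BayesPrism output.

```r
# Hypothetical fractions (theta): rows are bulk samples, columns are cell types.
# All values are made up for illustration.
theta <- cbind(
  Lymphocytes     = c(0.10, 0.12, 0.15, 0.11, 0.30, 0.28, 0.35, 0.33),
  `Stromal Cells` = c(0.20, 0.22, 0.19, 0.25, 0.21, 0.24, 0.18, 0.23)
)
sample_group <- factor(c(rep("Normal", 4), rep("Tumor", 4)))

# Wilcoxon rank-sum test per cell type (an assumed choice of test).
p_raw <- apply(theta, 2, function(x) wilcox.test(x ~ sample_group)$p.value)

# Adjust for testing several cell types at once.
p_adj <- p.adjust(p_raw, method = "BH")
print(data.frame(p_raw, p_adj))
```

With 350+ tumor and 40+ normal TCGA samples the groups are unbalanced, so the effect sizes deserve a look alongside the p-values.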
Would you mind sending me a table of cell.type.labels and cell.state.labels (if cell.state.labels differ from cell.type.labels), using something like table(data.frame(cell.type.labels, cell.state.labels)), for both the first and second rounds of deconvolution? Thanks.
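As a small illustration of the requested table, the sketch below builds the cross-tabulation from two toy label vectors. The label values are made up, and the nesting check at the end assumes that each cell state is meant to belong to exactly one cell type.

```r
# Toy labels, one entry per cell (made-up values for illustration).
cell.type.labels  <- c("Lymphocytes", "Lymphocytes", "Tumor Cells", "Stromal Cells")
cell.state.labels <- c("LT1", "LB1", "LT3", "SF1")

# Types in rows, states in columns, cell counts in the cells of the table.
xtab <- table(data.frame(cell.type.labels, cell.state.labels))
print(xtab)

# Assumed sanity check: every state appears under exactly one type.
states_nested <- all(colSums(xtab > 0) == 1)
print(states_nested)
```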
# Extract the 'minor_cluster' and 'group' columns
minor_cluster <- sce@meta.data$minor_cluster
group <- sce@meta.data$group
# Count the cells of each 'minor_cluster' in each group
cluster_table <- table(minor_cluster, group)
# Find the clusters with zero cells in the 'Normal' group (tumor-only) and in the 'Tumor' group (normal-only)
tumor <- row.names(cluster_table)[cluster_table[, "Normal"] == 0]
normal <- row.names(cluster_table)[cluster_table[, "Tumor"] == 0]
# Relabel the 'major_cluster' of these clusters as "Tumor Cells" and "Normal Cells"
sce@meta.data$major_cluster[sce@meta.data$minor_cluster %in% tumor] <- "Tumor Cells"
#sce@meta.data$major_cluster[sce@meta.data$minor_cluster %in% normal] <- "Normal Cells"
The "Normal Cells" assignment is commented out because this snippet was copied from my current working environment. That line will be run when BayesPrism is run, so the major_cluster column in the tables below does not contain Normal Cells.
# first round
> table(sce$minor_cluster,sce$group)
Normal Tumor
EE1 983 0
EG1 1705 2248
EG2 601 1524
EG3 683 13
EV1 2413 0
EV2 0 1183
EV3 0 31
GC1 1381 2216
LB1 3689 97
LB2 0 2901
LB3 1330 0
LB4 0 1184
LB5 0 2788
LB6 243 0
LB7 0 135
LT1 4011 1501
LT2 2118 2585
LT3 0 3067
LT4 0 1230
LT5 139 603
LT6 242 0
LT7 150 0
LT8 0 118
MM1 1471 358
MM2 0 973
MM3 0 544
MN1 1050 473
MY1 558 443
NN1 1346 0
SC1 807 0
SC2 0 307
SF1 3234 0
SM1 0 1110
TT1 0 535
> table(sce$major_cluster,sce$group)
Normal Tumor
Endocrine Cells 1346 0
Endothelial Cells 2413 0
Epithelial Cells 5353 6001
Lymphocytes 11922 4786
Myeloid Cells 2521 831
Stromal Cells 4599 443
Tumor Cells 0 16106
> table(sce$major_cluster,sce$minor_cluster)
EE1 EG1 EG2 EG3 EV1 EV2 EV3 GC1 LB1 LB2 LB3 LB4 LB5 LB6 LB7 LT1 LT2 LT3 LT4 LT5 LT6 LT7 LT8 MM1 MM2 MM3 MN1 MY1 NN1 SC1 SC2 SF1 SM1 TT1
Endocrine Cells 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1346 0 0 0 0 0
Endothelial Cells 0 0 0 0 2413 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Epithelial Cells 983 3953 2125 696 0 0 0 3597 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Lymphocytes 0 0 0 0 0 0 0 0 3786 0 1330 0 0 243 0 5512 4703 0 0 742 242 150 0 0 0 0 0 0 0 0 0 0 0 0
Myeloid Cells 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1829 0 0 1523 0 0 0 0 0 0 0
Stromal Cells 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1001 0 807 0 3234 0 0
Tumor Cells 0 0 0 0 0 1183 31 0 0 2901 0 1184 2788 0 135 0 0 3067 1230 0 0 0 118 0 973 544 0 0 0 0 307 0 1110 535
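# second round (a separate R script)
library(Seurat)
library(dplyr)
Idents(sce) <- "minor_cluster"
# Keep only the clusters that have zero cells in either the Normal or the Tumor group
NT_keep <- table(sce$minor_cluster, sce$group) %>% as.data.frame() %>% filter(Freq == 0) %>% select(Var1)
sce <- subset(sce, idents = as.character(NT_keep$Var1))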
My Tumor subgroup was obtained by stratified sampling and then merged manually according to cell counts. My subgroup annotation is based on the top 50 marker genes from the FindAllMarkers function in the Seurat package; for some subgroups the cell type is apparent from the top 10 or even the top 5 genes alone. I am a novice in single-cell sequencing analysis, and I have always wondered how others manage to annotate tumor cells, when to me they all look like expression states of tumor microenvironment cells. Thanks!
major_cluster | minor_cluster1 | minor_cluster2 |
---|---|---|
Lymphocytes | T cells | LT1 |
Lymphocytes | T cells | LT2 |
Lymphocytes | B cells | LB1 |
Epithelial Cells | Gastric Endocrine Cells | EG1 |
Lymphocytes | T cells | LT3 |
Epithelial Cells | Gastric Endocrine Cells | EG2 |
Stromal Cells | Fibroblasts | SF1 |
Endothelial Cells | Vascular Endothelial Cells | EV1 |
Myeloid Cells | Macrophages | MM1 |
Lymphocytes | B cells | LB2 |
Myeloid Cells | Neutrophils | MN1 |
Lymphocytes | B cells | LB3 |
Epithelial Cells | Gastric Chief Cells | GC1 |
Lymphocytes | B cells | LB4 |
Lymphocytes | B cells | LB5 |
Stromal Cells | Mast Cells | SM1 |
Lymphocytes | T cells | LT4 |
Myeloid Cells | Macrophages | MM2 |
Stromal Cells | Myofibroblasts | MY1 |
Stromal Cells | Cancer-associated fibroblasts (CAFs) | SC1 |
Lymphocytes | T cells | LT5 |
Epithelial Cells | Epithelial Cells | EE1 |
Endocrine Cells | Neuroendocrine Cells | NN1 |
Lymphocytes | T cells | LT6 |
Lymphocytes | B cells | LB6 |
Lymphocytes | T cells | LT7 |
Epithelial Cells | Gastric Endocrine Cells | EG3 |
Endothelial Cells | Vascular Endothelial Cells | EV2 |
Tumor Cells | Tumor Cells | TT1 |
Myeloid Cells | Monocytes | MM3 |
Stromal Cells | Cancer-associated fibroblasts (CAFs) | SC2 |
Lymphocytes | B cells | LB7 |
Lymphocytes | T cells | LT8 |
Endothelial Cells | Vascular Endothelial Cells | EV3 |
When you say "Tumor" and "Normal", do you mean tumor samples and normal samples, rather than malignant and non-malignant cells? I am asking as I saw even lymphocytes show up in both groups.
My "Tumor" and "Normal" labels are the "Cancer" and "Adjacent tissue" annotations from the original data. I don't have enough experience to distinguish malignant from non-malignant cells, and I don't know how others do it, because I am the only person in our laboratory working on single-cell sequencing analysis.
Using pie charts, I inspected how the subgroups were distributed after LIGER dimensionality reduction and FindClusters clustering, and tried to choose the parameters that gave the greatest difference between cancer and adjacent tissue. That is why lymphocytes and other cell types appear in both "Tumor" and "Normal".
Therefore, given that the state subgroups found only in "Tumor" or only in "Normal" make up almost half of the total, I am considering extracting those state subgroups, merging them into type subgroups, and running BayesPrism without setting the key. I think this design is still defensible.
Subsequently, I intend to use the type subgroups selected this way for Monocle and iTALK analysis, run WGCNA on the state results and the CIBERSORT results, keep whichever performs better, and intersect the outputs of these steps to find key prognostic genes and build a gene model. That would complete my exploration of the single-cell data at this stage.
Can you give some advice to a beginner? Thank you for your reply!
I found my problem. There are so many zero values because my merge function doesn't match. It's over. I have to do it again.
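The exact cause of the mismatch is not described in the thread. As a generic illustration only, a label merge can be checked for unmatched keys before trusting the resulting tables, for example with `match()`. The `lookup` and `cells` data frames below are hypothetical toy data.

```r
# Hypothetical lookup table from minor cluster to major cluster.
lookup <- data.frame(
  minor_cluster = c("LT1", "LB1", "SF1"),
  major_cluster = c("Lymphocytes", "Lymphocytes", "Stromal Cells"),
  stringsAsFactors = FALSE
)

# Toy per-cell labels; "lt1" is a deliberate typo that will not match.
cells <- data.frame(
  minor_cluster = c("LT1", "SF1", "LB1", "lt1"),
  stringsAsFactors = FALSE
)

# match() returns NA for any label missing from the lookup table.
idx <- match(cells$minor_cluster, lookup$minor_cluster)
unmatched <- unique(cells$minor_cluster[is.na(idx)])
cells$major_cluster <- lookup$major_cluster[idx]

# Report any labels that failed to match, so the problem is visible early.
if (length(unmatched) > 0) {
  message("Unmatched minor clusters: ", paste(unmatched, collapse = ", "))
}
```

Unmatched labels surface here as NA values rather than silently producing empty or zero-count groups downstream.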
I tried copyKAT, but the results were not very good, so I ran inferCNV with endothelial cells as the reference in the annotations_file and found that about half of the epithelial cell subsets were clearly malignant. However, this is still far from the number of malignant cells in the BayesPrism paper. I also found that my scRNA data contains a lot of lymphocytes after dimensionality reduction and clustering, and that these lymphocytes show copy number variation at the TCR or BCR loci, even though lymphocytes do not seem to be malignant in the cancer I study. So I still have doubts about how the type labels should be constructed.