diffcyt Compare only 2 out of many levels

Hello, I have a feature which contains 8 levels: "aTNF_LPMC" "aINT_LPMC" "aIL_LPMC" "aJAK_LPMC" "5ASA_LPMC" "Steroids_LPMC" "6MP_LPMC" "HC_LPMC".

Using contrast <- createContrast(c(0, 0, 0, 0, 0, 0, 0, 1)), it would compare my healthy controls to all other groups combined. How can I compare level number 8 to each of the other levels separately?
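A minimal sketch of one common approach, assuming DesignMatrix uses treatment coding with an intercept and HC_LPMC as the reference level (the design-matrix column names posted later in this thread are consistent with that). Under that coding each non-intercept column already estimates "this level minus HC", so a contrast with a single 1 in one column compares that level with HC alone, not with all other groups pooled. Looping over the columns gives one contrast per comparison; the object names below are illustrative.

```r
# Column names of DesignMatrix as posted in this thread (assumed complete);
# HC_LPMC has no column because it is the reference level in the intercept
design_cols <- c("(Intercept)",
                 "Monotherapy_LPMCaTNF_LPMC", "Monotherapy_LPMCaINT_LPMC",
                 "Monotherapy_LPMCaIL_LPMC", "Monotherapy_LPMCaJAK_LPMC",
                 "Monotherapy_LPMC5ASA_LPMC", "Monotherapy_LPMCSteroids_LPMC",
                 "Monotherapy_LPMC6MP_LPMC")

# One contrast per treatment column: a single 1 tests that level vs HC_LPMC
level_cols <- setdiff(design_cols, "(Intercept)")
contrast_list <- lapply(level_cols, function(col) as.numeric(design_cols == col))
names(contrast_list) <- sub("^Monotherapy_LPMC", "", level_cols)

contrast_list[["aINT_LPMC"]]  # 0 0 1 0 0 0 0 0
```

Each vector could then be wrapped with createContrast() and passed to diffcyt() inside a loop, keeping one result object per comparison. This applies to the design-based methods such as "diffcyt-DA-edgeR"; the GLMM method is set up through a formula instead.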

Thanks so much for your help!

Jun 01 '20 20:06 LenaMayer

Hi again, following your explanation in "testDA_edgeR returns NA for some clusters", we tried the -1 and 1 option for createContrast. However, our output is all NA. Why is that? Please see the options we tried below. I'm looking forward to your comments. Thank you!

```r
contrast <- createContrast(c(0, 0, 1, 0, 0, 0, 0, -1))
da_res1 <- diffcyt(sce, design = DesignMatrix,
                   contrast = contrast,
                   analysis_type = "DA", method_DA = "diffcyt-DA-GLMM",
                   clustering_to_use = "meta10", verbose = TRUE)
```

```
using SingleCellExperiment object from CATALYST as input
using cluster IDs from clustering stored in column 'meta10' of 'cluster_codes' data frame in 'metadata' of SingleCellExperiment object from CATALYST
calculating features...
calculating DA tests using method 'diffcyt-DA-GLMM'...

> names(da_res1)
[1] "res"                        "d_counts"
[3] "d_medians"                  "d_medians_by_cluster_marker"
[5] "d_medians_by_sample_marker"
> rowData(da_res1$res)
DataFrame with 10 rows and 3 columns
   cluster_id     p_val     p_adj
1           1        NA        NA
2           2        NA        NA
3           3        NA        NA
4           4        NA        NA
5           5        NA        NA
6           6        NA        NA
7           7        NA        NA
8           8        NA        NA
9           9        NA        NA
10         10        NA        NA
> table(rowData(da_res1$res)$p_adj < FDR_cutoff)
< table of extent 0 >
```

```r
contrast <- createContrast(c(0, 0, 1, 0, 0, 0, 0, 0))
da_res1 <- diffcyt(sce, design = DesignMatrix,
                   contrast = contrast,
                   analysis_type = "DA", method_DA = "diffcyt-DA-GLMM",
                   clustering_to_use = "meta10", verbose = TRUE)
```

```
using SingleCellExperiment object from CATALYST as input
using cluster IDs from clustering stored in column 'meta10' of 'cluster_codes' data frame in 'metadata' of SingleCellExperiment object from CATALYST
calculating features...
calculating DA tests using method 'diffcyt-DA-GLMM'...

> names(da_res1)
[1] "res"                        "d_counts"
[3] "d_medians"                  "d_medians_by_cluster_marker"
[5] "d_medians_by_sample_marker"
> rowData(da_res1$res)
DataFrame with 10 rows and 3 columns
   cluster_id     p_val     p_adj
1           1        NA        NA
2           2        NA        NA
3           3        NA        NA
4           4        NA        NA
5           5        NA        NA
6           6        NA        NA
7           7        NA        NA
8           8        NA        NA
9           9        NA        NA
10         10        NA        NA
> table(rowData(da_res1$res)$p_adj < FDR_cutoff)
< table of extent 0 >
```

Jun 02 '20 22:06 LenaMayer

Hi, did the suggestions Helena provided in the CATALYST repository help?

As Helena suggested, one idea would be to also try one of the other DA methods, e.g. method_DA = "diffcyt-DA-edgeR".

If this doesn't work, I would suggest that you work through the pipeline step by step and inspect the output objects at each point. This will help you find out where the problem starts (e.g. something to do with your expression matrix, metadata table, etc.).
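For the metadata step, a quick check along these lines can flag problems before any model is fit. This is a hypothetical helper (not part of diffcyt or CATALYST), sketched in base R with made-up sample IDs: it counts samples with no condition, reports how many rows actually reach a model.matrix() design (rows with NA are dropped under R's default na.action), and checks that the design has full column rank.

```r
# Hypothetical helper (not part of diffcyt or CATALYST): sanity checks on a
# sample-level metadata table before building a design from one column
check_design_inputs <- function(ei, condition_col) {
  cond <- factor(ei[[condition_col]])
  # model.matrix() drops rows with NA under R's default na.action (na.omit)
  design <- model.matrix(~ cond)
  list(
    n_samples = nrow(ei),                          # rows in the metadata table
    n_na      = sum(is.na(cond)),                  # samples with no condition
    n_rows    = nrow(design),                      # rows that reach the design
    full_rank = qr(design)$rank == ncol(design),   # no redundant columns
    per_level = table(cond, useNA = "ifany")       # samples per level, incl. NA
  )
}

# Toy table (hypothetical IDs): two of five samples have no condition
ei_toy <- data.frame(
  sample_id = c("HC_DIST_1", "HC_PROX_1", "CD_DIST_2", "CD_PROX_2", "UC_DIST_3"),
  therapy   = c("HC_LPMC", "HC_LPMC", "aTNF_LPMC", NA, NA)
)
chk <- check_design_inputs(ei_toy, "therapy")
chk$n_rows  # 3, although the table has 5 samples
```

Applied to the table returned by ei(sce) with the grouping column of interest, a gap between n_samples and n_rows, or full_rank = FALSE, would point to the metadata rather than to the testing method.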

Jun 08 '20 01:06 lmweber

Hi Lukas,

Sorry for the late reply. I was dealing with something else over the weekend, but will look into this again tomorrow afternoon. I think what Helena proposed makes sense; I will run it and might send you some intermediate steps in case I still end up with NAs. I have a huge dataset and a lot of metadata annotations, which can be challenging, but I think your pipeline is great and I would be happy to make it work for this project.

Thanks again for following up, I will be in touch tomorrow!

Best, Lena


— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/lmweber/diffcyt/issues/16#issuecomment-640318775, or unsubscribe https://github.com/notifications/unsubscribe-auth/ANYV67UEFNLS5C7WFIMGHQDRVQ7QJANCNFSM4NQECC6A .

Jun 08 '20 02:06 LenaMayer

Hi Lukas,

Sorry, it took us a few meetings to try out some things, but none of the methods worked. We think that the GLMM method does tolerate NAs in the metadata, but it is difficult to create and loop over the right contrasts since it depends on the createFormula function. The other methods can be set up with a design matrix, but do not tolerate unassigned samples in the metadata. However, even after assigning them, the code did not run through. I would like to share a subset of my sce and the code we tried with you. Maybe you can have a look? It might be obvious to you what is wrong, and maybe it can be easily fixed by adapting the metadata.

Let me know if you need more information!

Thank you, Lena

sub_UMAP_CD19.rds https://drive.google.com/file/d/1F5p1TiyEH49fR_8zac8KTWCeJRPomTXC/view?usp=drive_web


Jun 11 '20 18:06 LenaMayer

OK thanks, I just clicked "request access" for my Gmail account in the Google Drive link.

Jun 11 '20 23:06 lmweber

Yes, do you have access now?


Jun 11 '20 23:06 LenaMayer

Yes, I have access now.

Tagging the other issue you opened in the CATALYST repo so we can keep the discussion connected: https://github.com/HelenaLC/CATALYST/issues/110

Jun 11 '20 23:06 lmweber

Thank you!


Jun 11 '20 23:06 LenaMayer

Ok I think I see part of the problem now. When I run md <- ei(sce) to check the metadata table (as in the code @HelenaLC provided in the other GitHub issue), it shows a huge table with lots of NAs, so I don't think this has been set up correctly.

Could you have a look at the example of a metadata table in our CyTOF workflow (https://f1000research.com/articles/6-748, e.g. page 6), and check whether it has been set up the same way here? The metadata table should describe the filenames, sample IDs, conditions, etc. of the samples / .fcs files.

Also tagging @markrobinsonuzh

Jun 11 '20 23:06 lmweber

OK, I actually did this on purpose: while working through the workflow before getting to diffcyt, I can't have too many features, otherwise the plots look weird. So I split up the columns for each LPMC and PBMC. If it helps, I can fill the columns I don't need with "X" and run everything again. In that case, I would also have to make "X" a level of each of those columns, right?


Jun 11 '20 23:06 LenaMayer

Also attaching the code for the setup so you can get a sense of the beginning of the workflow for our experiment. Ignore the code chunks in the end - that was just where we were playing around with diffcyt.


Jun 11 '20 23:06 LenaMayer

I don't think it is going to work with this setup. The metadata table needs a column condition that describes the condition of each sample, and optional additional columns describing any other covariates, e.g. patient IDs for replicates.

Then you can use this to create a design matrix. The entries in the contrast vector should then match the columns of the design matrix.
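A minimal sketch of that layout, using a toy table with hypothetical sample and patient IDs: one row per sample, a condition column, and HC_LPMC set as the reference level so that each remaining design column is a "level versus HC" coefficient. The design is built here with base R's model.matrix(); in the package workflow, diffcyt's createDesignMatrix(experiment_info, cols_design = ...) serves the same purpose. The final line checks that the contrast has exactly one entry per design column.

```r
# Toy sample-level metadata (hypothetical IDs): one row per sample / .fcs file;
# patient_id only shows where a replicate covariate would live, unused below
ei <- data.frame(
  sample_id  = c("HC_DIST_1", "HC_PROX_1", "CD_DIST_2", "CD_PROX_2",
                 "UC_DIST_3", "UC_PROX_3"),
  condition  = c("HC_LPMC", "HC_LPMC", "aTNF_LPMC", "aTNF_LPMC",
                 "aINT_LPMC", "aINT_LPMC"),
  patient_id = c("P1", "P1", "P2", "P2", "P3", "P3")
)

# Put HC_LPMC first so the intercept represents healthy controls
ei$condition <- relevel(factor(ei$condition), ref = "HC_LPMC")

# Treatment-coded design: intercept plus one column per non-reference level
design <- model.matrix(~ condition, data = ei)
colnames(design)  # "(Intercept)" "conditionaINT_LPMC" "conditionaTNF_LPMC"

# Contrast entries must line up one-to-one with the design columns
contrast <- as.numeric(colnames(design) == "conditionaTNF_LPMC")
stopifnot(length(contrast) == ncol(design))
```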

Could you please paste a copy of your design matrix (or part of it)?

(Also note your previous message does not include any code, not sure if there was supposed to be an attachment.)

Jun 12 '20 00:06 lmweber

Is this helpful?

```
> colnames(DesignMatrix)
[1] "(Intercept)"                   "Monotherapy_LPMCaTNF_LPMC"
[3] "Monotherapy_LPMCaINT_LPMC"     "Monotherapy_LPMCaIL_LPMC"
[5] "Monotherapy_LPMCaJAK_LPMC"     "Monotherapy_LPMC5ASA_LPMC"
[7] "Monotherapy_LPMCSteroids_LPMC" "Monotherapy_LPMC6MP_LPMC"
> group <- "Monotherapy_LPMCaINT_LPMC"
> as.numeric(colnames(DesignMatrix) == group)
[1] 0 0 1 0 0 0 0 0
```
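The as.numeric(colnames(DesignMatrix) == group) idea generalises to a small helper. This is a hypothetical function, assuming the eight columns listed here are the full design with HC_LPMC as the intercept's reference level. Under that assumption a single +1 already compares a level with HC, while a +1/-1 pair compares two treatment levels with each other, so c(0, 0, 1, 0, 0, 0, 0, -1) from earlier in the thread would test aINT against 6MP rather than against HC.

```r
# Hypothetical helper: +1 on the `plus` column, -1 on the optional `minus`
# column, 0 elsewhere, in the same order as the design-matrix columns
make_contrast <- function(design_cols, plus, minus = NULL) {
  stopifnot(plus %in% design_cols, is.null(minus) || minus %in% design_cols)
  out <- as.numeric(design_cols == plus)
  if (!is.null(minus)) out[design_cols == minus] <- -1
  out
}

design_cols <- c("(Intercept)",
                 "Monotherapy_LPMCaTNF_LPMC", "Monotherapy_LPMCaINT_LPMC",
                 "Monotherapy_LPMCaIL_LPMC", "Monotherapy_LPMCaJAK_LPMC",
                 "Monotherapy_LPMC5ASA_LPMC", "Monotherapy_LPMCSteroids_LPMC",
                 "Monotherapy_LPMC6MP_LPMC")

# aINT vs HC (the reference level): 0 0 1 0 0 0 0 0
aint_vs_hc <- make_contrast(design_cols, "Monotherapy_LPMCaINT_LPMC")

# aINT vs 6MP (two non-reference levels): 0 0 1 0 0 0 0 -1
aint_vs_6mp <- make_contrast(design_cols, "Monotherapy_LPMCaINT_LPMC",
                             minus = "Monotherapy_LPMC6MP_LPMC")
```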

Are you sure it has to be named "condition"? I thought if you keep the labeling straight any column could be used as "condition".

I saw that it was uploaded to Google Drive. I shared the file with you; I hope it works now.


Jun 12 '20 00:06 LenaMayer

Not sure what is causing the issue, but just to mention that I think you don't need to exclude any features / samples only because there are many. Most plotting functions have options to specify which features to include, and if they don't, you can always use filterSCE to subset clusters, samples, conditions etc. of interest for any given plot. As I said, I don't know what the issue really is, nor how you subset your data, but incorrect subsetting might cause some downstream issues.

Jun 13 '20 11:06 HelenaLC

Thank you. So supposedly if I assign all samples to a feature and avoid NA, it should work? Or is there a different issue?

Best, Lena


Jun 15 '20 12:06 LenaMayer

Not sure what you mean by "assign all samples to a feature"? I don't actually know whether this is causing the issue, but was suggesting that it might be easier and less error-prone to avoid manually subsetting parts of the data. You can, for example, specify which features to test in diffcyt(), and which samples/clusters/conditions to include in a given plot by creating subsets with filterSCE(), so I think there's no need to split or filter the data in advance of the differential analysis.

Jun 15 '20 14:06 HelenaLC

Ok, thank you. Sorry for misunderstanding. I will try that!


Jun 15 '20 14:06 LenaMayer

Hi,

When I try to filter out samples, filterSCE() somehow removes all levels:

```
> sce <- readRDS("UMAP_Pheno_CD19_merged.rds")
> ei$Monotherapy_LPMC = fct_explicit_na(ei$Monotherapy_LPMC, "X")
> sce_filtered <- filterSCE(sce, Monotherapy_LPMC != "X")
> ei <- metadata(sce_filtered)$experiment_info
> levels(ei$Monotherapy_LPMC)
character(0)
```

Any idea why that is?

Best, Lena


Jun 15 '20 21:06 LenaMayer

First of all, no, I'm not sure why all factor levels are removed. But you are replacing NA with X in ei and then never putting the new ei back into metadata()$experiment_info, so filtering on Monotherapy_LPMC != "X" doesn't make sense.

Alternatively, and much simpler, you could use filterSCE(sce, !is.na(Monotherapy_LPMC)).
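The NA-versus-"X" point can be seen on a plain data.frame. This is a base-R sketch with made-up cell metadata, not a reproduction of filterSCE() itself: because the column still contains NA (the "X" recoding was never written back), != "X" returns NA rather than FALSE for unassigned cells, whereas !is.na() and %in% return a clean TRUE/FALSE vector, and droplevels() then removes levels with no remaining cells.

```r
# Made-up per-cell metadata; NA marks cells without a Monotherapy_LPMC label
cells <- data.frame(
  sample_id = c("HC_1", "HC_1", "CD_2", "CD_2", "UC_3"),
  Monotherapy_LPMC = factor(c("HC_LPMC", "HC_LPMC", "aTNF_LPMC", NA, "5ASA_LPMC"))
)

# No "X" was ever written into the column, so NA stays NA in the comparison
cmp <- cells$Monotherapy_LPMC != "X"   # TRUE TRUE TRUE NA TRUE

# In base-R indexing, an NA in a logical index returns an all-NA row
bad <- cells[cmp, ]                    # 5 rows, one of them all NA

# !is.na() gives a clean logical vector: keeps the 4 labelled cells
labelled <- cells[!is.na(cells$Monotherapy_LPMC), ]

# %in% treats NA as no match; droplevels() drops the now-empty 5ASA level
two_groups <- droplevels(cells[cells$Monotherapy_LPMC %in% c("HC_LPMC", "aTNF_LPMC"), ])
levels(two_groups$Monotherapy_LPMC)   # HC_LPMC and aTNF_LPMC (order depends on locale)
```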

Jun 16 '20 12:06 HelenaLC

Thank you! I tried it out, but filtering is still erasing all the metadata.

```
> dim(ei)
[1] 0 68
```


Jun 16 '20 13:06 LenaMayer

Could you post the output of table(sce$sample_id, sce$Monotherapy_LPMC)?
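With this many samples the raw table is hard to scan by eye. A hypothetical helper like the one below (base R, not part of CATALYST) collapses a sample-by-condition count table into one label per sample and flags samples whose cells fall into zero or several conditions; the toy vectors stand in for sce$sample_id and sce$Monotherapy_LPMC.

```r
# Hypothetical helper: one condition per sample from a sample x condition table
sample_conditions <- function(tab) {
  n_conditions <- rowSums(tab > 0)
  condition <- apply(tab, 1, function(counts) {
    hits <- names(counts)[counts > 0]
    if (length(hits) == 1) hits else NA_character_
  })
  data.frame(sample_id    = rownames(tab),
             condition    = unname(condition),
             n_conditions = unname(n_conditions),
             stringsAsFactors = FALSE)
}

# Toy cell-level vectors (hypothetical); UC_3 has no labelled cells
sample_id <- c("HC_1", "HC_1", "CD_2", "CD_2", "UC_3", "UC_3")
therapy   <- c("HC_LPMC", "HC_LPMC", "aTNF_LPMC", "aTNF_LPMC", NA, NA)

tab <- table(sample_id, therapy)   # NA labels are excluded by default
res <- sample_conditions(tab)
res[is.na(res$condition), ]        # samples to fix before building a design
```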

Jun 16 '20 14:06 HelenaLC

Yes! Here is the output of the table:

```
> table(sce$sample_id, sce$Monotherapy_LPMC)

            aTNF_LPMC aINT_LPMC aIL_LPMC aJAK_LPMC 5ASA_LPMC Steroids_LPMC 6MP_LPMC HC_LPMC
HC_DIST_002         0         0        0         0         0             0        0      95
HC_PROX_002         0         0        0         0         0             0        0    8813
HC_DIST_003         0         0        0         0         0             0        0    2969
CD_DIST_004       915         0        0         0         0             0        0       0
CD_PROX_004       136         0        0         0         0             0        0       0
HC_DIST_005         0         0        0         0         0             0        0   25373
HC_PROX_005         0         0        0         0         0             0        0    1776
HC_DIST_006         0         0        0         0         0             0        0   48612
HC_PROX_006         0         0        0         0         0             0        0    1118
HC_DIST_015         0         0        0         0         0             0        0    1466
HC_PROX_015         0         0        0         0         0             0        0     800
HC_DIST_016         0         0        0         0         0             0        0   55778
HC_PROX_016         0         0        0         0         0             0        0     283
HC_DIST_017         0         0        0         0         0             0        0      95
HC_PROX_017         0         0        0         0         0             0        0    3677
HC_DIST_018         0         0        0         0         0             0        0   20187
HC_PROX_018         0         0        0         0         0             0        0     477
CD_PROX_022      2825         0        0         0         0             0        0       0
HC_PROX_023         0         0        0         0         0             0        0   14326
HC_DIST_026         0         0        0         0         0             0        0    4422
HC_PROX_026         0         0        0         0         0             0        0     602
UC_DIST_030     11317         0        0         0         0             0        0       0
UC_PROX_030       138         0        0         0         0             0        0       0
HC_DIST_031         0         0        0         0         0             0        0     494
HC_PROX_031         0         0        0         0         0             0        0      83
UC_DIST_033         0         0        0         0        53             0        0       0
UC_PROX_033         0         0        0         0        79             0        0       0
CD_DIST_034       247         0        0         0         0             0        0       0
CD_PROX_034      4022         0        0         0         0             0        0       0
UC_PROX_036         0         0        0         0     37462             0        0       0
UC_DIST_049    134300         0        0         0         0             0        0       0
UC_PROX_049     76791         0        0         0         0             0        0       0
CD_PROX_050         0         0        0         0       112             0        0       0
HC_DIST_053         0         0        0         0         0             0        0     834
HC_PROX_053         0         0        0         0         0             0        0   53558
UC_PROX_056      5701         0        0         0         0             0        0       0
CD_DIST_057         0     13868        0         0         0             0        0       0
CD_PROX_057         0     13377        0         0         0             0        0       0
UC_DIST_060       205         0        0         0         0             0        0       0
UC_PROX_060     55587         0        0         0         0             0        0       0
HC_PROX_061         0         0        0         0         0             0        0    4494
CD_DIST_062    150019         0        0         0         0             0        0       0
CD_PROX_062     41894         0        0         0         0             0        0       0
HC_DIST_063         0         0        0         0         0             0        0   53511
HC_PROX_063         0         0        0         0         0             0        0     818
HC_DIST_066         0         0        0         0         0             0        0   57065
HC_PROX_066         0         0        0         0         0             0        0   69044
CD_DIST_067     18395         0        0         0         0             0        0       0
CD_PROX_067     13454         0        0         0         0             0        0       0
UC_DIST_069         0         0        0         0      1178             0        0       0
UC_PROX_069         0         0        0         0       141             0        0       0
HC_DIST_071         0         0        0         0         0             0        0     324
HC_PROX_071         0         0        0         0         0             0        0     709
UC_DIST_072         0       590        0         0         0             0        0       0
UC_PROX_072         0      3544        0         0         0             0        0       0
HC_DIST_073         0         0        0         0         0             0        0     117
HC_PROX_073         0         0        0         0         0             0        0     923
UC_DIST_074         0         0        0         0       569             0        0       0
UC_PROX_074         0         0        0         0      6577             0        0       0
UC_DIST_075         0       880        0         0         0             0        0       0
UC_PROX_075         0     83132        0         0         0             0        0       0
CD_DIST_077         0         0        0         0      4568             0        0       0
CD_PROX_077         0         0        0         0       712             0        0       0
CD_DIST_078       172         0        0         0         0             0        0       0
CD_PROX_078      1146         0        0         0         0             0        0       0
CD_DIST_079         0    170326        0         0         0             0        0       0
CD_PROX_079         0      2716        0         0         0             0        0       0
HC_DIST_081         0         0        0         0         0             0        0     237
HC_PROX_081         0         0        0         0         0             0        0    1102
CD_DIST_086     37727         0        0         0         0             0        0       0
CD_PROX_086       226         0        0         0         0             0        0       0
CD_DIST_088         0       650        0         0         0             0        0       0
CD_DIST_089     30396         0        0         0         0             0        0       0
CD_PROX_089     49646         0        0         0         0             0        0       0
UC_DIST_090         0         0        0         0    195279             0        0       0
UC_PROX_090         0         0        0         0     14487             0        0       0
CD_DIST_092         0       168        0         0         0             0        0       0
CD_PROX_092         0        62        0         0         0             0        0       0
UC_DIST_094         0         0        0         0      1106             0        0       0
UC_PROX_094         0         0        0         0      2358             0        0       0
UC_DIST_095      9783         0        0         0         0             0        0       0
UC_PROX_095      1559         0        0         0         0             0        0       0
UC_DIST_097         0         0        0         0       231             0        0       0
UC_PROX_097         0         0        0         0      4740             0        0       0
CD_DIST_098         0         0        0         0         0            45        0       0
CD_PROX_098         0         0        0         0         0           180        0       0
UC_DIST_100         0         0        0         0    126827             0        0       0
UC_PROX_100         0         0        0         0      1791             0        0       0
UC_DIST_101         0         0        0         0    119128             0        0       0
UC_PROX_101         0         0        0         0      9074             0        0       0
UC_DIST_102         0         0        0         0      1375             0        0       0
UC_PROX_102         0         0        0         0       501             0        0       0
CD_DIST_106       110         0        0         0         0             0        0       0
CD_PROX_106        36         0        0         0         0             0        0       0
CD_PROX_108         0         0        0         0         0             0     3645       0
HC_DIST_109         0         0        0         0         0             0        0     513
HC_PROX_109         0         0        0         0         0             0        0     848
HC_DIST_110         0         0        0         0         0             0        0   62786
HC_PROX_110         0         0        0         0         0             0        0   31366
UC_DIST_111         0         0        0         0         0         81806        0       0
UC_PROX_111         0         0        0         0         0          9674        0       0
UC_DIST_112         0       359        0         0         0             0        0       0
UC_PROX_112         0       129        0         0         0             0        0       0
CD_DIST_113    134151         0        0         0         0             0        0       0
CD_PROX_113     58076         0        0         0         0             0        0       0
CD_DIST_114       763         0        0         0         0             0        0       0
CD_PROX_114       367         0        0         0         0             0        0       0
HC_DIST_115         0         0        0         0         0             0        0   11004
HC_PROX_115         0         0        0         0         0             0        0   56556
HC_DIST_116         0         0        0         0         0             0        0     442
HC_PROX_116         0         0        0         0         0             0        0  184070
HC_DIST_117         0         0        0         0         0             0        0   13688
HC_PROX_117         0         0        0         0         0             0        0     319
CD_DIST_119         0         0        0         0       748             0        0       0
CD_PROX_119         0         0        0         0       356             0        0       0
UC_DIST_120         0         0        0         0    126667             0        0       0
UC_PROX_120         0         0        0         0    128297             0        0       0
CD_DIST_122         0         0        0         0       980             0        0       0
CD_PROX_122         0         0        0         0      9506             0        0       0
HC_DIST_125         0         0        0         0         0             0        0      56
HC_PROX_125         0         0        0         0         0             0        0     257
CD_DIST_127         0         0     3985         0         0             0        0       0
CD_PROX_127         0         0     1008         0         0             0        0       0
UC_DIST_129         0         0        0         0       903             0        0       0
UC_PROX_129         0         0        0         0    154154             0        0       0
CD_DIST_130         0         0        0         0         0             0      903       0
CD_PROX_130         0         0        0         0         0             0      120       0
HC_DIST_132         0         0        0         0         0             0        0  105908
HC_PROX_132         0         0        0         0         0             0        0    1356
CD_DIST_133      2670         0        0         0         0             0        0       0
CD_PROX_133       358         0        0         0         0             0        0       0
HC_DIST_135         0         0        0         0         0             0        0     359
HC_PROX_135         0         0        0         0         0             0        0    2180
UC_DIST_136         0         0        0         0      1090             0        0       0
UC_PROX_136         0         0        0         0     15989             0        0       0
UC_DIST_138         0         0        0         0    116822             0        0       0
UC_PROX_138         0         0        0         0      9128             0        0       0
CD_DIST_139     79261         0        0         0         0             0        0       0
CD_PROX_139      4434         0        0         0         0             0        0       0
CD_DIST_140        77         0        0         0         0             0        0       0
CD_PROX_140       908         0        0         0         0             0        0       0
CD_DIST_141         0      1001        0         0         0             0        0       0
CD_PROX_141         0     44238        0         0         0             0        0       0
UC_DIST_143         0         0        0    140923         0             0        0       0
UC_PROX_143         0         0        0      2161         0             0        0       0
CD_DIST_145    115412         0        0         0         0             0        0       0
CD_PROX_145       794         0        0         0         0             0        0       0
HC_DIST_147         0         0        0         0         0             0        0     220
HC_PROX_147         0         0        0         0         0             0        0    2354
HC_DIST_151         0         0        0         0         0             0        0    2419
HC_PROX_151         0         0        0         0         0             0        0   89773
HC_DIST_154         0         0        0         0         0             0        0     295
HC_PROX_154         0         0        0         0         0             0        0  101412
UC_DIST_159     28817         0        0         0         0             0        0       0
UC_PROX_159     33839         0        0         0         0             0        0       0
```


Jun 16 '20 16:06 LenaMayer

Sorry this is happening... it's quite hard to understand what's going on without the data. Could you also try something like:

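```r
# keep only cells that have a Monotherapy_LPMC label
sub <- sce[, !is.na(sce$Monotherapy_LPMC)]
# drop factor levels that no longer occur after subsetting
colData(sub) <- droplevels(colData(sub))
dim(sub)
table(sub$sample_id, sub$Monotherapy_LPMC)
```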

...I'm trying to understand what the filtering is removing.

Jun 16 '20 17:06 HelenaLC

No worries. Let me know if you would like a subset of the sce. I think I shared it with Lukas, but can also share it with you.

```
> sub <- sce[, !is.na(sce$Monotherapy_LPMC)]
> colData(sub) <- droplevels(colData(sub))
> dim(sub)
[1] 39 3881515
> table(sub$sample_id, sub$Monotherapy_LPMC)
```

(The output of table(sub$sample_id, sub$Monotherapy_LPMC) was identical to the table in my previous post.)

On Tue, 16 June 2020 at 13:26, Helena L. Crowell wrote:

Sorry this is happening. It's quite hard to understand what's going on without the data. Could you also try something like:

sub <- sce[, !is.na(sce$Monotherapy_LPMC)]
colData(sub) <- droplevels(colData(sub))
dim(sub)
table(sub$sample_id, sub$Monotherapy_LPMC)

...I'm trying to understand what the filtering is removing.
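The same check can be sketched on a plain data frame. This is a minimal base-R illustration, assuming a hypothetical toy table `cells` (one row per cell) that stands in for `colData(sce)`; the names `cells`, `kept`, `tab` and `lost` are illustrative only. It applies the same NA filter and lists the samples that are left with no cells at all.

```r
# Hypothetical toy cell-level annotations: one row per cell.
cells <- data.frame(
  sample_id = factor(c("S1", "S1", "S2", "S2", "S3")),
  Monotherapy_LPMC = c("HC_LPMC", "HC_LPMC", NA, NA, "aTNF_LPMC"),
  stringsAsFactors = FALSE
)

# Keep only cells with a non-missing Monotherapy_LPMC value.
kept <- cells[!is.na(cells$Monotherapy_LPMC), ]

# sample_id keeps all of its levels, so samples that lost every cell
# still appear in the table, as all-zero rows.
tab <- table(kept$sample_id, kept$Monotherapy_LPMC)
lost <- rownames(tab)[rowSums(tab) == 0]
lost
```

Applied to the real object, the all-zero rows of `table(sub$sample_id, sub$Monotherapy_LPMC)` would be the samples that this filter empties completely, as long as `sample_id` still carries those levels.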


Jun 16 '20 17:06 LenaMayer

Yes, that would certainly help, thank you! Not just to solve your issue, but to fix any bugs in filterSCE().

Jun 16 '20 17:06 HelenaLC

Let me know if you can access the link and if you need anything else! Thanks for troubleshooting! sub_UMAP_CD19.rds: https://drive.google.com/file/d/1UcvyGxQgkNiAlVNDsoUyBAR1UsRfvr-w/view?usp=drive_web


Jun 16 '20 17:06 LenaMayer

Hi Lena, I had another look at this now.

I do think the issue is that the metadata / experiment_info hasn't been set up correctly. This needs to be a table where each row is a sample, and the columns are things like sample IDs, condition, patient IDs, etc. The way it is currently set up, there are lots of additional columns full of NAs, which doesn't look right. Most of the columns should be factors (i.e. not using dummy-coding or similar).

For example, the metadata table could look like the following, which I have done by manipulating the object you sent.

> head(md)
    sample_id condition
1 HC_DIST_002   HC_LPMC
2 HC_PROX_002   HC_LPMC
3 HC_DIST_003   HC_LPMC
4 CD_DIST_004 aTNF_LPMC
5 CD_PROX_004 aTNF_LPMC
6 HC_DIST_005   HC_LPMC
> dim(md)
[1] 155   2
> str(md)
'data.frame':	155 obs. of  2 variables:
 $ sample_id: Factor w/ 155 levels "HC_DIST_002",..: 1 2 3 4 5 6 7 8 9 10 ...
 $ condition: Factor w/ 8 levels "HC_LPMC","aTNF_LPMC",..: 1 1 1 2 2 1 1 1 1 1 ...

Here you can see that md is a table with 155 rows (samples) and 2 columns (factor containing sample IDs, factor containing conditions). The exact setup and any additional columns will depend on your experimental design.
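The requirements described above (one row per sample, columns stored as factors, no columns full of NAs) can be checked mechanically. A minimal sketch, assuming a hypothetical helper `check_md()` that is not part of diffcyt or CATALYST; it only reports problems, since not every column strictly has to be a factor.

```r
# Hypothetical helper (not part of diffcyt or CATALYST): reports the
# problems described above for a per-sample metadata table.
check_md <- function(md) {
  list(
    # sample IDs that occur on more than one row
    duplicated_samples = as.character(md$sample_id[duplicated(md$sample_id)]),
    # columns not stored as factors (worth reviewing)
    non_factor_cols = names(md)[!vapply(md, is.factor, logical(1))],
    # columns containing at least one NA
    cols_with_na = names(md)[vapply(md, anyNA, logical(1))]
  )
}

# Toy example: a redundant per-tissue column full of NAs gets flagged.
md_toy <- data.frame(
  sample_id = factor(c("S1", "S2", "S3")),
  condition = factor(c("HC_LPMC", "aTNF_LPMC", "HC_LPMC")),
  Category_PBMC = factor(c(NA, NA, NA))
)
check_md(md_toy)
```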

Then you can create the design matrix (table where rows are samples and columns are model coefficients) and contrast (vector specifying combination of model coefficients to test whether they are equal to zero), for example:

design <- createDesignMatrix(md, cols_design = "condition")
contrast <- createContrast(c(0, 1, 0, 0, 0, 0, 0, 0))
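The original question was how to compare HC_LPMC with each of the other conditions separately. A minimal sketch, assuming a treatment-coded design with an intercept: that is what base R's `model.matrix()` builds, and `createDesignMatrix()` appears to produce the same layout (check `colnames(design)` to confirm). If HC_LPMC is made the reference level, each remaining coefficient is "condition X minus HC_LPMC", so each contrast is a single 1. The toy `md` below is hypothetical.

```r
# Hypothetical toy metadata: 2 samples per condition, one row per sample.
conds <- c("aTNF_LPMC", "aINT_LPMC", "aIL_LPMC", "aJAK_LPMC",
           "5ASA_LPMC", "Steroids_LPMC", "6MP_LPMC", "HC_LPMC")
md <- data.frame(
  sample_id = factor(paste0("S", seq_len(16))),
  condition = factor(rep(conds, each = 2), levels = conds)
)

# Make healthy controls the reference level, so each remaining
# coefficient estimates "condition X minus HC_LPMC".
md$condition <- relevel(md$condition, ref = "HC_LPMC")

# Intercept plus treatment-coded columns: 1 + 7 = 8 columns.
design <- model.matrix(~ condition, data = md)

# One contrast per non-reference condition: a single 1 at that
# condition's coefficient tests X vs. HC_LPMC.
contrasts <- lapply(seq(2, ncol(design)), function(j) {
  v <- numeric(ncol(design))
  v[j] <- 1
  v
})
names(contrasts) <- colnames(design)[-1]

# Comparing two treatments directly (here aTNF vs. aINT): 1 and -1.
aTNF_vs_aINT <- numeric(ncol(design))
aTNF_vs_aINT[match(c("conditionaTNF_LPMC", "conditionaINT_LPMC"),
                   colnames(design))] <- c(1, -1)
```

With diffcyt, each vector would be wrapped with `createContrast()` and passed to `diffcyt()` in a separate run, one run per element of `contrasts`. Because the reference level determines what each coefficient means, it is worth printing `colnames(design)` before choosing positions.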

If you are doing this all with the SCE object and CATALYST, then the md should be in the experiment_info slot, as in Helena's code in the other issue.

Does this help? Let me know if you have any further questions. (Also, PS: I still don't see your code anywhere in Google Drive; sorry if I missed it.)

Jun 16 '20 23:06 lmweber

Hi Lukas,

Thank you for looking into this! I thought I had set every column to be a factor when creating the sce. I will try to attach the code again, including the experiment setup. My metadata contains information for 2 different tissue types, PBMC and LPMC. Thus, in the LPMC columns the PBMC samples are NA, and vice versa. There should be 68 columns for 155 samples. If the metadata setup is the problem, it should be easy to solve. Could you check the very beginning of the code to see whether it is set up correctly or whether anything is missing? For example, do I have to assign the NA samples somehow? Let me know if you still have trouble opening the code.

Thanks so much again!

Lena


Jun 17 '20 01:06 LenaMayer

Okay, I had a look at sub_UMAP_CD19.rds and agree with @lmweber that the metadata is off. Here's why:

All columns ending with _PBMC or _LPMC are redundant. A sample is either PBMC or LPMC. Accordingly, Category, for example, can only be IBD/HC_PBMC for PBMC samples or IBD/HC_LPMC for LPMC samples. Having 2 columns for this (Category_PBMC/LPMC) is unnecessary, and is likely the cause of your downstream issues.
I suggest adding a column Tissue that is either PBMC or LPMC. To then split the data by sample type, you'd use pbmc <- filterSCE(sce, Tissue == "PBMC").
Then, merge all redundant columns into 1. E.g., Category_PBMC/LPMC should become Category, and can take values IBD or HC only.

This should bring your metadata down to about half of the current columns. Your current metadata is not in line with how it is intended to be constructed. And, as I said, half of its columns are redundant, resulting in ~50% NAs. With properly designed metadata, there will be NAs only if the information is TRULY missing (e.g., it is truly unknown whether a sample is from a smoker or not). Otherwise, there shouldn't be any NAs.
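Merging the redundant column pairs can be sketched in base R. This is a minimal sketch under several assumptions: a `Tissue` column (LPMC or PBMC) has already been added as suggested, every redundant pair is named `<base>_LPMC` / `<base>_PBMC`, and the values carry a tissue suffix that should be dropped (e.g. "IBD_LPMC" becomes "IBD"). The function `merge_tissue_cols()` and the toy table `ei` are hypothetical.

```r
# Hypothetical toy experiment_info with one Tissue column and
# redundant *_LPMC / *_PBMC column pairs (IDs and values are made up).
ei <- data.frame(
  sample_id = factor(c("S1", "S2", "S3")),
  Tissue = factor(c("LPMC", "LPMC", "PBMC")),
  Category_LPMC = c("HC_LPMC", "IBD_LPMC", NA),
  Category_PBMC = c(NA, NA, "IBD_PBMC"),
  Monotherapy_LPMC = c("HC_LPMC", "aTNF_LPMC", NA),
  Monotherapy_PBMC = c(NA, NA, "aTNF_PBMC"),
  stringsAsFactors = FALSE
)

# Collapse each *_LPMC / *_PBMC pair into a single factor column,
# taking the value that matches the sample's Tissue and dropping
# the tissue suffix from the values.
merge_tissue_cols <- function(ei) {
  for (lc in grep("_LPMC$", names(ei), value = TRUE)) {
    base <- sub("_LPMC$", "", lc)
    pc <- paste0(base, "_PBMC")
    if (!pc %in% names(ei)) next  # no matching PBMC column: leave as is
    merged <- ifelse(ei$Tissue == "LPMC",
                     as.character(ei[[lc]]), as.character(ei[[pc]]))
    ei[[base]] <- factor(sub("_(LPMC|PBMC)$", "", merged))
    ei[[lc]] <- NULL
    ei[[pc]] <- NULL
  }
  ei
}

ei <- merge_tissue_cols(ei)
str(ei)
```

After merging, the data can be split by tissue as suggested, e.g. `pbmc <- filterSCE(sce, Tissue == "PBMC")`, and the merged columns no longer contain NAs just because a sample belongs to the other tissue.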

Jun 17 '20 11:06 HelenaLC

Thank you! I will adapt accordingly!


Jun 17 '20 12:06 LenaMayer
