scDRS
scDRS copied to clipboard
Identical assoc_mcp in results
Hi @martinjzhang ,
I have used scDRS (great tool, thank you!) to score a single cell dataset against a list of genes related to a disease (obtained from GWAS sum stats). I am a bit puzzled with the result for the cell type association:
| `group | n_cell | n_ctrl | assoc_mcp | assoc_mcz | hetero_mcp | hetero_mcz | n_fdr_0.05 | n_fdr_0.1 | n_fdr_0.2 |
|---|---|---|---|---|---|---|---|---|---|
| A | 8938 | 200 | 0.0049751243 | 9.102756 | 0.0049751243 | 5.8678365 | 2800 | 3027 | 3296 |
| B | 1432 | 200 | 0.0049751243 | 4.5017133 | 0.0099502485 | 3.6684802 | 278 | 312 | 354 |
| C | 214 | 200 | 0.0049751243 | 11.462115 | 0.0049751243 | 7.4737678 | 139 | 151 | 159 |
| D | 142 | 200 | 0.0049751243 | 27.9078 | 0.24875621 | 0.7671775 | 142 | 142 | 142 |
| E | 373 | 200 | 0.0049751243 | 26.924177 | 0.0049751243 | 4.5101657 | 373 | 373 | 373 |
| F | 316 | 200 | 0.0049751243 | 29.63548 | 0.0049751243 | 3.9916444 | 316 | 316 | 316 |
| G | 169 | 200 | 0.0049751243 | 30.313704 | 0.024875622 | 2.6987221 | 169 | 169 | 169 |
| H | 228 | 200 | 0.0049751243 | 28.936302 | 0.029850746 | 2.2715647 | 228 | 228 | 228 |
| I | 324 | 200 | 0.0049751243 | 30.983482 | 0.0049751243 | 4.5710645 | 322 | 322 | 322 |
| J | 301 | 200 | 0.0049751243 | 29.881008 | 0.049751244 | 2.0174477 | 300 | 300 | 300 |
| K | 232 | 200 | 0.0049751243 | 25.27538 | 0.07960199 | 1.5823182 | 232 | 232 | 232 |
| L | 372 | 200 | 0.0049751243 | 25.981464 | 0.0099502485 | 2.542919 | 369 | 369 | 369 |
| M | 282 | 200 | 0.0049751243 | 30.938341 | 0.029850746 | 2.1587946 | 282 | 282 | 282 |
| N | 173 | 200 | 0.0049751243 | 34.117924 | 0.06965174 | 1.4641023 | 173 | 173 | 173 |
| O | 88 | 200 | 0.0049751243 | 36.643044 | 0.024875622 | 1.9150562 | 88 | 88 | 88 |
| P | 259 | 200 | 0.0049751243 | 41.091362 | 0.014925373 | 2.2539406 | 259 | 259 | 259 |
| Q | 273 | 200 | 0.0049751243 | 34.183285 | 0.0099502485 | 2.7920082 | 273 | 273 | 273 |
| R | 263 | 200 | 0.0049751243 | 38.294342 | 0.0099502485 | 2.379525 | 257 | 258 | 258 |
| S | 454 | 200 | 0.0049751243 | 40.438503 | 0.0049751243 | 2.7016406 | 452 | 452 | 452 |
| T | 179 | 200 | 0.0049751243 | 35.299545 | 0.2636816 | 0.75084305 | 172 | 172 | 172 |
| U | 627 | 200 | 0.0049751243 | 38.881256 | 0.0049751243 | 5.3322315 | 627 | 627 | 627 |
| V | 165 | 200 | 0.0049751243 | 40.369034 | 0.11442786 | 1.2370316 | 159 | 160 | 160 |
| W | 89 | 200 | 0.0049751243 | 34.50571 | 0.094527364 | 1.3114314 | 89 | 89 | 89 |
| X | 304 | 200 | 0.0049751243 | 34.702892 | 0.11442786 | 1.3594794 | 304 | 304 | 304 |
| Y | 260 | 200 | 0.0049751243 | 33.966785 | 0.0049751243 | 2.3594382 | 259 | 259 | 259 |
| Z | 187 | 200 | 0.0049751243 | 32.84846 | 0.014925373 | 2.4981256 | 181 | 182 | 183 |
| AA | 143 | 200 | 0.0049751243 | 28.68645 | 0.014925373 | 2.4380805 | 139 | 139 | 139 |
| AB | 359 | 200 | 0.0049751243 | 38.488644 | 0.5920398 | -0.046441507 | 358 | 358 | 358 |
| AC | 136 | 200 | 0.0049751243 | 31.9772 | 0.19900498 | 0.90170306 | 134 | 134 | 134 |
| AD | 293 | 200 | 0.0049751243 | 35.52879 | 0.0099502485 | 2.445356 | 292 | 292 | 293 |
| AE | 438 | 200 | 0.0049751243 | 32.291782 | 0.0099502485 | 2.7945273 | 437 | 437 | 437 |
| AF | 321 | 200 | 0.0049751243 | 35.203712 | 0.04477612 | 1.8123521 | 321 | 321 | 321 |
| AG | 143 | 200 | 0.0049751243 | 37.025703 | 0.03482587 | 1.7594645 | 143 | 143 | 143 |
| AH | 201 | 200 | 0.0049751243 | 39.599396 | 0.0049751243 | 3.6155078 | 201 | 201 | 201 |
| AI | 198 | 200 | 0.0049751243 | 39.15733 | 0.2885572 | 0.5785125 | 198 | 198 | 198 |
| AJ | 122 | 200 | 0.0049751243 | 38.167957 | 0.11442786 | 1.1606476 | 120 | 120 | 120 |
| AK | 1025 | 200 | 0.0049751243 | 33.04211 | 0.03482587 | 2.1185117 | 1021 | 1023 | 1023 |
| AL | 488 | 200 | 0.0049751243 | 6.744684 | 0.0049751243 | 4.9473066 | 306 | 334 | 359 |
| AM | 1953 | 200 | 0.0049751243 | 7.2396345 | 0.0049751243 | 3.588506 | 637 | 696 | 769 |
| AN | 15425 | 200 | 0.05472637 | 1.7903067 | 0.0049751243 | 11.273278 | 915 | 1156 | 1536 |
| AO | 1614 | 200 | 0.0049751243 | 14.856123 | 0.0049751243 | 8.108011 | 1581 | 1586 | 1595 |
| AP | 4768 | 200 | 0.0049751243 | 20.928226 | 0.0049751243 | 11.643588 | 4613 | 4633 | 4644 |
| AQ | 66266 | 200 | 0.0049751243 | 32.22967 | 0.0049751243 | 14.024827 | 15426 | 16408 | 17676 |
| AR | 607 | 200 | 0.0049751243 | 8.526978 | 0.0049751243 | 3.1208818 | 257 | 289 | 311 |
| AS | 921 | 200 | 0.0049751243 | 11.970217 | 0.0049751243 | 5.740329 | 662 | 688 | 712` |
Whats the reason behind getting all (except one) cell types an assoc_mcp equal to 0.004975? Also I read you recommend looking at the mcz scores...sorry for the naive question but is it the higher mcz score the better?
Thank you!
Cheers
it means your observed test statistic is more extreme than all control statistics (for scDRS group-level analyses), indicating your cell type-disease associations are very significant. I note that 0.0049751243 = 1 / 201 I am a little surprised that all (but 1) cell types are associated. The software runs ok, but probably you want to check if you done a proper QC...
I feared something like that... Anything I could have missed beyond log-normalization, filtering to remove 0 count genes? I followed even your processing script so any suggestions more than welcome. I must add that all the celltypes include glia and neuron subtype cells (edited for clarification).
It's common for scDRS to associate most neuronal cells to a brain disease like SCZ. Maybe looking at the different association strength across cell types will tell you something. Also, increase n_ctrl (from 200) to 1000 can improve calibration.
Thank you for the suggestion, I will run it right away with 1000 n_ctrl. How would you evaluate the "different association strength"?
Higher norm disease scores --> stronger association. Maybe you can do a UMAP and look at which regions are more strongly associated?
Yea, sure, I did that. I can find some subtypes with clearly higher norm disease scores so I would delve into those. Also regarding the scores in the group-level analyses, mc z-scores are "better" than mc p values, right? Now, what is not clear to me is whether the higher the "better" or the lower the "better"? Thanks a lot for your help Martin!
mc z-scores are "better" than mc p values in the sense that they can better distinguish disease association strength when the p-values are saturated.
higher z-score --> lower p-value.
p = 1 - CDF(z), roughly , where CDF is normal CDF.
Saturated p-values which is the case I find for most of my celltypes in the first table I posted, right?
Yes, that your mc_p = 1 / (1+n_ctrl)
Hi @martinjzhang I am getting a similar result, with my lowest assoc_mcp being 0.000999001 for many of my cell type associations. From this thread I understand that first I should ensure that I have properly done QC to my scRNA seq data. I want to confirm that it was appropriate for me to use my raw single cell data in both the compute score and downstream analyses, as long as in both of these steps I set both --flag_filter True and --flag_raw_count True? Or should I have done additional preprocessing?
Alternatively may I have created way too large of a single cell dataset? I am trying to do a body wide analysis so I ended up with an AnnData object with n_obs × n_vars = 771060 × 79209. Maybe I should be trying to scale this back?
raw single cell data in both the compute score and downstream analyses
Yes, you are supposed to use the raw count data
If assoc_mcz are different, the results should be considered as normal.
AnnData object with n_obs × n_vars = 771060 × 79209
@NaomiHuntley how did you end up with 79209 genes? Way more than any "normal" scRNAseq study I have ever seen.