pertpy
scCODA convergence issue for continuous covariates
Hey, I'm experiencing issues with scCODA when analyzing the effect of a continuous covariate (age, for example) as opposed to a categorical one. Using continuous values yields nonsensical results: a credible effect on all cell types, and in the extended summary (below) the high-density interval for each parameter has zero width (is the Markov chain getting stuck?).
By contrast, using age as a categorical covariate in scCODA, or running a linear regression with age as either a categorical or a continuous covariate, gives results that all match perfectly. The problem appears only when a continuous covariate is used in scCODA. I tried different covariates too and see the same issue whenever the covariate is continuous.
I don't mind sharing the data by email if you would like.
Thanks a lot in advance.
Version information
anndata 0.10.5.post1
cell2cell 0.7.3
decoupler 1.6.0
matplotlib 3.8.3
numpy 1.26.4
pandas 2.2.1
pertpy 0.6.0
plotnine 0.13.0
pydeseq2 0.4.7
scanpy 1.9.8
scikit_posthocs 0.9.0
seaborn 0.13.2
session_info 1.0.0
Python 3.9.18 (main, Aug 28 2023, 06:39:39) [GCC 9.2.0] Linux-4.18.0-477.43.1.el8_8.x86_64-x86_64-with-glibc2.28
Session information updated at 2024-05-16 17:58
As already discussed in the original scCODA repo (https://github.com/theislab/scCODA/issues/96), I think this is either an issue with data types/covariate normalization or a bug in the code. Unfortunately, I don't have the time atm to take an extended look at this. Could maybe someone else take this over?
@johannesostner can't make promises yet, but maybe after the preprint I have a few people that can have a stab at this. I'll put it into our backlog.
Hi, I was wondering if there are any updates on this?
I shared the data a couple of months ago but never got an answer back.
We're looking into this. Please remain patient.
Thanks!
Have a nice day.
@Marwansha whom did you send the data to again? @Lilly-May would look into this now but would ideally need the dataset, please.
Hi, she can email me and I will share it. Around two months ago I shared it with "Xichen Wu <[email protected]>".
This is my email: [email protected]
Thanks
Hi @Marwansha! As discussed via email, I’ve looked into your issue. I was able to reproduce it using your data. However, the problem was resolved for me when I normalized the continuous covariate.
For example, I tested this with age. As you mentioned, using age groups as a covariate works, but using age as a continuous covariate directly does not. However, applying simple min-max normalization resolved the issue. You can see this in the notebook here: https://github.com/Lilly-May/data-processing-pertpy/blob/main/scCODA_continuous_covariate.ipynb
If you have time, please check if normalizing the features resolves the problem for you as well and report back here. I’m happy to look into this further if you’re still facing issues.
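For readers who want to try the min-max normalization mentioned above, here is a minimal sketch using only pandas. It is not taken from the linked notebook: the table `obs`, the column names `age` and `age_scaled`, and the helper `min_max_scale` are all hypothetical stand-ins for the sample-level metadata that holds your covariates.

```python
import pandas as pd

# Hypothetical sample-level metadata, standing in for the table that
# holds the covariates named in the model formula.
obs = pd.DataFrame(
    {"age": [23, 35, 47, 61, 78]},
    index=["s1", "s2", "s3", "s4", "s5"],
)


def min_max_scale(values: pd.Series) -> pd.Series:
    """Rescale a numeric column linearly to the interval [0, 1]."""
    span = values.max() - values.min()
    if span == 0:
        # A constant covariate cannot be min-max scaled (division by zero).
        raise ValueError("covariate is constant; min-max scaling is undefined")
    return (values - values.min()) / span


# Keep the raw column and add a scaled copy next to it.
obs["age_scaled"] = min_max_scale(obs["age"])
print(obs)
```

Under these assumptions, the scaled column (here `age_scaled`) rather than the raw one would be the covariate named in the formula. Keeping the raw column alongside makes it easy to translate effects back to the original units later.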
Hi,
Thanks a lot for taking the time to check this.
I checked the notebook and I understand the issue. I will try to reproduce it and let you know if it works.
I just have one question to confirm I understand exactly:
1) Is the normalisation step only needed for continuous variables? The model worked just fine for the categorical values when run on raw counts.
- Doesn't scCODA perform a normalisation step on the counts?
Thanks a lot, will let you know soon.
Best, Marwan
Hi @Marwansha !
I'm glad that there was finally a solution to your problem. Just to make it clear again - we are talking about normalizing covariates - i.e. the values of the columns in your anndata.obs that you specify in the formula, not the counts in anndata.X.
To answer your questions:
- Yes, you only need to normalize continuous covariates. As in a linear regression model, all your covariates should be on a similar scale, ideally close to [0, 1]. For categorical covariates, this is automatically the case since they are encoded in a binary fashion. For continuous ones, you need to do this yourself. As @Lilly-May stated, min-max scaling to the interval [0, 1] has shown good results in all our experiments so far.
- No, the counts do not need to be normalized, since scCODA is a compositional model and therefore models proportions between cell types. For them, the total count per sample is irrelevant.
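To illustrate the second point, here is a small sketch showing that cell-type proportions do not change when every count in a sample is multiplied by the same factor. It only demonstrates the arithmetic of proportions and is not a description of scCODA's internals; the count matrix and the helper `to_proportions` are made up for the example.

```python
import numpy as np

# Hypothetical cell counts: rows are samples, columns are cell types.
counts = np.array(
    [
        [120, 60, 20],
        [300, 450, 250],
    ]
)


def to_proportions(c: np.ndarray) -> np.ndarray:
    """Divide each row by its total so that every sample sums to 1."""
    return c / c.sum(axis=1, keepdims=True)


props = to_proportions(counts)

# The same samples with ten times as many cells captured in total.
props_deeper = to_proportions(counts * 10)

# The relative composition is identical in both cases.
print(np.allclose(props, props_deeper))
```

The covariates behave differently: they enter the model as they are, which is why a raw age column spanning decades needs rescaling while the counts do not.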