Inconsistent covariate formatting for credible scCODA effects
Please make sure these conditions are met
- [x] I have checked that this issue has not already been reported.
- [x] I have confirmed this bug exists on the latest version of pertpy.
- [ ] (optional) I have confirmed this bug exists on the main branch.
Report
Consider the following scCODA run (as per the pertpy documentation):
import pertpy as pt
haber_cells = pt.dt.haber_2017_regions()
sccoda = pt.tl.Sccoda()
mdata = sccoda.load(haber_cells,
type="cell_level",
generate_sample_level=True,
cell_type_identifier="cell_label",
sample_identifier="batch",
covariate_obs=["condition"])
mdata = sccoda.prepare(mdata,
formula="condition",
reference_cell_type="Endocrine")
sccoda.run_nuts(mdata,
num_warmup=100,
num_samples=1000,
rng_key=42)
pertpy.tools.Sccoda provides two methods to query effects:
effects = sccoda.get_effect_df(mdata)
credible_effects = sccoda.credible_effects(mdata)
However, they use different Covariate styles to refer to the same covarites:
print(effects.index.get_level_values("Covariate"))
Index(['conditionT.Hpoly.Day3', 'conditionT.Hpoly.Day3',
'conditionT.Hpoly.Day3', 'conditionT.Hpoly.Day3',
'conditionT.Hpoly.Day3', 'conditionT.Hpoly.Day3',
'conditionT.Hpoly.Day3', 'conditionT.Hpoly.Day3',
'conditionT.Hpoly.Day10', 'conditionT.Hpoly.Day10',
'conditionT.Hpoly.Day10', 'conditionT.Hpoly.Day10',
'conditionT.Hpoly.Day10', 'conditionT.Hpoly.Day10',
'conditionT.Hpoly.Day10', 'conditionT.Hpoly.Day10',
'conditionT.Salmonella', 'conditionT.Salmonella',
'conditionT.Salmonella', 'conditionT.Salmonella',
'conditionT.Salmonella', 'conditionT.Salmonella',
'conditionT.Salmonella', 'conditionT.Salmonella'],
dtype='object', name='Covariate')
print(credible_effects.index.get_level_values("Covariate"))
Index(['condition[T.Hpoly.Day3]', 'condition[T.Hpoly.Day3]',
'condition[T.Hpoly.Day3]', 'condition[T.Hpoly.Day3]',
'condition[T.Hpoly.Day3]', 'condition[T.Hpoly.Day3]',
'condition[T.Hpoly.Day3]', 'condition[T.Hpoly.Day3]',
'condition[T.Hpoly.Day10]', 'condition[T.Hpoly.Day10]',
'condition[T.Hpoly.Day10]', 'condition[T.Hpoly.Day10]',
'condition[T.Hpoly.Day10]', 'condition[T.Hpoly.Day10]',
'condition[T.Hpoly.Day10]', 'condition[T.Hpoly.Day10]',
'condition[T.Salmonella]', 'condition[T.Salmonella]',
'condition[T.Salmonella]', 'condition[T.Salmonella]',
'condition[T.Salmonella]', 'condition[T.Salmonella]',
'condition[T.Salmonella]', 'condition[T.Salmonella]'],
dtype='object', name='Covariate')
Note how pertpy.tools.Sccoda.credible_effects wraps the T. and everything that follows in square brackets ([...]) while pertpy.tools.Sccoda.get_effect_df does not.
Is there any reason for this inconsistency? And if not, could you decide for one format and use it consistently?
Versions
session info
| Package | Version |
|---|---|
| pandas | 2.2.3 |
| polars-lts-cpu | 1.29.0 |
| snakemake | 0.0.0 (9.3.3) |
| anndata | 0.11.4 |
| PyYAML | 6.0.2 |
| matplotlib | 3.10.1 |
| pertpy | 0.11.1 |
| itables | 2.3.0 |
| mudata | 0.3.1 |
| Dependency | Version |
|---|---|
| jupyter_core | 5.7.2 |
| six | 1.17.0 |
| Pygments | 2.19.1 |
| ml_dtypes | 0.5.1 |
| decorator | 5.2.1 |
| opt_einsum | 3.4.0 |
| nvidia-cufile-cu12 | 1.11.1.6 |
| scikit-learn | 1.5.2 |
| optax | 0.2.4 |
| scvi-tools | 1.2.2.post2 |
| docrep | 0.3.2 |
| wadler_lindig | 0.1.5 |
| executing | 2.2.0 |
| protobuf | 6.30.2 |
| nvidia-nvjitlink-cu12 | 12.6.85 |
| sparsecca | 0.3.1 |
| cycler | 0.12.1 |
| nvidia-cublas-cu12 | 12.6.4.1 |
| idna | 3.10 |
| msgpack | 1.1.0 |
| nvidia-cusolver-cu12 | 11.7.1.2 |
| stack_data | 0.6.3 |
| Jinja2 | 3.1.6 |
| scanpy | 1.11.1 |
| filelock | 3.18.0 |
| h5py | 3.13.0 |
| equinox | 0.12.1 |
| ipywidgets | 8.1.7 |
| fsspec | 2025.3.2 |
| nvidia-curand-cu12 | 10.3.7.77 |
| python-dateutil | 2.9.0.post0 |
| lightning-utilities | 0.14.3 |
| ml_collections | 1.1.0 |
| ply | 3.11 |
| ott-jax | 0.5.0 |
| jedi | 0.19.2 |
| adjustText | 1.3.0 |
| torch | 2.7.0 (2.7.0+cu126) |
| typing_extensions | 4.13.2 |
| packaging | 24.2 |
| chardet | 5.2.0 |
| scipy | 1.15.2 |
| toolz | 1.0.0 |
| threadpoolctl | 3.6.0 |
| jaxopt | 0.8.5 |
| lightning | 2.5.1.post0 |
| nvidia-nccl-cu12 | 2.26.2 |
| snakemake-interface-logger-plugins | 1.2.3 |
| certifi | 2025.4.26 (2025.04.26) |
| numba | 0.60.0 |
| pyparsing | 3.2.3 |
| scikit-misc | 0.5.1 |
| networkx | 3.4.2 |
| pure_eval | 0.2.3 |
| MarkupSafe | 3.0.2 |
| blitzgsea | 1.3.53 |
| flax | 0.10.6 |
| nvidia-cuda-cupti-cu12 | 12.6.80 |
| nvidia-cufft-cu12 | 11.3.0.4 |
| mpmath | 1.3.0 |
| decoupler | 1.8.0 |
| jaxtyping | 0.3.2 |
| urllib3 | 2.4.0 |
| pyzmq | 26.4.0 |
| parso | 0.8.4 |
| attrs | 25.3.0 |
| statsmodels | 0.14.4 |
| lamin_utils | 0.14.0 |
| multipledispatch | 1.0.0 (0.6.0) |
| numpy | 1.26.4 |
| joblib | 1.5.0 |
| rich | 14.0.0 |
| Brotli | 1.1.0 |
| nvidia-cuda-nvrtc-cu12 | 12.6.77 |
| ipython | 9.2.0 |
| jax | 0.6.0 |
| pyarrow | 20.0.0 |
| sparse | 0.16.0 |
| triton | 3.3.0 |
| snakemake-interface-executor-plugins | 9.3.5 |
| xarray | 2025.4.0 |
| colorama | 0.4.6 |
| sympy | 1.14.0 |
| zstandard | 0.23.0 |
| comm | 0.2.2 |
| importlib_resources | 6.5.2 |
| simplejson | 3.20.1 |
| arviz | 0.21.0 |
| prompt_toolkit | 3.0.51 |
| nvidia-cuda-runtime-cu12 | 12.6.77 |
| traitlets | 5.14.3 |
| chex | 0.1.89 |
| debugpy | 1.8.14 |
| torchmetrics | 1.7.1 |
| pycparser | 2.22 |
| pyro-ppl | 1.9.1 |
| nvidia-cusparse-cu12 | 12.5.4.2 |
| requests | 2.32.3 |
| charset-normalizer | 3.4.2 |
| nvidia-cudnn-cu12 | 9.5.1.17 |
| pytorch-lightning | 2.5.1.post0 |
| etils | 1.12.2 |
| patsy | 1.0.1 |
| snakemake-interface-storage-plugins | 4.2.1 |
| jaxlib | 0.6.0 |
| xarray-einstats | 0.8.0 |
| legacy-api-wrap | 1.4.1 |
| numpyro | 0.18.0 |
| tqdm | 4.67.1 |
| pytz | 2025.2 |
| asttokens | 3.0.0 |
| absl-py | 2.2.2 |
| platformdirs | 4.3.8 |
| llvmlite | 0.43.0 |
| pillow | 11.2.1 |
| tornado | 6.4.2 |
| snakemake-interface-common | 1.17.4 |
| PySocks | 1.7.1 |
| seaborn | 0.13.2 |
| lineax | 0.0.8 |
| fast-array-utils | 1.2 |
| cffi | 1.17.1 |
| jupyter_client | 8.6.3 |
| ipykernel | 6.29.5 |
| defusedxml | 0.7.1 |
| natsort | 8.4.0 |
| PubChemPy | 1.0.4 |
| wcwidth | 0.2.13 |
| pyomo | 6.9.2 |
| setuptools | 80.1.0 |
| nvidia-nvtx-cu12 | 12.6.77 |
| Pillow-SIMD | 9.5.0.post2 (11.2.1) |
| session-info2 | 0.1.2 |
| psutil | 7.0.0 |
| kiwisolver | 1.4.8 |
| Component | Info |
|---|---|
| Python | 3.12.10 |
| OS | Linux-6.9.8-amd64-x86_64-with-glibc2.41 |
| Updated | 2025-05-12 07:31 |
From a quick look, this line might be responsible for the deviation:
https://github.com/scverse/pertpy/blob/1e52b1f9ee948afab1a926c98ac9223f43006e45/pertpy/tools/_coda/_base_coda.py#L1010
It also suggests that the condition part in my example above was expected to be Condition.
Is it possible that this is some leftover prettifying code based on an outdated API?
After all, those DataFrames come from arviz.summary AFAICT:
https://github.com/scverse/pertpy/blob/1e52b1f9ee948afab1a926c98ac9223f43006e45/pertpy/tools/_coda/_base_coda.py#L466-L475
As I don't really understand what is the intended behaviour and why, I am hesitant to dig deeper and draft a PR. But with some guidance I am happy to contribute, as usual.
Thank you for digging into this already!
I'll get back to you in a few days.
@johannesostner do you have any reasoning? Otherwise, I'll simplify and harmonize this.
These covariate names with format "condition[...]" are generated automatically by patsy when using categorical covariates. From what I can tell, this is simply a leftover inconsistency from when we ported scCODA to this repo. In scCODA, I removed all the unnecessary information from the covariate names, which is apparently not the case for all pertpy functions. Looks like patsy since changed the capital "Condition" to a lowercase "condition" in their covariate naming, too (I can't find this change in the release notes though) - that's why this part doesn't have any effect anymore. I'm fine with either leaving all the names as-is or making them all prettier
@mschilli87 I am in favor of making them pretty and more importantly consistent. However, if we do so, we'd need to ensure that we don't break existing user code.
It would be another cool contribution of yours if you're up for it. It's not terribly urgent so there's certainly no pressure.
@Zethson: I think it is hard to predict all potential cases with complex models. Plus patsy is a moving target as well as pointed out above. I can give it a shot if you want to but I think keeping the cumbersome but informative and consistent version would reduce the maintainance burden. As a user, I don't mind 'prettifying' them myself for my specific use case. I just struggle with the inconsistency.
If you still want me to do this, I would try restore the original scCODA behaviour and apply it consistently.