pertpy Inconsistent covariate formatting for credible scCODA effects

Please make sure these conditions are met

[x] I have checked that this issue has not already been reported.
[x] I have confirmed this bug exists on the latest version of pertpy.
[ ] (optional) I have confirmed this bug exists on the main branch.

Report

Consider the following scCODA run (as per the pertpy documentation):

import pertpy as pt
haber_cells = pt.dt.haber_2017_regions()
sccoda = pt.tl.Sccoda()
mdata = sccoda.load(haber_cells,
                    type="cell_level",
                    generate_sample_level=True,
                    cell_type_identifier="cell_label",
                    sample_identifier="batch",
                    covariate_obs=["condition"])
mdata = sccoda.prepare(mdata,
                       formula="condition",
                       reference_cell_type="Endocrine")
sccoda.run_nuts(mdata,
                num_warmup=100,
                num_samples=1000,
                rng_key=42)

pertpy.tools.Sccoda provides two methods to query effects:

effects = sccoda.get_effect_df(mdata)
credible_effects = sccoda.credible_effects(mdata)

However, they use different Covariate styles to refer to the same covarites:

print(effects.index.get_level_values("Covariate"))

Index(['conditionT.Hpoly.Day3', 'conditionT.Hpoly.Day3',
       'conditionT.Hpoly.Day3', 'conditionT.Hpoly.Day3',
       'conditionT.Hpoly.Day3', 'conditionT.Hpoly.Day3',
       'conditionT.Hpoly.Day3', 'conditionT.Hpoly.Day3',
       'conditionT.Hpoly.Day10', 'conditionT.Hpoly.Day10',
       'conditionT.Hpoly.Day10', 'conditionT.Hpoly.Day10',
       'conditionT.Hpoly.Day10', 'conditionT.Hpoly.Day10',
       'conditionT.Hpoly.Day10', 'conditionT.Hpoly.Day10',
       'conditionT.Salmonella', 'conditionT.Salmonella',
       'conditionT.Salmonella', 'conditionT.Salmonella',
       'conditionT.Salmonella', 'conditionT.Salmonella',
       'conditionT.Salmonella', 'conditionT.Salmonella'],
      dtype='object', name='Covariate')

print(credible_effects.index.get_level_values("Covariate"))

Index(['condition[T.Hpoly.Day3]', 'condition[T.Hpoly.Day3]',
       'condition[T.Hpoly.Day3]', 'condition[T.Hpoly.Day3]',
       'condition[T.Hpoly.Day3]', 'condition[T.Hpoly.Day3]',
       'condition[T.Hpoly.Day3]', 'condition[T.Hpoly.Day3]',
       'condition[T.Hpoly.Day10]', 'condition[T.Hpoly.Day10]',
       'condition[T.Hpoly.Day10]', 'condition[T.Hpoly.Day10]',
       'condition[T.Hpoly.Day10]', 'condition[T.Hpoly.Day10]',
       'condition[T.Hpoly.Day10]', 'condition[T.Hpoly.Day10]',
       'condition[T.Salmonella]', 'condition[T.Salmonella]',
       'condition[T.Salmonella]', 'condition[T.Salmonella]',
       'condition[T.Salmonella]', 'condition[T.Salmonella]',
       'condition[T.Salmonella]', 'condition[T.Salmonella]'],
      dtype='object', name='Covariate')

Note how pertpy.tools.Sccoda.credible_effects wraps the T. and everything that follows in square brackets ([...]) while pertpy.tools.Sccoda.get_effect_df does not.

Is there any reason for this inconsistency? And if not, could you decide for one format and use it consistently?

Versions

session info

Package	Version
pandas	2.2.3
polars-lts-cpu	1.29.0
snakemake	0.0.0 (9.3.3)
anndata	0.11.4
PyYAML	6.0.2
matplotlib	3.10.1
pertpy	0.11.1
itables	2.3.0
mudata	0.3.1

Dependency	Version
jupyter_core	5.7.2
six	1.17.0
Pygments	2.19.1
ml_dtypes	0.5.1
decorator	5.2.1
opt_einsum	3.4.0
nvidia-cufile-cu12	1.11.1.6
scikit-learn	1.5.2
optax	0.2.4
scvi-tools	1.2.2.post2
docrep	0.3.2
wadler_lindig	0.1.5
executing	2.2.0
protobuf	6.30.2
nvidia-nvjitlink-cu12	12.6.85
sparsecca	0.3.1
cycler	0.12.1
nvidia-cublas-cu12	12.6.4.1
idna	3.10
msgpack	1.1.0
nvidia-cusolver-cu12	11.7.1.2
stack_data	0.6.3
Jinja2	3.1.6
scanpy	1.11.1
filelock	3.18.0
h5py	3.13.0
equinox	0.12.1
ipywidgets	8.1.7
fsspec	2025.3.2
nvidia-curand-cu12	10.3.7.77
python-dateutil	2.9.0.post0
lightning-utilities	0.14.3
ml_collections	1.1.0
ply	3.11
ott-jax	0.5.0
jedi	0.19.2
adjustText	1.3.0
torch	2.7.0 (2.7.0+cu126)
typing_extensions	4.13.2
packaging	24.2
chardet	5.2.0
scipy	1.15.2
toolz	1.0.0
threadpoolctl	3.6.0
jaxopt	0.8.5
lightning	2.5.1.post0
nvidia-nccl-cu12	2.26.2
snakemake-interface-logger-plugins	1.2.3
certifi	2025.4.26 (2025.04.26)
numba	0.60.0
pyparsing	3.2.3
scikit-misc	0.5.1
networkx	3.4.2
pure_eval	0.2.3
MarkupSafe	3.0.2
blitzgsea	1.3.53
flax	0.10.6
nvidia-cuda-cupti-cu12	12.6.80
nvidia-cufft-cu12	11.3.0.4
mpmath	1.3.0
decoupler	1.8.0
jaxtyping	0.3.2
urllib3	2.4.0
pyzmq	26.4.0
parso	0.8.4
attrs	25.3.0
statsmodels	0.14.4
lamin_utils	0.14.0
multipledispatch	1.0.0 (0.6.0)
numpy	1.26.4
joblib	1.5.0
rich	14.0.0
Brotli	1.1.0
nvidia-cuda-nvrtc-cu12	12.6.77
ipython	9.2.0
jax	0.6.0
pyarrow	20.0.0
sparse	0.16.0
triton	3.3.0
snakemake-interface-executor-plugins	9.3.5
xarray	2025.4.0
colorama	0.4.6
sympy	1.14.0
zstandard	0.23.0
comm	0.2.2
importlib_resources	6.5.2
simplejson	3.20.1
arviz	0.21.0
prompt_toolkit	3.0.51
nvidia-cuda-runtime-cu12	12.6.77
traitlets	5.14.3
chex	0.1.89
debugpy	1.8.14
torchmetrics	1.7.1
pycparser	2.22
pyro-ppl	1.9.1
nvidia-cusparse-cu12	12.5.4.2
requests	2.32.3
charset-normalizer	3.4.2
nvidia-cudnn-cu12	9.5.1.17
pytorch-lightning	2.5.1.post0
etils	1.12.2
patsy	1.0.1
snakemake-interface-storage-plugins	4.2.1
jaxlib	0.6.0
xarray-einstats	0.8.0
legacy-api-wrap	1.4.1
numpyro	0.18.0
tqdm	4.67.1
pytz	2025.2
asttokens	3.0.0
absl-py	2.2.2
platformdirs	4.3.8
llvmlite	0.43.0
pillow	11.2.1
tornado	6.4.2
snakemake-interface-common	1.17.4
PySocks	1.7.1
seaborn	0.13.2
lineax	0.0.8
fast-array-utils	1.2
cffi	1.17.1
jupyter_client	8.6.3
ipykernel	6.29.5
defusedxml	0.7.1
natsort	8.4.0
PubChemPy	1.0.4
wcwidth	0.2.13
pyomo	6.9.2
setuptools	80.1.0
nvidia-nvtx-cu12	12.6.77
Pillow-SIMD	9.5.0.post2 (11.2.1)
session-info2	0.1.2
psutil	7.0.0
kiwisolver	1.4.8

Component	Info
Python	3.12.10
OS	Linux-6.9.8-amd64-x86_64-with-glibc2.41
Updated	2025-05-12 07:31

May 12 '25 07:05 mschilli87

From a quick look, this line might be responsible for the deviation:

https://github.com/scverse/pertpy/blob/1e52b1f9ee948afab1a926c98ac9223f43006e45/pertpy/tools/_coda/_base_coda.py#L1010

It also suggests that the condition part in my example above was expected to be Condition. Is it possible that this is some leftover prettifying code based on an outdated API? After all, those DataFrames come from arviz.summary AFAICT:

https://github.com/scverse/pertpy/blob/1e52b1f9ee948afab1a926c98ac9223f43006e45/pertpy/tools/_coda/_base_coda.py#L466-L475

As I don't really understand what is the intended behaviour and why, I am hesitant to dig deeper and draft a PR. But with some guidance I am happy to contribute, as usual.

May 12 '25 16:05 mschilli87

Thank you for digging into this already!

I'll get back to you in a few days.

May 12 '25 18:05 Zethson

@johannesostner do you have any reasoning? Otherwise, I'll simplify and harmonize this.

May 15 '25 21:05 Zethson

These covariate names with format "condition[...]" are generated automatically by patsy when using categorical covariates. From what I can tell, this is simply a leftover inconsistency from when we ported scCODA to this repo. In scCODA, I removed all the unnecessary information from the covariate names, which is apparently not the case for all pertpy functions. Looks like patsy since changed the capital "Condition" to a lowercase "condition" in their covariate naming, too (I can't find this change in the release notes though) - that's why this part doesn't have any effect anymore. I'm fine with either leaving all the names as-is or making them all prettier

May 16 '25 06:05 johannesostner

@mschilli87 I am in favor of making them pretty and more importantly consistent. However, if we do so, we'd need to ensure that we don't break existing user code.

It would be another cool contribution of yours if you're up for it. It's not terribly urgent so there's certainly no pressure.

May 21 '25 07:05 Zethson

@Zethson: I think it is hard to predict all potential cases with complex models. Plus patsy is a moving target as well as pointed out above. I can give it a shot if you want to but I think keeping the cumbersome but informative and consistent version would reduce the maintainance burden. As a user, I don't mind 'prettifying' them myself for my specific use case. I just struggle with the inconsistency.

If you still want me to do this, I would try restore the original scCODA behaviour and apply it consistently.

May 21 '25 08:05 mschilli87