Wrongly ordered DotPlot totals in `scanpy` 1.10.1 with Pandas 1.x

Open rgoya opened this issue 1 year ago • 3 comments

Please make sure these conditions are met

  • [X] I have checked that this issue has not already been reported.
  • [X] I have confirmed this bug exists on the latest version of scanpy.
  • [ ] (optional) I have confirmed this bug exists on the main branch of scanpy.

What happened?

In scanpy-1.9.8 DotPlots the default ordering of categories is alphabetical, adjusting to what was requested via groupby. This also worked when multiple columns were requested, eliminating the need to manually compose the alphabetical ordering of all existing combinations of observations in the plot.

The default ordering in scanpy>=1.10.0 DotPlots has changed, and the plots now display wrong data:

  • Ordering is no longer alphabetical. It seems that the categories are being ordered as if a dendrogram had been requested.
  • Additionally, when adding totals with add_totals(), the bar plots with cell counts still follow the default alphabetical ordering, so the bars no longer line up with the dot rows and the plot displays wrong data (!).
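To illustrate why mismatched orderings amount to wrong data rather than a cosmetic glitch, here is a minimal sketch in plain pandas. It does not use scanpy's internals; the group names, counts, and row order are all made up for illustration.

```python
import pandas as pd

# Hypothetical per-group cell counts, stored alphabetically
# (standing in for the bars drawn by add_totals()).
totals = pd.Series({"B-cell": 30, "Monocyte": 50, "T-cell": 120})

# Hypothetical non-alphabetical row order, standing in for the dot rows.
row_order = ["T-cell", "B-cell", "Monocyte"]

# Pairing by position attaches each count to the wrong group.
by_position = dict(zip(row_order, totals.tolist()))

# Pairing by label keeps each count with its own group.
by_label = totals.reindex(row_order)

print(by_position)
print(by_label.to_dict())
```

In the positional pairing, the row labelled T-cell ends up next to the B-cell count, which is the kind of mismatch described above.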

The code sample below reproduces the misbehaviour using the example from https://scanpy.readthedocs.io/en/stable/generated/scanpy.pl.dotplot.html

Using the code sample below, here is the expected plot with scanpy-1.9.8 (same result as in the URL above): image

and here is the erroneous result with scanpy-1.10.1 and 1.10.0 (wrong ordering, mismatching totals): image

Minimal code sample

import scanpy as sc

pbmc = sc.datasets.pbmc68k_reduced()

markers = {'T-cell': 'CD3D', 'B-cell': 'CD79A', 'myeloid': 'CST3'}

dp = sc.pl.dotplot(pbmc, markers, 'bulk_labels', return_fig=True)
dp.add_totals().style(dot_edge_color='black', dot_edge_lw=0.5).show()

Error output

(Error output is a bad plot, included in the description above.)

Versions

-----
anndata     0.10.7
scanpy      1.10.1
-----
IPython             8.13.2
PIL                 10.0.0
asciitree           NA
asttokens           NA
astunparse          1.6.3
backcall            0.2.0
cffi                1.15.1
cloudpickle         2.2.1
colorama            0.4.4
cycler              0.10.0
cython_runtime      NA
cytoolz             0.12.0
dask                2023.10.1
dateutil            2.8.2
decorator           5.1.1
defusedxml          0.7.1
dill                0.3.6
dot_parser          NA
entrypoints         0.4
exceptiongroup      1.1.1
executing           1.2.0
fasteners           0.17.3
flytekitplugins     NA
gmpy2               2.1.2
google              NA
h5py                3.8.0
icu                 2.11
igraph              0.11.2
jedi                0.19.1
jinja2              3.1.2
joblib              1.2.0
kiwisolver          1.4.4
legacy_api_wrap     NA
leidenalg           0.10.2
llvmlite            0.42.0
lz4                 4.3.2
markupsafe          2.1.2
matplotlib          3.8.3
mpl_toolkits        NA
mpmath              1.3.0
msgpack             1.0.5
natsort             8.3.1
numba               0.59.1
numcodecs           0.11.0
numexpr             2.7.3
numpy               1.26.4
packaging           23.1
pandas              1.5.3
parso               0.8.3
pexpect             4.8.0
pickleshare         0.7.5
plotly              5.14.1
prompt_toolkit      3.0.38
psutil              5.9.5
ptyprocess          0.7.0
pure_eval           0.2.2
pyarrow             10.0.1
pydot               1.4.2
pygments            2.15.1
pyparsing           3.0.9
pyteomics           NA
pytz                2023.3.post1
scipy               1.13.0
session_info        1.0.0
setuptools          67.7.2
setuptools_scm      NA
six                 1.16.0
sklearn             1.2.2
stack_data          0.6.2
sympy               1.11.1
tblib               1.7.0
texttable           1.6.7
threadpoolctl       3.1.0
tlz                 0.12.0
toolz               0.11.2
torch               2.1.1
torchgen            NA
tqdm                4.65.0
traitlets           5.9.0
typing_extensions   NA
wcwidth             0.2.6
xxhash              NA
yaml                5.4.1
zarr                2.14.2
zc                  NA
zipp                NA
zoneinfo            NA
-----
Python 3.10.10 | packaged by conda-forge | (main, Mar 24 2023, 20:17:34) [Clang 14.0.6 ]
macOS-14.4.1-x86_64-i386-64bit
-----
Session information updated at 2024-05-15 18:46

rgoya • May 16 '24 01:05

Hi, thanks for the report!

Note that the plots in the documentation are generated on the fly when the documentation is built. The plot you currently see on https://scanpy.readthedocs.io/en/stable/generated/scanpy.pl.dotplot.html has therefore been created with scanpy 1.10.1.

Must be a dependency issue, I’ll try to reproduce with the environment you provided.

/edit: I can reproduce it with that environment:

environment.yml
name: scanpy-3062
channels:
  - conda-forge
dependencies:
- ipykernel

- python==3.10.10
- anndata==0.10.7
- scanpy==1.10.1
- IPython==8.13.2
- pillow==10.0.0
- astunparse==1.6.3
- backcall==0.2.0
- cffi==1.15.1
- cloudpickle==2.2.1
- colorama==0.4.4
- cycler==0.10.0
- cytoolz==0.12.0
- dask==2023.10.1
#- dateutil==2.8.2
- decorator==5.1.1
- defusedxml==0.7.1
- dill==0.3.6
- entrypoints==0.4
- exceptiongroup==1.1.1
- executing==1.2.0
- fasteners==0.17.3
- gmpy2==2.1.2
- h5py==3.8.0
#- icu==2.11
- python-igraph==0.11.2
- jedi==0.19.1
- jinja2==3.1.2
- joblib==1.2.0
- kiwisolver==1.4.4
- leidenalg==0.10.2
- llvmlite==0.42.0
- lz4==4.3.2
- markupsafe==2.1.2
- matplotlib==3.8.3
- mpmath==1.3.0
#- msgpack==1.0.5
- natsort==8.3.1
- numba==0.59.1
- numcodecs==0.11.0
- numexpr==2.7.3
- numpy==1.26.4
- packaging==23.1
- pandas==1.5.3
- parso==0.8.3
- pexpect==4.8.0
- pickleshare==0.7.5
- plotly==5.14.1
- prompt_toolkit==3.0.38
- psutil==5.9.5
- ptyprocess==0.7.0
- pure_eval==0.2.2
- pyarrow==10.0.1
- pydot==1.4.2
- pygments==2.15.1
- pyparsing==3.0.9
- pytz==2023.3.post1
- scipy==1.13.0
#- session_info==1.0.0
#- setuptools==67.7.2
- six==1.16.0
- scikit-learn==1.2.2
- stack_data==0.6.2
- sympy==1.11.1
- tblib==1.7.0
- texttable==1.6.7
- threadpoolctl==3.1.0
#- tlz==0.12.0
- toolz==0.11.2
#- pytorch==2.1.1
- tqdm==4.65.0
- traitlets==5.9.0
- wcwidth==0.2.6
#- yaml==5.4.1
- zarr==2.14.2

flying-sheep • May 16 '24 11:05

OK, pretty sure this is because your environment uses pandas 1.5

You can circumvent it for now by setting dp.categories_order = dp.dot_color_df.index.

flying-sheep • May 16 '24 12:05

Thanks for the quick response, @flying-sheep!

I can confirm that updating to pandas-2.2.2 does fix this. I totally missed this possibility; it's not clear to me why the dots would change ordering, but the totals wouldn't (maybe scanpy relies on default pandas behaviour that changed between 1.x and 2.x?). That said, pandas-2.x unfortunately breaks some dependencies in our environment, so I'll either pin scanpy or use your workaround.
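For environments that have to stay on pandas 1.x, a small guard could decide whether the workaround is needed at all. This is only a sketch: the helper name is hypothetical, and treating the major version as the boundary is an assumption, since this thread only confirms that 1.5.3 is affected and 2.2.2 is not.

```python
def needs_reorder_workaround(pandas_version: str) -> bool:
    """Return True for pandas versions below 2.0 (assumed to be affected)."""
    # Only the major version is inspected; the exact cutoff is an assumption.
    major = int(pandas_version.split(".")[0])
    return major < 2


print(needs_reorder_workaround("1.5.3"))  # True
print(needs_reorder_workaround("2.2.2"))  # False
```

In practice the argument would be pd.__version__ from the running environment.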

Regarding the ordering and the issue title change: maybe a nit, but it's my understanding that the default ordering is alphabetical (which makes perfect sense as a default!). If this is correct, then I'd suggest that what is wrongly ordered is not the totals but the categories themselves.

Given this, the workaround that gives me the expected behaviour would be dp.categories_order = dp.dot_color_df.index.sort_values(): image
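Wrapped in a small helper, the sorted workaround might look like the sketch below. The helper name and the stand-in object are hypothetical; the stand-in only mimics the two attributes mentioned in this thread (dot_color_df and categories_order) so that the snippet runs without scanpy.

```python
from types import SimpleNamespace

import pandas as pd


def order_categories_alphabetically(dp):
    """Set dp.categories_order to the sorted dot-row labels and return dp.

    `dp` is meant to be the object returned by
    sc.pl.dotplot(..., return_fig=True); only dot_color_df and
    categories_order are touched.
    """
    dp.categories_order = dp.dot_color_df.index.sort_values()
    return dp


# Stand-in for a real DotPlot object (hypothetical data, not scanpy output).
fake_dp = SimpleNamespace(
    dot_color_df=pd.DataFrame(
        {"CD3D": [0.9, 0.1, 0.2]},
        index=["T-cell", "B-cell", "Monocyte"],
    ),
    categories_order=None,
)

order_categories_alphabetically(fake_dp)
print(list(fake_dp.categories_order))
```

With a real plot, the helper would be called on the return value of sc.pl.dotplot(..., return_fig=True) before calling .show().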

rgoya • May 16 '24 17:05