Wrongly ordered DotPlot totals in `scanpy` 1.10.1 with Pandas 1.x
Please make sure these conditions are met
- [X] I have checked that this issue has not already been reported.
- [X] I have confirmed this bug exists on the latest version of scanpy.
- [ ] (optional) I have confirmed this bug exists on the main branch of scanpy.
What happened?
In scanpy-1.9.8 DotPlots the default ordering of categories is alphabetical, adjusting to what was requested via groupby. This also worked when multiple columns were requested, eliminating the need to manually compose the alphabetical ordering of all existing combinations of observations in the plot.
The default ordering in scanpy>=1.10.0 DotPlots has changed, and the plots display wrong data:
- Ordering is no longer alphabetical. It seems that the categories are being ordered as if a dendrogram had been requested.
- Additionally, when adding totals with add_totals(), the bar plots with cell counts do follow the default alphabetical ordering, making the plot display wrong data (!).
The example below shows the misbehaviour using the example in https://scanpy.readthedocs.io/en/stable/generated/scanpy.pl.dotplot.html
Using the code example below, here is the expected plot with scanpy-1.9.8 (same result as in the URL above):
and here is the erroneous result with scanpy-1.10.1 and 1.10.0 (wrong ordering, mismatching totals):
Minimal code sample
import scanpy as sc
pbmc = sc.datasets.pbmc68k_reduced()
markers = {'T-cell': 'CD3D', 'B-cell': 'CD79A', 'myeloid': 'CST3'}
dp = sc.pl.dotplot(pbmc, markers, 'bulk_labels', return_fig=True)
dp.add_totals().style(dot_edge_color='black', dot_edge_lw=0.5).show()
Error output
(Error output is a bad plot, included in the description above.)
Versions
-----
anndata 0.10.7
scanpy 1.10.1
-----
IPython 8.13.2
PIL 10.0.0
asciitree NA
asttokens NA
astunparse 1.6.3
backcall 0.2.0
cffi 1.15.1
cloudpickle 2.2.1
colorama 0.4.4
cycler 0.10.0
cython_runtime NA
cytoolz 0.12.0
dask 2023.10.1
dateutil 2.8.2
decorator 5.1.1
defusedxml 0.7.1
dill 0.3.6
dot_parser NA
entrypoints 0.4
exceptiongroup 1.1.1
executing 1.2.0
fasteners 0.17.3
flytekitplugins NA
gmpy2 2.1.2
google NA
h5py 3.8.0
icu 2.11
igraph 0.11.2
jedi 0.19.1
jinja2 3.1.2
joblib 1.2.0
kiwisolver 1.4.4
legacy_api_wrap NA
leidenalg 0.10.2
llvmlite 0.42.0
lz4 4.3.2
markupsafe 2.1.2
matplotlib 3.8.3
mpl_toolkits NA
mpmath 1.3.0
msgpack 1.0.5
natsort 8.3.1
numba 0.59.1
numcodecs 0.11.0
numexpr 2.7.3
numpy 1.26.4
packaging 23.1
pandas 1.5.3
parso 0.8.3
pexpect 4.8.0
pickleshare 0.7.5
plotly 5.14.1
prompt_toolkit 3.0.38
psutil 5.9.5
ptyprocess 0.7.0
pure_eval 0.2.2
pyarrow 10.0.1
pydot 1.4.2
pygments 2.15.1
pyparsing 3.0.9
pyteomics NA
pytz 2023.3.post1
scipy 1.13.0
session_info 1.0.0
setuptools 67.7.2
setuptools_scm NA
six 1.16.0
sklearn 1.2.2
stack_data 0.6.2
sympy 1.11.1
tblib 1.7.0
texttable 1.6.7
threadpoolctl 3.1.0
tlz 0.12.0
toolz 0.11.2
torch 2.1.1
torchgen NA
tqdm 4.65.0
traitlets 5.9.0
typing_extensions NA
wcwidth 0.2.6
xxhash NA
yaml 5.4.1
zarr 2.14.2
zc NA
zipp NA
zoneinfo NA
-----
Python 3.10.10 | packaged by conda-forge | (main, Mar 24 2023, 20:17:34) [Clang 14.0.6 ]
macOS-14.4.1-x86_64-i386-64bit
-----
Session information updated at 2024-05-15 18:46
Hi, thanks for the report!
Note that the plots in the documentation are generated on the fly when building the documentation. The plot you currently see on https://scanpy.readthedocs.io/en/stable/generated/scanpy.pl.dotplot.html has therefore been created with scanpy 1.10.1.
Must be a dependency issue, I’ll try to reproduce with the environment you provided.
/edit: I can reproduce it with that environment:
environment.yml
name: scanpy-3062
channels:
- conda-forge
dependencies:
- ipykernel
- python==3.10.10
- anndata==0.10.7
- scanpy==1.10.1
- IPython==8.13.2
- pillow==10.0.0
- astunparse==1.6.3
- backcall==0.2.0
- cffi==1.15.1
- cloudpickle==2.2.1
- colorama==0.4.4
- cycler==0.10.0
- cytoolz==0.12.0
- dask==2023.10.1
#- dateutil==2.8.2
- decorator==5.1.1
- defusedxml==0.7.1
- dill==0.3.6
- entrypoints==0.4
- exceptiongroup==1.1.1
- executing==1.2.0
- fasteners==0.17.3
- gmpy2==2.1.2
- h5py==3.8.0
#- icu==2.11
- python-igraph==0.11.2
- jedi==0.19.1
- jinja2==3.1.2
- joblib==1.2.0
- kiwisolver==1.4.4
- leidenalg==0.10.2
- llvmlite==0.42.0
- lz4==4.3.2
- markupsafe==2.1.2
- matplotlib==3.8.3
- mpmath==1.3.0
#- msgpack==1.0.5
- natsort==8.3.1
- numba==0.59.1
- numcodecs==0.11.0
- numexpr==2.7.3
- numpy==1.26.4
- packaging==23.1
- pandas==1.5.3
- parso==0.8.3
- pexpect==4.8.0
- pickleshare==0.7.5
- plotly==5.14.1
- prompt_toolkit==3.0.38
- psutil==5.9.5
- ptyprocess==0.7.0
- pure_eval==0.2.2
- pyarrow==10.0.1
- pydot==1.4.2
- pygments==2.15.1
- pyparsing==3.0.9
- pytz==2023.3.post1
- scipy==1.13.0
#- session_info==1.0.0
#- setuptools==67.7.2
- six==1.16.0
- scikit-learn==1.2.2
- stack_data==0.6.2
- sympy==1.11.1
- tblib==1.7.0
- texttable==1.6.7
- threadpoolctl==3.1.0
#- tlz==0.12.0
- toolz==0.11.2
#- pytorch==2.1.1
- tqdm==4.65.0
- traitlets==5.9.0
- wcwidth==0.2.6
#- yaml==5.4.1
- zarr==2.14.2
OK, pretty sure this is because your environment uses pandas 1.5
You can circumvent it for now by setting `dp.categories_order = dp.dot_color_df.index`.
Thanks for the quick response, @flying-sheep!
I can confirm that updating to pandas-2.2.2 does fix this. I totally missed this possibility; it's not clear to me why the dots would change ordering but the totals wouldn't (maybe scanpy relies on default pandas behaviour that changed between 1.x and 2.x?). That said, pandas-2.x unfortunately breaks some dependencies in our environment, so I'll either pin scanpy or use your workaround.
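For environments that have to stay on pandas 1.x, a small guard could decide at runtime whether the workaround is needed at all. This is only a sketch: the thresholds (scanpy 1.10 or later together with pandas below 2.0) are assumptions taken from the versions mentioned in this thread, not a confirmed affected range, and `major_minor`, `likely_affected` and `installed_is_likely_affected` are hypothetical helper names.

```python
import re
from importlib.metadata import PackageNotFoundError, version


def major_minor(version_string):
    """Return (major, minor) as ints from a string such as '1.10.1'."""
    match = re.match(r"(\d+)\.(\d+)", version_string)
    if match is None:
        raise ValueError(f"unrecognised version string: {version_string!r}")
    return int(match.group(1)), int(match.group(2))


def likely_affected(scanpy_version, pandas_version):
    """Guess whether this version pair shows the mismatched DotPlot ordering.

    Assumption from this thread only: the problem was seen with
    scanpy 1.10.0 / 1.10.1 on pandas 1.5.3, and not with scanpy 1.9.8
    or with pandas 2.2.2.
    """
    return (
        major_minor(scanpy_version) >= (1, 10)
        and major_minor(pandas_version) < (2, 0)
    )


def installed_is_likely_affected():
    """Check the installed packages; return None if either one is missing."""
    try:
        return likely_affected(version("scanpy"), version("pandas"))
    except PackageNotFoundError:
        return None
```

If `installed_is_likely_affected()` returns `True`, the category order can be set explicitly before plotting; otherwise the plot can be left at its defaults.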
Regarding the ordering and the issue title change: maybe a nit, but it's my understanding that the default ordering is alphabetical (which makes perfect sense as a default!). If this is correct, then I'd suggest that what is wrongly ordered is not the totals but the categories themselves.
Given this, the workaround that gives me the expected behaviour would be `dp.categories_order = dp.dot_color_df.index.sort_values()`.
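Combined with the minimal code sample from the report, the two workaround variants could be applied roughly as in the sketch below. It only uses scanpy calls that already appear in this thread. `apply_category_order` and `plot_with_workaround` are hypothetical helper names, and setting `categories_order` before calling `add_totals()` is an assumption about a safe ordering of the calls, not something confirmed here.

```python
def apply_category_order(dp, alphabetical=True):
    """Pin the DotPlot category order to the rows of its colour matrix.

    alphabetical=False keeps the row order as it is (first workaround);
    alphabetical=True sorts the labels first (second workaround).
    """
    order = dp.dot_color_df.index
    if alphabetical:
        order = order.sort_values()
    dp.categories_order = order
    return dp


def plot_with_workaround(alphabetical=True):
    """Reproduce the plot from the report with the category order pinned."""
    import scanpy as sc  # imported here so the helper above stays dependency-free

    pbmc = sc.datasets.pbmc68k_reduced()
    markers = {"T-cell": "CD3D", "B-cell": "CD79A", "myeloid": "CST3"}
    dp = sc.pl.dotplot(pbmc, markers, "bulk_labels", return_fig=True)
    # Assumption: set the order before add_totals() so that the dots and
    # the totals bars are both drawn from the same category order.
    apply_category_order(dp, alphabetical=alphabetical)
    dp.add_totals().style(dot_edge_color="black", dot_edge_lw=0.5).show()
    return dp
```

Calling `plot_with_workaround()` gives the alphabetical variant; `plot_with_workaround(alphabetical=False)` keeps the order of the dots and only aligns the totals with it.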