ValueError: numpy.ndarray size changed when calling import hdbscan
When I try to import hdbscan I get the following error:
```
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
c:\program files\python37\lib\site-packages\hdbscan\__init__.py in <module>
c:\program files\python37\lib\site-packages\hdbscan\hdbscan_.py in <module>
hdbscan/_hdbscan_linkage.pyx in init hdbscan._hdbscan_linkage()
ValueError: numpy.ndarray size changed, may indicate binary incompatibility. Expected 88 from C header, got 80 from PyObject
```
I use Python 3.7.9 with numpy 1.19.3 (I also tried 1.19.5).
I would appreciate your help.
Having this exact same issue as of yesterday on Python 3.8, with every numpy version from the past year.
Also having this issue. Tried numpy versions 1.20 and 1.16.1.
The same with Python 3.7.9 in my case. Now it's working with Python 3.7.6 for me.
I fixed it by installing the package with pip, adding the flags --no-cache-dir --no-binary :all:
Apparently this forces the wheel to be re-compiled against your local version of numpy.
I honestly have no idea why this is happening, in addition to other packages I use; perhaps someone re-compiled the Cython extensions and didn't note it in a changelog. I'm literally shooting completely blind here though.
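For reference, a minimal sketch of the full reinstall (using hdbscan as the example package):

```
pip uninstall hdbscan
pip install hdbscan --no-cache-dir --no-binary :all:
```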
Recompiling also worked for me. I'm using a public cloud that messes with compilation.
But does anyone know WHY this is actually happening? Especially since it also shows up in projects outside of this repo?
@omarsumadi can you explain to me how to do that? I put --no-cache-dir --no-binary :all: at the end of all my pip install lines but it didn't work on Python 3.7.9.
@paulthemagno Take a look at this stack overflow post: https://stackoverflow.com/questions/40845304/runtimewarning-numpy-dtype-size-changed-may-indicate-binary-incompatibility
Realistically, the only thing you would change would be: pip install hdbscan --no-cache-dir --no-binary :all:
If that doesn't work, I'm not sure. Try not pinning a numpy version and letting pip reconcile which numpy should be installed if you are using multiple packages that rely on numpy. Perhaps your issue is a bit deeper.
The way to actually solve all this though is to figure out why this happened in the first place.
I use another package, https://github.com/ing-bank/sparse_dot_topn, with cython and numpy. As of today/yesterday, I get exactly the same error: numpy.ndarray size changed, may indicate binary incompatibility. Expected 88 from C header, got 80 from PyObject.
My environment is aws/codebuild/amazonlinux2-x86_64-standard:3.0. I downgraded the numpy version and it didn't help.
pip install package --no-cache-dir --no-binary :all: fixed the problem. FYI.
@ymwdalex That's actually the same package I came to this thread for. I don't have hdbscan installed, but came to help because I was trying to solve the sparse_dot_topn package issue.
Do you know why this is happening? I really don't want to have another go at fixing this bug with no idea where to start.
We could start by asking them. Or maybe scipy (a dependency of both) decided to re-compile its wheels against a different version of numpy and everything broke?
@omarsumadi thanks for the comments. I am the author of sparse_dot_topn. I didn't change the source code recently and have no idea why this is happening...
@ymwdalex Ok - that is kind of funny lol! By the way, hi! I love your work; the library is truly one of a kind and I have not found anything that comes close to its capabilities, which is sort of why I have a vested interest in seeing this through.
I'll spell out what I could figure out:
- The only thing in common that both of these packages have is Numpy and Scipy
- Scipy has a history of this happening in the past with other errors that are similar to this type. See - https://github.com/scikit-learn-contrib/hdbscan/issues/272.
- Numpy Versioning seems to have an impact on these errors and Scipy is consistently causing issues.
- Someone at Scipy must have tried to re-compile with a later version of Numpy that perhaps broke something.
Again, this kind of thing is way outside of my comfort zone (I know nothing about Cython and Numpy cross-over), but perhaps we could find the version of Numpy that was used to compile the wheels and pin that as the version for your library?
Sorry if some of this doesn't make much sense.
> The same with Python 3.7.9 in my case. Now it's working with Python 3.7.6 for me.
I eventually installed Python 3.7.6 and everything worked. However, I have another machine with 3.7.9 where everything works fine. So it's not related to the Python version, I think.
@doctor3030 I'm not sure if you should close this, not until there's some better solution to other people's problems. I don't want to tell you how to do things and I most definitely respect your contributions, but I'd imagine this is definitely NOT solved, especially since it's pulling in cross-package discussion.
I think there's a lot of cross interest figuring out what exactly happened as well. Unfortunately, I'm not well versed enough in Cython and Numpy internals to offer the correct solution other than to rebuild the wheels.
Thanks, Omar
Ok, let's keep it open.
Here's what I can say: apparently the release of numpy 1.20.0 (probably what scipy is now compiled against, due to some change that is now impacting all of us) is the trigger, according to this pull request: https://github.com/Trusted-AI/adversarial-robustness-toolbox/pull/87.
What is most likely happening is that we are using packages that limit the numpy version to something below 1.20.0 (such as Tensorflow).
Perhaps someone could verify the pull I linked?
I have this issue when trying to use Top2Vec on Python 3.7.9, which pulls in Tensorflow and locks me to Numpy 1.19. Rebuilding HDBScan from source in turn fails on this Accelerate error, so I think I have to rebuild NumPy from source with OpenBLAS (although NumPy is otherwise working fine), which in turn is proving difficult.
So this is still very much an issue for me, no doubt for some others too.
@cavvia the same happens to me with a similar library: BERTopic! I also tried pip install package --no-cache-dir --no-binary :all: but it doesn't change anything. In my case the problem occurs on Python 3.7.9, while with Python 3.7.6 it works well.
I can report the same issue as @cavvia after trying to use top2vec on 3.8.0 and on 3.7.5... encountering issues with UMAP when trying to work around it...
Hello guys, we're facing the same issue here since this last weekend, with no changes to the code or any library versions.
We isolated it to check what could be happening:
Dockerfile

```dockerfile
FROM python:3.7-slim-buster

RUN apt-get update \
    && apt-get install -y --no-install-recommends python3.7-dev=3.7.3-2+deb10u2 build-essential=12.6 jq=1.5+dfsg-2+b1 curl=7.64.0-4+deb10u1 \
    && apt-get clean \
    && rm -rf /var/lib/apt/lists/* \
    && pip install --upgrade pip

COPY . .

RUN python -m pip install --user -r requirements.txt

CMD ["python", "test.py"]
```
requirements.txt

```
hdbscan==0.8.26
numpy==1.18.5
```
test.py

```python
import hdbscan
print("hello")
```
outputs

```
$ docker run 9523faa77267 python test.py
Traceback (most recent call last):
  File "test.py", line 1, in <module>
    import hdbscan
  File "/home/someuser/.local/lib/python3.7/site-packages/hdbscan/__init__.py", line 1, in <module>
    from .hdbscan_ import HDBSCAN, hdbscan
  File "/home/someuser/.local/lib/python3.7/site-packages/hdbscan/hdbscan_.py", line 21, in <module>
    from ._hdbscan_linkage import (single_linkage,
  File "hdbscan/_hdbscan_linkage.pyx", line 1, in init hdbscan._hdbscan_linkage
ValueError: numpy.ndarray size changed, may indicate binary incompatibility. Expected 88 from C header, got 80 from PyObject
```
It works with numpy==1.20, though.
The point is, as mentioned here before, we use Tensorflow in our project and it locks us to numpy<1.19.
I'm new to the Python/PyPI world, but I assumed that built wheels couldn't be updated (recompiled with updated libraries/dependencies), and that if an update was needed, a new release would be drafted with a version bump.
Is there anything else we can help with? I couldn't figure out exactly which lib was recompiled (hdbscan or scipy?), but I noticed a difference in the checksum/size of the hdbscan wheel between builds; not sure it's related.
```
# last week (when everything worked)
Created wheel for hdbscan: filename=hdbscan-0.8.26-cp37-cp37m-linux_x86_64.whl size=687506 sha256=bd8b0c65d14ffa1d804f4a3df445fc4300452968a2372d581f0bb64963a8010d

# yesterday (when the error started happening)
Created wheel for hdbscan: filename=hdbscan-0.8.26-cp37-cp37m-linux_x86_64.whl size=686485 sha256=05668339290a597a871ee90da2b50a7ca415f18b82dba59ad6c08bb9b5b9192f
```
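If anyone wants to compare on their end, the locally built wheel and its hash can be reproduced with something like this (a sketch; the output directory is arbitrary):

```
pip wheel hdbscan==0.8.26 --no-deps -w /tmp/wheels
sha256sum /tmp/wheels/hdbscan-0.8.26-*.whl
```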
@omarsumadi Thanks a lot for your investigation. I also opened an issue on the sparse_dot_topn package referring to this one.
numpy 1.20.0 works for me.
In my environment which has the problem, I installed numpy==1.19 first, then installed sparse_dot_topn, which uses the latest cython and scipy (https://github.com/ing-bank/sparse_dot_topn/blob/master/setup.py#L70). Probably the latest cython or scipy has some update that is incompatible with numpy versions before 1.20. A sketch of the failing sequence is below.
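A minimal sketch of the order-dependent failure as I understand it (version numbers are examples only):

```
pip install numpy==1.19.5       # pin the runtime numpy first
pip install sparse_dot_topn     # its extension ends up built against a newer numpy,
                                # so the next import raises the ndarray size ValueError
```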
Make sure that you use correct and compatible versions of the libs:

```
annoy==1.17.0
cython==0.29.21
fuzzywuzzy==0.18.0
hdbscan==0.8.26
joblib==1.0.0
kiwisolver==1.3.1
llvmlite==0.35.0
matplotlib==3.3.2
numba==0.52.0
numpy==1.20.0
pandas==1.1.2
pillow==8.1.0
pyarrow==1.0.1
python-levenshtein==0.12.1
pytz==2021.1
scikit-learn==0.24.1
scipy==1.6.0
six==1.15.0
threadpoolctl==2.1.0
tqdm==4.50.0
umap-learn==0.5.0
```
@ymwdalex An alternative is to either downgrade scipy as well and keep the current numpy version, or install with --no-binary :all:. The problem is, I'd bet a lot of people are using some other pip package that doesn't support numpy 1.20.0 (big hint: Tensorflow), especially since the new version number represents a step up, so many people may have < 1.20.0 pinned in their setups.
I admit that I am as much at a loss as everyone else here. In fact I have little understanding of the binary wheel infrastructure on PyPI. I have not provided any new packages or wheels for hdbscan recently (i.e. within the last many months), so if there is a change it was handled by some automated process. Compiling from source (and, in fact, re-cythonizing everything) is likely the best option, but that does not leave a great install option. Any assistance from anyone with more experience in packaging than me would be greatly appreciated.
This was resolved for me using the following requirements:

```
cython==0.29.21
numpy==1.20.0
scipy==1.5.4
scikit-learn==0.24.1
joblib==1.0.0
six==1.15.0
```
@lmcinnes - it might be due to some packages in the requirements.txt not being pinned, such as numpy>=1.16.0. It could be worth pinning them in both directions, >= x, <= y, such as here. A sketch of what that could look like follows.
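For illustration, a two-sided pin in setup.py might look like this (a sketch only; the exact bounds are my assumption, not a tested recommendation):

```python
# hypothetical setup.py excerpt: bound numpy on both sides so the
# compiled extension and the runtime numpy stay ABI-compatible
from setuptools import setup

setup(
    name="example-package",
    install_requires=[
        "numpy>=1.16.0,<1.20",
        "scipy>=1.0",
    ],
)
```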
@salman1993 Thanks. I agree that something like that might be good; however, the difficulty is that it does work with numpy 1.20. It is the interactions with other packages that then install numpy 1.19, or similar, that break. That means I'm not really sure what bounds to use. For now I may just restrict to numpy <= 1.19, as hopefully that may fix things for the moment, but I feel like that is really just a temporary fix, and it will be unnecessarily restrictive on numpy versions in the not too distant future.
So what fixed it for me is installing with pip using --no-cache-dir --no-binary :all:. Is there any merit to doing that? Or is forcing --no-binary with pip not something looked upon highly?
Restricting the version doesn't help (at least I don't think so), because it is the old version (non-1.20.0) that is causing the issues. It's most likely that scipy is compiled against 1.20.0, everyone else isn't using 1.20.0, and the backwards compatibility in wheels everyone's been accustomed to broke.
Someone from scipy (this isn't scikit-learn's problem; scipy is what everyone here has in common) needs to come and say what happened :) so we can all figure out how to proceed. But that's my guess as to what happened.
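For anyone debugging locally, a quick way to see whether you're hitting the mismatch is to print the runtime numpy version and then trigger the import (a minimal diagnostic sketch, assuming hdbscan is installed):

```python
# The ValueError comes from Cython's ABI size check, which runs the
# moment the compiled extension module is imported.
import numpy

print("runtime numpy:", numpy.__version__)
try:
    import hdbscan
    print("hdbscan imported cleanly; the extension matches this numpy")
except ValueError as exc:
    print("binary incompatibility:", exc)
```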
I just spent the last while trying to reproduce this and to work out what is going astray. I don't have any firm answers, but it seems that starting from a fresh Python environment, as long as you pick one of numpy 1.19 or numpy 1.20 and then stick with that version for any other packages that get installed (i.e. if you have any dependencies that need numpy 1.19, start with that version and stay with it), everything works fine. It was when an install of a package changed the numpy version I had installed that I could get this error.
Other ways I imagine you may be able to get the error: if your pip cache has a version downloaded (and possibly built into a wheel) when you had a different numpy version then things could go astray like this. The fix for that seems to be the --no-cache-dir --no-binary :all: option to pip.
I'm not sure I have any good answers other than managing to ensure you are building hdbscan against the version of numpy you intend to keep (and not having another package with different dependencies trample on it), and using --no-cache-dir --no-binary :all: to ensure you are building fresh and not using an old cached wheel or similar; a sketch of that follows. I know that isn't perfect, but it is what I can say for now. Hopefully over the next week or two this will shake itself out among all the various packages and dependencies.
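Concretely, something like this (a sketch; substitute whichever numpy version your other dependencies require):

```
# in a fresh environment, commit to one numpy version up front
pip install numpy==1.19.5
# then build hdbscan from source against that numpy, bypassing cached wheels
pip install hdbscan --no-cache-dir --no-binary :all:
```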
The day this issue started, both pip==21.0.1 and numpy==1.20.0 were released.
There's an issue over at pypa/pip#9542 which suggests that pip might be resolving things weirdly now that a new version of numpy exists: pip may be detecting a different numpy for dependency resolution than the version you have pinned, causing it to select binaries for other packages that were compiled against the new numpy instead of your pinned version.
FWIW, I have a lock file with a pinned numpy==1.19.2 and hdbscan==0.8.26, built in a clean Docker image. It worked fine, and it no longer builds since this issue started, even with no caches and locked versions of pip, numpy, etc., all of which did build last week. Pinning to pip==21.0 also doesn't fix the issue.
It seems the combination of pip (at least >=21.0, maybe others) and the existence of numpy==1.20.0 is the cause, which may be why --no-binary is a possible fix (as, if I understand correctly, it causes everything to be recompiled from scratch instead of using mismatched binaries). Possibly pinning to numpy==1.20.0 might also work for now, where that's an option.
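One more thing that might be worth trying, though it's my own suggestion rather than something confirmed upthread: pip's --no-build-isolation flag makes the source build compile against the numpy already installed in your environment, instead of a fresh (possibly newer) numpy pulled into an isolated build environment, which is one way the header/runtime mismatch can arise:

```
# install numpy (and build deps like cython) yourself first, then build
# hdbscan against exactly that numpy rather than an isolated copy
pip install numpy==1.19.5 cython
pip install hdbscan --no-binary :all: --no-build-isolation
```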