[Bug]: chromadb 0.5.4 crashes on Windows

Open petacube opened this issue 1 year ago • 52 comments

What happened?

Running collection.add crashes the process after 100 documents are inserted.

Versions

chromadb 0.5.4, Python 3.9

Relevant log output

No response

petacube, Jul 13 '24 06:07

Rolling back to the chromadb 0.5.0 release resolves the issue. Please explain what is causing the crash.

petacube, Jul 13 '24 06:07

Do you have a stack trace or any output?

HammadB, Jul 13 '24 06:07

It crashes silently: the whole Python process dies, and not even an exception is thrown. I can try testing on Linux tomorrow to see if I can reproduce the crash, and run systrace to see if a core dump can be captured.

petacube, Jul 13 '24 06:07
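
Because the process dies without a Python traceback, one way to get more information is the standard-library faulthandler module. It dumps the Python stack of every thread when a fatal native error occurs, such as a segfault or an access violation. The code below is a minimal sketch under that assumption. enable_crash_traceback and the log path are hypothetical names and are not part of chromadb. Running the script with python -X faulthandler should have the same effect without any code changes.

```python
import faulthandler
import sys


def enable_crash_traceback(log_path=None):
    """Dump Python tracebacks for all threads on a fatal native error.

    log_path is a hypothetical file name; when it is None the traceback goes
    to stderr. The stream is returned so the caller can keep it open:
    faulthandler writes to its file descriptor at crash time, so the file
    must not be closed or garbage-collected while the handler is enabled.
    """
    stream = open(log_path, "w") if log_path else sys.stderr
    faulthandler.enable(file=stream, all_threads=True)
    return stream
```

Call enable_crash_traceback() at the top of the script, or pass a path such as "crash.log". Do this before the collection.add(...) loop, and keep the returned file object alive for the rest of the run.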

A similar (or the same) issue was reported on Discord: https://discord.com/channels/1073293645303795742/1261229903383236720

Windows Fatal Exception: Access Violation

tazarov, Jul 13 '24 10:07

@petacube, I was unable to reproduce this on the GitHub Actions windows-latest runner.

Here's the test code: https://github.com/amikos-tech/chrm-2513-exp/blob/main/test_import.py

And here's the workflow: https://github.com/amikos-tech/chrm-2513-exp/actions/runs/9919966674/workflow

Conda env with Python 3.9 and Chroma 0.5.4

I tried adding things in bulk and separately. I also intentionally have high dimensional vectors (4096).

Let me know if you encounter the error in a similar setting.

tazarov, Jul 13 '24 13:07

Hmm, I wonder if this is due to a chroma-hnswlib version mismatch. Can you run pip show chroma-hnswlib? It should be 0.7.5 for chroma 0.5.4.

HammadB, Jul 13 '24 16:07
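
The pip on PATH may not belong to the interpreter that actually runs the script, which is a common pitfall in conda setups. A more reliable check is to ask the running interpreter itself via importlib.metadata.version (standard library, Python 3.8+). This sketch encodes only the one expected pair stated above: chromadb 0.5.4 with chroma-hnswlib 0.7.5. installed_version, check_hnswlib_match and EXPECTED_HNSWLIB are hypothetical helpers.

```python
from importlib.metadata import PackageNotFoundError, version

# Only the pairing stated in this thread; extend it for other releases.
EXPECTED_HNSWLIB = {"0.5.4": "0.7.5"}


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if missing."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


def check_hnswlib_match(chroma_ver, hnswlib_ver, expected=EXPECTED_HNSWLIB):
    """Say whether the installed pair matches the expected chroma-hnswlib."""
    want = expected.get(chroma_ver)
    if want is None:
        return f"no expectation recorded for chromadb {chroma_ver}"
    if hnswlib_ver == want:
        return f"ok: chroma-hnswlib {hnswlib_ver}"
    return (f"mismatch: chromadb {chroma_ver} expects chroma-hnswlib {want}, "
            f"found {hnswlib_ver}")


if __name__ == "__main__":
    print(check_hnswlib_match(installed_version("chromadb"),
                              installed_version("chroma-hnswlib")))
```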

My version of chroma-hnswlib is 0.7.3. Shouldn't a dependency like this be handled at the chromadb level?

petacube, Jul 13 '24 22:07

https://github.com/chroma-core/chroma/blob/2ae46d2dcdea1e57914dc8a3c68181840452eecb/pyproject.toml#L20

It is set here. I'm not sure how you upgraded, but maybe something went wrong. Can you upgrade the dependency and try again?

HammadB, Jul 14 '24 18:07

I ran pip install --upgrade chromadb==0.5.4, so perhaps that does not upgrade dependencies?

petacube, Jul 14 '24 18:07
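
The dependency set in pyproject.toml ends up in the installed chromadb package's metadata. One way to answer the question above is therefore to compare that declared requirement with what is actually installed.

Two pip behaviours are relevant here:

- pip's default --upgrade-strategy is only-if-needed, which upgrades a dependency only when the installed version no longer satisfies the requirement.
- pip check reports installed packages whose requirements are not satisfied.

The sketch below reads the requirement through importlib.metadata.requires. It extracts an exact == pin with a small regular expression rather than a full requirement parser; this is an assumption, and looser specifiers such as >= are reported as None. declared_pin and chromadb_hnswlib_pin are hypothetical helpers.

```python
import re
from importlib.metadata import PackageNotFoundError, requires


def declared_pin(requirements, dist="chroma-hnswlib"):
    """Return the version pinned with '==' for dist, or None.

    requirements is a list of requirement strings as returned by
    importlib.metadata.requires(), e.g. ["chroma-hnswlib==0.7.5", ...].
    Only exact pins are recognized; other specifiers yield None.
    """
    # Treat runs of '-', '_' and '.' in the name as equivalent (PEP 503 style).
    name = re.sub(r"[-_.]+", "[-_.]+", dist)
    pattern = re.compile(rf"^{name}\s*\(?\s*==\s*([^\s;,)]+)", re.IGNORECASE)
    for req in requirements or []:
        match = pattern.match(req.strip())
        if match:
            return match.group(1)
    return None


def chromadb_hnswlib_pin():
    """Exact chroma-hnswlib pin declared by the installed chromadb, if any."""
    try:
        return declared_pin(requires("chromadb"))
    except PackageNotFoundError:
        return None
```

Compare chromadb_hnswlib_pin() with the installed chroma-hnswlib version, for example via importlib.metadata.version. This shows directly whether the environment is consistent.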

I had the same issue: a silent crash after updating to chromadb 0.5.4 on Windows, EVEN WITH chroma-hnswlib version 0.7.5.

I moved back to chromadb 0.5.0 and chroma-hnswlib 0.7.3, and everything is working like before.

kaixxx, Jul 16 '24 12:07

@kaixxx, can you confirm whether you were using Anaconda? A user on Discord reported that the problems were resolved when he switched from Anaconda to pip.

On a related note: if your environment rebuilds chroma-hnswlib from source, that can be the culprit. Can you let me know what Python version and CPU architecture you have? We have prebuilt wheels for Windows on amd64 only (py39-py312).

tazarov, Jul 16 '24 12:07
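
The following sketch checks whether a given interpreter falls inside the prebuilt-wheel coverage described above. It encodes only that statement as an assumption: Windows, AMD64, 64-bit CPython 3.9 to 3.12. expects_prebuilt_wheel is a hypothetical helper, and the actual set of published wheels should be checked on PyPI.

```python
import platform
import sys


def expects_prebuilt_wheel(system=None, machine=None, py=None, is_64bit=None):
    """Return True/False for Windows, None for other platforms.

    Encodes the assumption stated above: Windows wheels exist only for AMD64
    and CPython 3.9-3.12. Arguments default to the running interpreter.
    """
    system = system or platform.system()
    machine = (machine or platform.machine()).upper()
    py = tuple(py or sys.version_info[:2])
    if is_64bit is None:
        is_64bit = sys.maxsize > 2**32  # a 32-bit Python cannot use amd64 wheels
    if system != "Windows":
        return None  # this sketch says nothing about other platforms
    return machine == "AMD64" and is_64bit and (3, 9) <= py <= (3, 12)
```

If this returns False, pip will typically try to compile chroma-hnswlib from source. Adding --only-binary chroma-hnswlib to the pip install command makes pip fail instead of building silently.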

Thanks for looking into this. Here is some additional info:

  • I am using Anaconda to manage my environments. However, I do not install any packages from Anaconda; I use pip for everything.
  • hnswlib: After reading the above message about the possible problem with version 0.7.3, I checked that I had 0.7.5 installed. Chromadb still crashed.
  • I am using Python 3.10.13
  • CPU: AMD Ryzen 7 6800U

kaixxx, Jul 16 '24 13:07

@kaixxx, in your venv can you run the following code with python:

```python
import hnswlib
import numpy as np

index = hnswlib.Index(space="l2", dim=1024)
index.init_index(max_elements=1000, ef_construction=100, M=16)
vectors = np.random.randn(1000, 1024).astype(np.float32)
index.add_items(vectors, ids=np.arange(1000))
```

Let me know if this crashes.

tazarov, Jul 16 '24 13:07

Yes, it seems to crash. I've created a new environment, installed chromadb (0.5.4 with chroma-hnswlib 0.7.5). Then I've added the line print('finished') to the end of your script. This line is never reached. The script exits silently without any error message. In my other environment with chromadb 0.5.0, the script runs fine and prints 'finished'.

kaixxx, Jul 16 '24 13:07

Another test: I've now downgraded to chroma-hnswlib 0.7.3 but kept chromadb 0.5.4 and your script runs fine!

kaixxx, Jul 16 '24 13:07
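
Because the crash kills the interpreter outright, comparing versions (as in the downgrade test above) is easier if the reproduction script runs in a child process. The parent then survives and can report how the child ended. This is a minimal sketch: run_repro, describe and REPRO are hypothetical names, and REPRO is the script posted above plus a final print. 0xC0000005 is the Windows status code for an access violation.

```python
import subprocess
import sys

# Windows status code for an access violation; subprocess usually reports it
# as the unsigned value, but the signed form is accepted as well.
ACCESS_VIOLATION = 0xC0000005

REPRO = """
import hnswlib
import numpy as np

index = hnswlib.Index(space="l2", dim=1024)
index.init_index(max_elements=1000, ef_construction=100, M=16)
vectors = np.random.randn(1000, 1024).astype(np.float32)
index.add_items(vectors, ids=np.arange(1000))
print("finished")
"""


def run_repro(code=REPRO, timeout=600):
    """Run code in a fresh interpreter; return (returncode, stdout)."""
    proc = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return proc.returncode, proc.stdout


def describe(returncode):
    """Turn a child-process return code into a short human-readable verdict."""
    if returncode == 0:
        return "finished normally"
    if returncode in (ACCESS_VIOLATION, ACCESS_VIOLATION - 2**32):
        return "crashed: access violation (0xC0000005)"
    if returncode < 0:
        return f"killed by signal {-returncode}"  # POSIX, e.g. 11 = SIGSEGV
    return f"exited with code {returncode}"
```

Run code, out = run_repro() followed by print(describe(code)) once in each environment, for example chroma-hnswlib 0.7.5 versus 0.7.3.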

@kaixxx, thanks for confirming. Can you add debug prints like this to identify whether it fails when initializing the index or when adding vectors:

```python
import hnswlib
import numpy as np

index = hnswlib.Index(space="l2", dim=1024)
print("New index - ok")
index.init_index(max_elements=1000, ef_construction=100, M=16)
print("Init index - ok")
vectors = np.random.randn(1000, 1024).astype(np.float32)
index.add_items(vectors, ids=np.arange(1000))
print("All good")
```

tazarov, Jul 16 '24 14:07

Yes, output:

New index - ok
Init index - ok

(no "All good")

kaixxx, Jul 16 '24 14:07
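
Since the failure is inside add_items, a natural next step would be to narrow down which vectors trigger it by adding them in smaller batches. One detail matters when the process dies hard: print output may still be sitting in a buffer and be lost, especially when stdout is redirected to a file or pipe. For that reason the sketch below passes flush=True. add_in_batches is a hypothetical helper, and the add_items callable stands in for the hnswlib call in the script above.

```python
import numpy as np


def add_in_batches(add_items, vectors, batch_size=100):
    """Call add_items(batch, ids) for consecutive slices of vectors.

    Progress lines are flushed immediately so the last "adding" line without
    a matching "ok" line identifies the batch that was being added when the
    process died.
    """
    n = len(vectors)
    for start in range(0, n, batch_size):
        stop = min(start + batch_size, n)
        print(f"adding {start}..{stop - 1}", flush=True)
        add_items(vectors[start:stop], np.arange(start, stop))
        print(f"ok {start}..{stop - 1}", flush=True)
    return n
```

With the index from the script above, call add_in_batches(lambda batch, ids: index.add_items(batch, ids=ids), vectors, batch_size=10). Upstream hnswlib's add_items also accepts a num_threads argument. Assuming chroma-hnswlib keeps that parameter, passing num_threads=1 may help separate a threading problem from a data-dependent one.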

@kaixxx, fantastic. Thank you for following up. 0.7.5 adds this change to add_items functionality - https://github.com/chroma-core/hnswlib/commit/408c5d1fa1dbc2acd8d1b4108191a8f803862210?diff=split&w=0#diff-ab27cbb27975c68cb0c6da824871058623f7f76a761c3c8365ef2e1395cf7cd9R1706-R1708

Could you rebuild the hnswlib locally (if you have the necessary build dependencies)?

```
pip install --no-binary :all: chroma-hnswlib==0.7.5
```

tazarov, Jul 16 '24 14:07

Hey @tazarov, I tried to build it, but the linker fails with an error saying that a certain file could not be opened. My build environment may not be set up properly, but I don't have the time to dig into that. Is there anything else I can do?

kaixxx, Jul 17 '24 07:07

When the documents are long enough, the bug occurs on inserting the 100th one, whether you insert data one by one or all at once.

dddxst, Jul 18 '24 08:07
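
A reproduction at the chromadb level, following the description above, might look like the sketch below. It rests on several assumptions:

- The document length, document count, storage path and collection name are arbitrary.
- make_documents and reproduce are hypothetical helpers.
- collection.add uses chromadb's default embedding function, which downloads a small ONNX model on first use.

One unverified guess about why the threshold is 100: if we read Chroma's defaults correctly, a persistent collection's hnsw:batch_size setting is 100. Vectors may then only reach hnswlib once 100 have been buffered.

```python
def make_documents(n, length):
    """Build n distinct synthetic documents of exactly `length` characters."""
    filler = "lorem ipsum dolor sit amet "
    docs = []
    for i in range(n):
        text = f"doc {i}: " + filler * (length // len(filler) + 1)
        docs.append(text[:length])
    return docs


def reproduce(n=150, length=4000, one_by_one=True, path="./crash-repro-db"):
    """Insert n long documents into a persistent collection.

    path is a hypothetical directory; delete it between runs so the ids
    are new each time.
    """
    import chromadb  # lazy import so make_documents works without chromadb

    client = chromadb.PersistentClient(path=path)
    collection = client.get_or_create_collection("crash-repro")
    docs = make_documents(n, length)
    ids = [f"id-{i}" for i in range(n)]
    if one_by_one:
        for i, (doc_id, doc) in enumerate(zip(ids, docs)):
            collection.add(ids=[doc_id], documents=[doc])
            print(f"added {i + 1}", flush=True)
    else:
        collection.add(ids=ids, documents=docs)
    print("finished", flush=True)
```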

Reproduced on Python 3.12 and 3.10 on our Windows machine. This does not show up in CI, and we should figure out why; perhaps the number of embeddings we insert in CI is not large enough to trigger it.

@HammadB and I are looking into it.

atroyn, Jul 19 '24 01:07

I have confirmed that running with --no-binary (building from source) fixes this as a workaround. This points to an issue in the wheel build. Investigating further.

HammadB, Jul 19 '24 06:07

It seems the Windows wheels were built with AVX/SSE enabled whenever the runner they were compiled on supported it. I guess that for 0.7.3 the runner just happened not to have AVX/SSE, but now it does. I have pushed an alpha release, 0.7.6a1.

@dddxst, @kaixxx and @petacube, can you pip install chroma-hnswlib==0.7.6a1 and let me know if that fixes your issue? If so, I can issue a full release. Thanks.

HammadB, Jul 19 '24 07:07

Thanks! I've tested chroma-hnswlib 0.7.6a1 with the above script and it still crashes, unfortunately. Exactly the same behavior as described in https://github.com/chroma-core/chroma/issues/2513#issuecomment-2231002576

kaixxx, Jul 19 '24 09:07

We have reproduced the 0.7.6a1 failure on our Windows machine. The next step is to attach a debugger to the C++ code itself. This will be a bit hairy, but I will coordinate with @HammadB to ship a fix.

atroyn, Jul 19 '24 23:07

I had the same problem with 0.5.5, and downgrading to chromadb 0.5.3 / chroma-hnswlib 0.7.3 has solved it for now!

EricBLivingston, Jul 23 '24 16:07

@EricBLivingston, what version of Python are you on?

HammadB, Jul 23 '24 16:07

Version 3.11.9

EricBLivingston, Jul 24 '24 16:07

It would appear that the issue exists on hnswlib 0.7.3 too (Windows 10, AMD Ryzen 5) - https://discord.com/channels/1073293645303795742/1265778818422145149

tazarov, Jul 25 '24 13:07

@tazarov, can you please post a summary of this long conversation here for easy reference? There is a lot going on, and it's unclear to me what the issue is. Which Python version is the user on?

atroyn, Jul 25 '24 17:07