[Bug]: thread 'tokio-runtime-worker' panicked
What happened?
chromadb latest/1.0.6
Chroma panics randomly after running for a while. This didn't happen on earlier versions, but our data volume has also increased since then.
Versions
Chroma 1.0.6 (latest), Python 3.11. The host OS is Ubuntu 24.04.2 LTS, which shouldn't be relevant because everything runs in Docker containers.
Relevant log output
thread 'tokio-runtime-worker' panicked at /chroma/rust/system/src/wrapped_message.rs:88:30:
message reply channel was unexpectedly dropped by caller: Ok(())
stack backtrace:
0: rust_begin_unwind
at ./rustc/eeb90cda1969383f56a2637cbd3037bdf598841c/library/std/src/panicking.rs:665:5
1: core::panicking::panic_fmt
at ./rustc/eeb90cda1969383f56a2637cbd3037bdf598841c/library/core/src/panicking.rs:74:14
2: core::result::unwrap_failed
at ./rustc/eeb90cda1969383f56a2637cbd3037bdf598841c/library/core/src/result.rs:1679:5
3: core::result::Result<T,E>::expect
at ./rustc/eeb90cda1969383f56a2637cbd3037bdf598841c/library/core/src/result.rs:1059:23
4: <core::option::Option<chroma_system::wrapped_message::HandleableMessageImpl<M,<C as chroma_system::types::Handler<M>>::Result>> as chroma_system::wrapped_message::HandleableMessage<C>>::handle_and_reply::{{closure}}
at ./chroma/rust/system/src/wrapped_message.rs:86:25
5: <core::pin::Pin<P> as core::future::future::Future>::poll
at ./rustc/eeb90cda1969383f56a2637cbd3037bdf598841c/library/core/src/future/future.rs:123:9
6: chroma_system::wrapped_message::WrappedMessage<C>::handle::{{closure}}
at ./chroma/rust/system/src/wrapped_message.rs:55:61
7: <tracing::instrument::Instrumented<T> as core::future::future::Future>::poll
at ./usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tracing-0.1.40/src/instrument.rs:321:9
8: chroma_system::executor::ComponentExecutor<C>::run::{{closure}}
at ./chroma/rust/system/src/executor.rs:98:64
9: chroma_system::system::System::start_component::{{closure}}
at ./chroma/rust/system/src/system.rs:54:65
10: <tracing::instrument::Instrumented<T> as core::future::future::Future>::poll
at ./usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tracing-0.1.40/src/instrument.rs:321:9
11: tokio::runtime::task::core::Core<T,S>::poll::{{closure}}
at ./usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.41.1/src/runtime/task/core.rs:331:17
12: tokio::loom::std::unsafe_cell::UnsafeCell<T>::with_mut
at ./usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.41.1/src/loom/std/unsafe_cell.rs:16:9
13: tokio::runtime::task::core::Core<T,S>::poll
at ./usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.41.1/src/runtime/task/core.rs:320:30
14: tokio::runtime::task::harness::poll_future::{{closure}}
at ./usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.41.1/src/runtime/task/harness.rs:499:19
15: <core::panic::unwind_safe::AssertUnwindSafe<F> as core::ops::function::FnOnce<()>>::call_once
at ./rustc/eeb90cda1969383f56a2637cbd3037bdf598841c/library/core/src/panic/unwind_safe.rs:272:9
16: std::panicking::try::do_call
at ./rustc/eeb90cda1969383f56a2637cbd3037bdf598841c/library/std/src/panicking.rs:557:40
17: std::panicking::try
at ./rustc/eeb90cda1969383f56a2637cbd3037bdf598841c/library/std/src/panicking.rs:521:19
18: std::panic::catch_unwind
at ./rustc/eeb90cda1969383f56a2637cbd3037bdf598841c/library/std/src/panic.rs:350:14
19: tokio::runtime::task::harness::poll_future
at ./usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.41.1/src/runtime/task/harness.rs:487:18
20: tokio::runtime::task::harness::Harness<T,S>::poll_inner
at ./usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.41.1/src/runtime/task/harness.rs:209:27
21: tokio::runtime::task::harness::Harness<T,S>::poll
at ./usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.41.1/src/runtime/task/harness.rs:154:15
22: tokio::runtime::task::raw::RawTask::poll
at ./usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.41.1/src/runtime/task/raw.rs:201:18
23: tokio::runtime::task::LocalNotified<S>::run
at ./usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.41.1/src/runtime/task/mod.rs:435:9
24: tokio::runtime::scheduler::multi_thread::worker::Context::run_task::{{closure}}
at ./usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.41.1/src/runtime/scheduler/multi_thread/worker.rs:596:18
25: tokio::runtime::coop::with_budget
at ./usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.41.1/src/runtime/coop.rs:107:5
26: tokio::runtime::coop::budget
at ./usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.41.1/src/runtime/coop.rs:73:5
27: tokio::runtime::scheduler::multi_thread::worker::Context::run_task
at ./usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.41.1/src/runtime/scheduler/multi_thread/worker.rs:595:9
28: tokio::runtime::scheduler::multi_thread::worker::Context::run
at ./usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.41.1/src/runtime/scheduler/multi_thread/worker.rs:546:24
29: tokio::runtime::scheduler::multi_thread::worker::run::{{closure}}::{{closure}}
at ./usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.41.1/src/runtime/scheduler/multi_thread/worker.rs:511:21
30: tokio::runtime::context::scoped::Scoped<T>::set
at ./usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.41.1/src/runtime/context/scoped.rs:40:9
31: tokio::runtime::context::set_scheduler::{{closure}}
at ./usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.41.1/src/runtime/context.rs:180:26
32: std::thread::local::LocalKey<T>::try_with
at ./rustc/eeb90cda1969383f56a2637cbd3037bdf598841c/library/std/src/thread/local.rs:283:12
33: std::thread::local::LocalKey<T>::with
at ./rustc/eeb90cda1969383f56a2637cbd3037bdf598841c/library/std/src/thread/local.rs:260:9
34: tokio::runtime::context::set_scheduler
at ./usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.41.1/src/runtime/context.rs:180:17
35: tokio::runtime::scheduler::multi_thread::worker::run::{{closure}}
at ./usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.41.1/src/runtime/scheduler/multi_thread/worker.rs:506:9
36: tokio::runtime::context::runtime::enter_runtime
at ./usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.41.1/src/runtime/context/runtime.rs:65:16
37: tokio::runtime::scheduler::multi_thread::worker::run
at ./usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.41.1/src/runtime/scheduler/multi_thread/worker.rs:498:5
38: tokio::runtime::scheduler::multi_thread::worker::Launch::launch::{{closure}}
at ./usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.41.1/src/runtime/scheduler/multi_thread/worker.rs:464:45
39: <tokio::runtime::blocking::task::BlockingTask<T> as core::future::future::Future>::poll
at ./usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.41.1/src/runtime/blocking/task.rs:42:21
40: tokio::runtime::task::core::Core<T,S>::poll::{{closure}}
at ./usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.41.1/src/runtime/task/core.rs:331:17
41: tokio::loom::std::unsafe_cell::UnsafeCell<T>::with_mut
at ./usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.41.1/src/loom/std/unsafe_cell.rs:16:9
42: tokio::runtime::task::core::Core<T,S>::poll
at ./usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.41.1/src/runtime/task/core.rs:320:30
43: tokio::runtime::task::harness::poll_future::{{closure}}
at ./usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.41.1/src/runtime/task/harness.rs:499:19
44: <core::panic::unwind_safe::AssertUnwindSafe<F> as core::ops::function::FnOnce<()>>::call_once
at ./rustc/eeb90cda1969383f56a2637cbd3037bdf598841c/library/core/src/panic/unwind_safe.rs:272:9
45: std::panicking::try::do_call
at ./rustc/eeb90cda1969383f56a2637cbd3037bdf598841c/library/std/src/panicking.rs:557:40
46: std::panicking::try
at ./rustc/eeb90cda1969383f56a2637cbd3037bdf598841c/library/std/src/panicking.rs:521:19
47: std::panic::catch_unwind
at ./rustc/eeb90cda1969383f56a2637cbd3037bdf598841c/library/std/src/panic.rs:350:14
48: tokio::runtime::task::harness::poll_future
at ./usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.41.1/src/runtime/task/harness.rs:487:18
49: tokio::runtime::task::harness::Harness<T,S>::poll_inner
at ./usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.41.1/src/runtime/task/harness.rs:209:27
50: tokio::runtime::task::harness::Harness<T,S>::poll
at ./usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.41.1/src/runtime/task/harness.rs:154:15
51: tokio::runtime::task::raw::RawTask::poll
at ./usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.41.1/src/runtime/task/raw.rs:201:18
52: tokio::runtime::task::UnownedTask<S>::run
at ./usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.41.1/src/runtime/task/mod.rs:472:9
53: tokio::runtime::blocking::pool::Task::run
at ./usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.41.1/src/runtime/blocking/pool.rs:161:9
54: tokio::runtime::blocking::pool::Inner::run
at ./usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.41.1/src/runtime/blocking/pool.rs:511:17
55: tokio::runtime::blocking::pool::Spawner::spawn_thread::{{closure}}
at ./usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.41.1/src/runtime/blocking/pool.rs:469:13
@dertruffel, can you share during what type of operations your Chroma instance has panicked?
About 90% of operations are retrievals from the database, although I don't log that kind of data. Is there a way to auto-restart upon panic? @tazarov Additionally, I upgraded to 1.0.7.dev17 and it still happens, with the same trace.
@dertruffel Can you share some of your code? What kind of client are you using? More information will be very helpful in order to debug this
@itaismith
```python
import os

import google.generativeai as gemini  # assumed import; not shown in the original post
from chromadb import HttpClient

# CHROMA_HOST and GEMINI_KEY are defined elsewhere (not shown in this snippet).
CHROMA_PORT = os.environ.get("CHROMA_PORT", "8000")
CHROMA_URL = f"http://{CHROMA_HOST}:{CHROMA_PORT}"
CHROMA_CLIENT = HttpClient(host=CHROMA_URL)


def list_files_in_collection(collection_name):
    """Return the unique file paths stored in a collection's metadata."""
    try:
        collection = CHROMA_CLIENT.get_or_create_collection(name=collection_name)
        results = collection.get(include=["metadatas"])
        metadatas = results["metadatas"]
        file_paths = [meta.get("file_path") for meta in metadatas]
        return list(set(file_paths))
    except Exception as ex:
        print("CHROMA LIST ISSUE", ex)
        return []


def get_gemini_embedding(query):
    """Embed a query string with Gemini; return None on failure."""
    try:
        gemini.configure(api_key=GEMINI_KEY)
        embed_model = "models/text-embedding-004"
        embedding_resp = gemini.embed_content(model=embed_model, content=query)
        return embedding_resp["embedding"]
    except Exception as ex:
        print("CHROMA GEMINI ISSUE", ex)
        return None


def query_chroma(query, collection_name, top_k=3):
    """Return the top_k (document, metadata) pairs for a query."""
    try:
        query_embedding = get_gemini_embedding(query)
        if query_embedding is None:
            # Embedding failed, so there is nothing to send to Chroma.
            return []
        collection = CHROMA_CLIENT.get_or_create_collection(name=collection_name)
        results = collection.query(
            query_embeddings=[query_embedding],
            n_results=top_k,
            include=["documents", "metadatas", "distances"],
        )
        # Results are nested per query; index 0 is our single query.
        return list(zip(results["documents"][0], results["metadatas"][0]))
    except Exception as ex:
        print("CHROMA QUERY ISSUE", ex)
        return []
```
It panics randomly on both writes and reads, and also when Docker is pulling a new build, so every time I start a new Django container I need to reboot Chroma. I'm running chromadb/chroma on Docker; versions 1.0.5, 1.0.6, and the 1.0.7 dev builds all panic.
I got exactly the same error. If there is anything I can do to help, please let me know.
@AccsoAndreBuesgen Until the Chroma devs fix the issue, the only band-aid solution I've found is to monitor the logs and restart the container whenever it panics. Not ideal.
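For anyone who needs the same stopgap, here is a minimal sketch of such a log watcher, assuming the Docker CLI is on the PATH. A Docker restart policy only acts when the container process exits, which, judging by the reports in this thread, does not seem to happen after this panic, so the sketch matches on the panic line instead. The function names and the container name are hypothetical, and the marker string is simply copied from the log output in this issue.

```python
import subprocess
import time

# Substring taken from the panic line in the logs posted in this thread.
PANIC_MARKER = "thread 'tokio-runtime-worker' panicked"


def is_panic_line(line: str) -> bool:
    """Return True if a log line looks like the worker panic reported here."""
    return PANIC_MARKER in line


def watch_and_restart(container: str, pause_seconds: float = 2.0) -> None:
    """Follow a container's logs and restart it whenever a panic line appears.

    `container` is a hypothetical container name or ID. Runs until interrupted.
    """
    while True:
        # --tail 0 skips old output so a past panic does not trigger a restart.
        proc = subprocess.Popen(
            ["docker", "logs", "--follow", "--tail", "0", container],
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # merge the container's stderr into one stream
            text=True,
        )
        try:
            for line in proc.stdout:
                if is_panic_line(line):
                    subprocess.run(["docker", "restart", container], check=False)
                    break
        finally:
            # Stop following the old log stream before re-attaching.
            proc.terminate()
            proc.wait()
        # Brief pause before following the restarted container's logs.
        time.sleep(pause_seconds)
```

Calling `watch_and_restart("chroma")` from a small supervisor script (where `"chroma"` stands in for your container name) would keep it running. This only hides the symptom: requests in flight when the panic happens are still lost.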
Just a bump here: I just upgraded from 0.6.2 to 1.0.12 and I've already hit this 3 times today while testing in my dev environment after the upgrade. I was searching for a resolution and found this thread. This is Chroma running in a Docker container on a Mac.
Connect to Chroma at: http://localhost:8000
Getting started guide: https://docs.trychroma.com/docs/overview/getting-started
OpenTelemetry is not enabled because it is missing from the config
Listening on 0.0.0.0:8000
thread 'tokio-runtime-worker' panicked at /chroma/rust/system/src/wrapped_message.rs:88:30:
message reply channel was unexpectedly dropped by caller: Ok(())
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
Having to restart the container to resolve.
katewilkins@kate-wilkins-L37L4QFWLN backend % go run cmd/utils/collection_details/main.go
Fetching collection details...

| Collection | ID | Embedding Size | EF Construction | EF Search | HNSW Space |
| --- | --- | --- | --- | --- | --- |
| address | 83b90a1e-10fb-4f7b-a5be-8f7fd68ff354 | 768 | 500 | 300 | ip |
| email | 9b812b3d-5888-40f9-997b-336a7b9b93c5 | 768 | 500 | 300 | ip |
| location | dc61e956-b0e9-418a-ae36-1eab6d20664c | 3 | 500 | 300 | ip |
| name | 996f4314-3a00-47bc-b9f0-13c0b4c23a54 | 768 | 500 | 300 | ip |

katewilkins@kate-wilkins-L37L4QFWLN backend % go run cmd/utils/collection_sizes/main.go
Fetching collection sizes...

| Collection | Size |
| --- | --- |
| address | 1813 |
| email | 4602 |
| location | 5079 |
| name | 3684 |

Total collections: 4
Total items: 15178
It's a little worrying: I was forced to migrate from 0.6.2 after 3 index corruptions in 3 weeks in production, where those collections are on the order of 6-18M rows each, and I'm a bit worried I'm swapping one issue for another.
Second this.
BUMP
I can now reproduce this issue. I'm accessing Chroma via the v2 APIs and hammering them with adds to a collection to do a rebuild. If I kill my rebuild process midway through a call to Chroma, it triggers this issue, and the only solution is a complete restart of the Chroma Docker container. It happens every time I kill the process while a call is in flight.
IMHO this is a very serious bug in the handling of threads, see this link:
https://users.rust-lang.org/t/tokio-runtime-panics-at-shutdown/93651/4
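In case it helps with debugging, the reproduction described above (killing a client while a request is in flight) can be sketched roughly as follows. This is an approximation under stated assumptions, not the actual rebuild tool: it uses the Python `chromadb` client rather than direct v2 API calls, and the collection name, batch size, embedding dimension, and delay are all made up for illustration. Whether the kill lands mid-request depends on timing, so it may take a few attempts.

```python
import multiprocessing
import time


def make_batch(start: int, size: int, dim: int):
    """Build ids and non-zero dummy embeddings for one add() call."""
    ids = [f"item-{i}" for i in range(start, start + size)]
    embeddings = [[float(i % 7 + 1)] * dim for i in range(start, start + size)]
    return ids, embeddings


def hammer_adds(host: str, port: int, collection_name: str, dim: int) -> None:
    """Add batches to a collection in a tight loop until the process is killed."""
    import chromadb  # imported here so make_batch works without chromadb installed

    client = chromadb.HttpClient(host=host, port=port)
    collection = client.get_or_create_collection(name=collection_name)
    start = 0
    while True:
        ids, embeddings = make_batch(start, size=500, dim=dim)
        collection.add(ids=ids, embeddings=embeddings)
        start += len(ids)


def kill_mid_request(host: str = "localhost", port: int = 8000,
                     delay_seconds: float = 5.0) -> None:
    """Start the add loop in a child process, then kill it abruptly."""
    worker = multiprocessing.Process(
        target=hammer_adds,
        args=(host, port, "panic-repro", 768),  # hypothetical collection and dim
    )
    worker.start()
    time.sleep(delay_seconds)  # let it get busy sending requests
    worker.kill()  # SIGKILL on Unix: the client has no chance to finish the call
    worker.join()
```

To try it, call `kill_mid_request()` under an `if __name__ == "__main__":` guard against a disposable Chroma instance, then check the server logs for the panic line.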
Thanks for reporting, we're putting priority behind this on our end. Will post here when we have an update.