[ENH]: (Rust client): add EF config

Open codetheweb opened this issue 1 month ago • 2 comments

Description of changes

Main changes:

  • There are now separate sparse and dense EF traits.
  • EF traits have a config GAT which must implement TryInto<EmbeddingFunctionConfiguration>.
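A minimal sketch of what the two traits could look like, assuming a simplified `EmbeddingFunctionConfiguration` and a toy `ToyDense` implementation (both hypothetical stand-ins, not the crate's real types). Only the trait and method names come from this PR; the signatures are guesses, and `build_from_config` here takes the persisted form directly, which may differ from the actual code.

```rust
use std::collections::BTreeMap;
use std::convert::{TryFrom, TryInto};

/// Hypothetical stand-in for the persisted configuration type named in the PR.
#[derive(Debug, Clone, PartialEq)]
pub struct EmbeddingFunctionConfiguration {
    pub name: String,
    pub params: BTreeMap<String, String>,
}

/// Assumed shape of the dense trait; only the method names come from the PR.
pub trait DenseEmbeddingFunction: Sized {
    type Error;
    /// Config GAT: may borrow from `self`, must convert into the persisted form.
    type Config<'a>: TryInto<EmbeddingFunctionConfiguration>
    where
        Self: 'a;

    fn get_name() -> &'static str;
    fn get_config(&self) -> Self::Config<'_>;
    /// Assumption: rebuilds from the persisted form rather than the typed config.
    fn build_from_config(config: &EmbeddingFunctionConfiguration) -> Result<Self, Self::Error>;
    fn embed(&self, docs: &[&str]) -> Result<Vec<Vec<f32>>, Self::Error>;
}

/// Assumed shape of the sparse trait: same config contract, sparse output.
pub trait SparseEmbeddingFunction: Sized {
    type Error;
    type Config<'a>: TryInto<EmbeddingFunctionConfiguration>
    where
        Self: 'a;

    fn get_name() -> &'static str;
    fn get_config(&self) -> Self::Config<'_>;
    fn build_from_config(config: &EmbeddingFunctionConfiguration) -> Result<Self, Self::Error>;
    /// One list of (token index, weight) pairs per document.
    fn embed(&self, docs: &[&str]) -> Result<Vec<Vec<(u32, f32)>>, Self::Error>;
}

/// Toy dense EF used only for illustration.
#[derive(Debug, PartialEq)]
pub struct ToyDense {
    pub model: String,
    pub dims: usize,
}

/// Borrowed view of `ToyDense`'s parameters; this is the GAT instance.
pub struct ToyDenseConfig<'a> {
    pub model: &'a str,
    pub dims: usize,
}

impl<'a> TryFrom<ToyDenseConfig<'a>> for EmbeddingFunctionConfiguration {
    type Error = String;

    fn try_from(c: ToyDenseConfig<'a>) -> Result<Self, String> {
        let mut params = BTreeMap::new();
        params.insert("model".to_string(), c.model.to_string());
        params.insert("dims".to_string(), c.dims.to_string());
        Ok(Self { name: ToyDense::get_name().to_string(), params })
    }
}

impl DenseEmbeddingFunction for ToyDense {
    type Error = String;
    type Config<'a> = ToyDenseConfig<'a>;

    fn get_name() -> &'static str {
        "toy_dense"
    }

    fn get_config(&self) -> Self::Config<'_> {
        ToyDenseConfig { model: &self.model, dims: self.dims }
    }

    fn build_from_config(config: &EmbeddingFunctionConfiguration) -> Result<Self, String> {
        if config.name != Self::get_name() {
            return Err(format!("unexpected EF name: {}", config.name));
        }
        let model = config.params.get("model").ok_or("missing model")?.clone();
        let dims = config.params.get("dims").ok_or("missing dims")?;
        let dims = dims.parse::<usize>().map_err(|e| e.to_string())?;
        Ok(Self { model, dims })
    }

    fn embed(&self, docs: &[&str]) -> Result<Vec<Vec<f32>>, String> {
        // Placeholder vectors: document length in the first slot, zeros elsewhere.
        Ok(docs.iter().map(|d| {
            let mut v = vec![0.0_f32; self.dims];
            if let Some(first) = v.first_mut() {
                *first = d.len() as f32;
            }
            v
        }).collect())
    }
}

/// Manual round trip: typed config -> persisted form -> rebuilt EF.
pub fn round_trip(ef: &ToyDense) -> Result<ToyDense, String> {
    let persisted: EmbeddingFunctionConfiguration = ef.get_config().try_into()?;
    ToyDense::build_from_config(&persisted)
}
```

`round_trip` is the manual persist/hydrate step: the caller converts the typed config with `.try_into()` and rebuilds the EF explicitly, since nothing does it automatically.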

This PR does not introduce any machinery to automatically persist or hydrate EFs. For now, users must call get_config() / build_from_config() themselves (or call .try_into() on the config). We can add that machinery as a follow-up, but it is non-trivial to build:

  • must implement a registry system to map EF names to implementations, allowing third-party crates to register custom EFs
  • need to remove GATs or type erase EFs so we can store refs to generic EFs during hydration
  • signatures of methods like query() will have to change which is a significant breaking change
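To make the follow-up work concrete, here is a rough sketch of one way a name-to-implementation registry with type erasure might look. Everything in it (`ErasedDenseEf`, `EfRegistry`, `build_constant`, and the simplified `EmbeddingFunctionConfiguration`) is hypothetical and not part of this PR. It only illustrates why GATs get in the way: the erased trait has to be object-safe so hydrated EFs can be stored as `Box<dyn ...>`.

```rust
use std::collections::{BTreeMap, HashMap};

/// Hypothetical stand-in for the persisted configuration type.
#[derive(Debug, Clone)]
pub struct EmbeddingFunctionConfiguration {
    pub name: String,
    pub params: BTreeMap<String, String>,
}

/// Object-safe (type-erased) view of a dense EF: no GATs, no `Sized` bound.
pub trait ErasedDenseEf {
    fn name(&self) -> &str;
    fn embed(&self, docs: &[&str]) -> Result<Vec<Vec<f32>>, String>;
}

/// A builder turns a persisted config into a boxed, type-erased EF.
pub type Builder = fn(&EmbeddingFunctionConfiguration) -> Result<Box<dyn ErasedDenseEf>, String>;

/// Maps EF names to builders; third-party crates would call `register`.
#[derive(Default)]
pub struct EfRegistry {
    builders: HashMap<String, Builder>,
}

impl EfRegistry {
    pub fn register(&mut self, name: &str, builder: Builder) {
        self.builders.insert(name.to_string(), builder);
    }

    /// Looks up the builder by the config's name and runs it.
    pub fn hydrate(
        &self,
        config: &EmbeddingFunctionConfiguration,
    ) -> Result<Box<dyn ErasedDenseEf>, String> {
        let builder = self
            .builders
            .get(&config.name)
            .ok_or_else(|| format!("no EF registered under {:?}", config.name))?;
        builder(config)
    }
}

/// Toy EF that returns all-ones vectors, for illustration only.
pub struct ConstantEf {
    dims: usize,
}

impl ErasedDenseEf for ConstantEf {
    fn name(&self) -> &str {
        "constant"
    }

    fn embed(&self, docs: &[&str]) -> Result<Vec<Vec<f32>>, String> {
        Ok(docs.iter().map(|_| vec![1.0_f32; self.dims]).collect())
    }
}

/// Builder for `ConstantEf`; reads the `dims` parameter from the config.
pub fn build_constant(
    config: &EmbeddingFunctionConfiguration,
) -> Result<Box<dyn ErasedDenseEf>, String> {
    let dims = config.params.get("dims").ok_or("missing dims")?;
    let dims = dims.parse::<usize>().map_err(|e| e.to_string())?;
    Ok(Box::new(ConstantEf { dims }))
}
```

A collection that stores `Box<dyn ErasedDenseEf>` instead of a generic parameter is also where the signature changes to methods like query() would come from.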

Test plan

How are these changes tested?

  • [x] Tests pass locally with pytest for python, yarn test for js, cargo test for rust

Migration plan

Are there any migrations, or any forwards/backwards compatibility changes needed in order to make sure this change deploys reliably?

Observability plan

What is the plan to instrument and monitor this change?

Documentation Changes

Are all docstrings for user-facing APIs updated if required? Do we need to make documentation changes in the docs section?

codetheweb, Nov 04 '25 22:11

Reviewer Checklist

Please leverage this checklist to ensure your code review is thorough before approving

Testing, Bugs, Errors, Logs, Documentation

  • [ ] Can you think of any use case in which the code does not behave as intended? Have they been tested?
  • [ ] Can you think of any inputs or external events that could break the code? Is user input validated and safe? Have they been tested?
  • [ ] If appropriate, are there adequate property based tests?
  • [ ] If appropriate, are there adequate unit tests?
  • [ ] Should any logging, debugging, tracing information be added or removed?
  • [ ] Are error messages user-friendly?
  • [ ] Have all documentation changes needed been made?
  • [ ] Have all non-obvious changes been commented?

System Compatibility

  • [ ] Are there any potential impacts on other parts of the system or backward compatibility?
  • [ ] Does this change intersect with any items on our roadmap, and if so, is there a plan for fitting them together?

Quality

  • [ ] Is this code of an unexpectedly high quality (Readability, Modularity, Intuitiveness)?

github-actions[bot], Nov 04 '25 22:11

Embed-Function Config GATs, Dense/Sparse Traits & Collection Helpers

Introduces explicit DenseEmbeddingFunction and SparseEmbeddingFunction traits with a config GAT, enabling round-trip serialise / deserialise of embedding-function parameters. Built-in BM25 and Ollama implementations are updated to comply, including config structs, TryFrom / TryInto<EmbeddingFunctionConfiguration> bridges, and helper builders. Supporting refactors touch collection/client helpers, schema validation, and several utility impls (Key: AsRef<str>).

Key Changes

  • Split generic EmbeddingFunction into DenseEmbeddingFunction and SparseEmbeddingFunction in rust/chroma/src/embed/mod.rs with new required fns build_from_config, get_config, get_name.
  • Added config structs BM25Config and OllamaEmbeddingFunctionConfig + TryFrom<EmbeddingFunctionConfiguration> implementations for persistence.
  • Refactored BM25SparseEmbeddingFunction and OllamaEmbeddingFunction to GAT-based config, added error variants for (de)serialisation, simplified encode path (BM25 now infallible).
  • Client/collection QoL: centralised ChromaCollection::new, replaced ad-hoc struct literal construction; ChromaHttpClient list/create now use that helper.
  • Execution layer: added duplicate impl AsRef<str> for Key (and used in frontend key validation).
  • Frontend fix: schema validation now calls key.as_ref() instead of allocating to_string().
  • Misc clean-ups: hard-coded timeouts noted, TODOs added, minor doc / test updates.
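As an illustration of the `AsRef<str>` change, the sketch below uses a hypothetical three-variant `Key` enum and a made-up `validate_key` rule. The real `Key` type and the frontend's validation logic are not shown in this thread, so treat the names and string values as assumptions. The point is only that `as_ref()` borrows the key's text where `to_string()` would allocate a new `String`.

```rust
/// Hypothetical, simplified `Key`; the real type lives in rust/types.
pub enum Key {
    Document,
    Embedding,
    Field(String),
}

impl AsRef<str> for Key {
    fn as_ref(&self) -> &str {
        match self {
            // The reserved names below are placeholders for illustration.
            Key::Document => "#document",
            Key::Embedding => "#embedding",
            Key::Field(name) => name.as_str(),
        }
    }
}

/// Made-up validation rule: a key must not be empty.
pub fn validate_key(key: &Key) -> Result<(), String> {
    // Borrow the key's text; no new `String` is allocated here.
    let name: &str = key.as_ref();
    if name.is_empty() {
        return Err("key must not be empty".to_string());
    }
    Ok(())
}
```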

Affected Areas

  • rust/chroma/src/embed/* (traits + implementations)
  • rust/chroma/src/client/chroma_http_client.rs
  • rust/chroma/src/collection.rs
  • rust/types/src/execution/operator.rs
  • rust/frontend/src/impls/service_based_frontend.rs

This summary was automatically generated by @propel-code-bot

propel-code-bot[bot], Nov 05 '25 02:11

Closing this out as it was not landed. Can pick up as needed.

HammadB, Nov 19 '25 04:11