chroma
chroma copied to clipboard
[ENH] make batch util more ergonomic and document it
Description of changes
Summarize the changes made by this PR.
- Improvements & Bug fixes
- allow
create_batches()to work for all mutations (e.g. batching deletions) - document
create_batches()and add note to troubleshooting docs
- allow
Test plan
How are these changes tested?
Covered by existing create_batches test.
Documentation Changes
Added a new docstring as well as a new section to the troubleshooting docs.
See https://github.com/chroma-core/chroma/issues/2282.
Reviewer Checklist
Please leverage this checklist to ensure your code review is thorough before approving
Testing, Bugs, Errors, Logs, Documentation
- [ ] Can you think of any use case in which the code does not behave as intended? Have they been tested?
- [ ] Can you think of any inputs or external events that could break the code? Is user input validated and safe? Have they been tested?
- [ ] If appropriate, are there adequate property based tests?
- [ ] If appropriate, are there adequate unit tests?
- [ ] Should any logging, debugging, tracing information be added or removed?
- [ ] Are error messages user-friendly?
- [ ] Have all documentation changes needed been made?
- [ ] Have all non-obvious changes been commented?
System Compatibility
- [ ] Are there any potential impacts on other parts of the system or backward compatibility?
- [ ] Does this change intersect with any items on our roadmap, and if so, is there a plan for fitting them together?
Quality
- [ ] Is this code of a unexpectedly high quality (Readability, Modularity, Intuitiveness)
Ignore me, I was skimming
On Wed, Jul 31, 2024 at 5:39 PM Max Isom @.***> wrote:
@.**** commented on this pull request.
In chromadb/utils/batch_utils.py https://github.com/chroma-core/chroma/pull/2604#discussion_r1699246885:
( # type: ignore
ids[i : i + api.get_max_batch_size()],embeddings[i : i + api.get_max_batch_size()]if embeddingselse None,metadatas[i : i + api.get_max_batch_size()] if metadatas else None,documents[i : i + api.get_max_batch_size()] if documents else None,))- else:
_batches.append((ids, embeddings, metadatas, documents)) # type: ignore- return _batches
- client: ClientAPI,
- records: T,
- max_batch_size: Optional[int] = 1024,
- print_progress_description: Optional[str] = None,
sorry, not sure what you mean, this implementation includes a progress bar using the rich module rich works in notebooks https://rich.readthedocs.io/en/stable/progress.html
— Reply to this email directly, view it on GitHub https://github.com/chroma-core/chroma/pull/2604#discussion_r1699246885, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABKW32PUMYTBRVUEZJLIT7TZPF7U5AVCNFSM6AAAAABLY3YDWSVHI2DSMVQWIX3LMV43YUDVNRWFEZLROVSXG5CSMV3GSZLXHMZDEMJRGM2DSMJTGQ . You are receiving this because you commented.Message ID: @.***>
This stack of pull requests is managed by Graphite. Learn more about stacking.
Join @codetheweb and the rest of your teammates on
Graphite