chroma icon indicating copy to clipboard operation
chroma copied to clipboard

[ENH] make batch util more ergonomic and document it

Open codetheweb opened this issue 1 year ago • 2 comments

Description of changes

Summarize the changes made by this PR.

  • Improvements & Bug fixes
    • allow create_batches() to work for all mutations (e.g. batching deletions)
    • document create_batches() and add note to troubleshooting docs

Test plan

How are these changes tested?

Covered by existing create_batches test.

Documentation Changes

Added a new docstring as well as a new section to the troubleshooting docs.

See https://github.com/chroma-core/chroma/issues/2282.

codetheweb avatar Jul 31 '24 16:07 codetheweb

Reviewer Checklist

Please leverage this checklist to ensure your code review is thorough before approving

Testing, Bugs, Errors, Logs, Documentation

  • [ ] Can you think of any use case in which the code does not behave as intended? Have they been tested?
  • [ ] Can you think of any inputs or external events that could break the code? Is user input validated and safe? Have they been tested?
  • [ ] If appropriate, are there adequate property based tests?
  • [ ] If appropriate, are there adequate unit tests?
  • [ ] Should any logging, debugging, tracing information be added or removed?
  • [ ] Are error messages user-friendly?
  • [ ] Have all documentation changes needed been made?
  • [ ] Have all non-obvious changes been commented?

System Compatibility

  • [ ] Are there any potential impacts on other parts of the system or backward compatibility?
  • [ ] Does this change intersect with any items on our roadmap, and if so, is there a plan for fitting them together?

Quality

  • [ ] Is this code of a unexpectedly high quality (Readability, Modularity, Intuitiveness)

github-actions[bot] avatar Jul 31 '24 16:07 github-actions[bot]

Ignore me, I was skimming

On Wed, Jul 31, 2024 at 5:39 PM Max Isom @.***> wrote:

@.**** commented on this pull request.

In chromadb/utils/batch_utils.py https://github.com/chroma-core/chroma/pull/2604#discussion_r1699246885:

  •            (  # type: ignore
    
  •                ids[i : i + api.get_max_batch_size()],
    
  •                embeddings[i : i + api.get_max_batch_size()]
    
  •                if embeddings
    
  •                else None,
    
  •                metadatas[i : i + api.get_max_batch_size()] if metadatas else None,
    
  •                documents[i : i + api.get_max_batch_size()] if documents else None,
    
  •            )
    
  •        )
    
  • else:
  •    _batches.append((ids, embeddings, metadatas, documents))  # type: ignore
    
  • return _batches
  • client: ClientAPI,
  • records: T,
  • max_batch_size: Optional[int] = 1024,
  • print_progress_description: Optional[str] = None,

sorry, not sure what you mean, this implementation includes a progress bar using the rich module rich works in notebooks https://rich.readthedocs.io/en/stable/progress.html

— Reply to this email directly, view it on GitHub https://github.com/chroma-core/chroma/pull/2604#discussion_r1699246885, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABKW32PUMYTBRVUEZJLIT7TZPF7U5AVCNFSM6AAAAABLY3YDWSVHI2DSMVQWIX3LMV43YUDVNRWFEZLROVSXG5CSMV3GSZLXHMZDEMJRGM2DSMJTGQ . You are receiving this because you commented.Message ID: @.***>

HammadB avatar Aug 01 '24 00:08 HammadB

  • #2669 Graphite
  • #2663 Graphite
  • #2604 Graphite 👈
  • main

This stack of pull requests is managed by Graphite. Learn more about stacking.

Join @codetheweb and the rest of your teammates on Graphite Graphite

codetheweb avatar Aug 15 '24 17:08 codetheweb