feat: Add configurable batch_size and max_workers to embed method
Summary
This PR fixes #534 by making the embed() batch size and concurrency configurable through optional parameters, giving users control over batching behavior based on their specific needs.
Problem
Previously, the embed() method used a fixed batch size of 96 (from config.embed_batch_size), which could be suboptimal for various use cases:
- Users with memory constraints needed smaller batches
- Users with high-throughput needs wanted larger batches
- Rate-limited applications needed to control concurrency
Solution
Added two optional parameters to the embed() method:
- batch_size: Optional[int] = None - Controls the number of texts per batch
- max_workers: Optional[int] = None - Controls ThreadPoolExecutor concurrency (sync client only)
Implementation Details
Changes to src/cohere/client.py:
def embed(
    self,
    *,
    texts: Optional[Sequence[str]] = OMIT,
    # ... existing parameters ...
    batch_size: Optional[int] = None,  # NEW
    max_workers: Optional[int] = None,  # NEW
) -> EmbedResponse:
The implementation:
- Uses the provided batch_size or falls back to the default embed_batch_size (96)
- Creates a temporary ThreadPoolExecutor if max_workers is specified
- Maintains full backward compatibility - existing code continues to work unchanged
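For reviewers, here is a minimal sketch of the fallback and executor behavior described above. The helper names _compute_batches, _embed_batches, and DEFAULT_EMBED_BATCH_SIZE are illustrative, not the actual src/cohere/client.py internals:
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Optional, Sequence

DEFAULT_EMBED_BATCH_SIZE = 96  # mirrors config.embed_batch_size

def _compute_batches(
    texts: Sequence[str], batch_size: Optional[int]
) -> List[Sequence[str]]:
    # Fall back to the config default when batch_size is not provided
    size = batch_size or DEFAULT_EMBED_BATCH_SIZE
    return [texts[i : i + size] for i in range(0, len(texts), size)]

def _embed_batches(
    batches: List[Sequence[str]],
    embed_one: Callable[[Sequence[str]], object],
    max_workers: Optional[int],
) -> List[object]:
    # A temporary per-call executor is created only when max_workers is
    # given; otherwise the client's existing executor would be reused.
    if max_workers is not None:
        with ThreadPoolExecutor(max_workers=max_workers) as pool:
            return list(pool.map(embed_one, batches))
    return [embed_one(batch) for batch in batches]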
Testing
All tests pass:
$ python -m pytest tests/test_configurable_batch_size.py -v
============================= test session starts ==============================
collected 6 items
tests/test_configurable_batch_size.py::TestConfigurableBatchSize::test_batch_size_edge_cases PASSED [ 16%]
tests/test_configurable_batch_size.py::TestConfigurableBatchSize::test_custom_batch_size PASSED [ 33%]
tests/test_configurable_batch_size.py::TestConfigurableBatchSize::test_custom_max_workers PASSED [ 50%]
tests/test_configurable_batch_size.py::TestConfigurableBatchSize::test_default_batch_size PASSED [ 66%]
tests/test_configurable_batch_size.py::TestConfigurableBatchSize::test_no_batching_ignores_parameters PASSED [ 83%]
tests/test_configurable_batch_size.py::TestAsyncConfigurableBatchSize::test_async_custom_batch_size PASSED [100%]
============================== 6 passed in 0.40s ===============================
Test coverage includes:
- ✅ Custom batch sizes work correctly
- ✅ Default batch size (96) is used when parameter not specified
- ✅ Edge cases: batch_size=1, batch_size > total texts
- ✅ Custom max_workers creates new ThreadPoolExecutor
- ✅ Parameters are properly ignored when batching=False
- ✅ Async client batch_size support
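As a rough sketch of what the batch-size tests assert, reusing the hypothetical _compute_batches helper from the sketch above (the actual tests exercise client.embed() against a mocked HTTP layer instead):
def test_custom_batch_size_splits_texts():
    texts = [f"doc {i}" for i in range(25)]
    batches = _compute_batches(texts, batch_size=10)
    # 25 texts with batch_size=10 yield batches of 10, 10, and 5
    assert [len(b) for b in batches] == [10, 10, 5]

def test_default_batch_size_used_when_unset():
    # batch_size=None falls back to the default of 96
    batches = _compute_batches(["x"] * 100, batch_size=None)
    assert [len(b) for b in batches] == [96, 4]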
Code Quality
- ✅ Ruff linting passes
- ✅ Mypy type checking passes
- ✅ Import ordering fixed automatically by ruff
Usage Examples
Default behavior (unchanged):
response = client.embed(texts=texts, model="embed-english-v3.0")
# Uses default batch_size=96
Custom batch size for memory optimization:
response = client.embed(
texts=texts,
model="embed-english-v3.0",
batch_size=10 # Smaller batches for memory-constrained environments
)
Rate limiting with reduced concurrency:
response = client.embed(
texts=texts,
model="embed-english-v3.0",
batch_size=20,
max_workers=2 # Only 2 concurrent API calls
)
Benefits
- Memory optimization: Users can reduce batch size to limit memory usage
- Performance tuning: Users can increase batch size for fewer API calls
- Rate limit handling: Control concurrency with max_workers
- Backward compatible: No changes required to existing code
- Complements PR #698: Works well with the memory-efficient embed_stream() method
This implementation provides the flexibility requested in issue #534 while maintaining the SDK's ease of use and backward compatibility.
Context: How this PR relates to #536 and issue #534
I noticed that PR #536 was already merged, which partially addressed issue #534 by adding configuration to the Client constructor. After analyzing both implementations, I believe this PR (#699) is still valuable as it complements #536 by addressing the remaining requirements from issue #534.
What PR #536 provided:
- Client-level ThreadPoolExecutor configuration via constructor
- Example:
client = cohere.Client(thread_pool_executor=ThreadPoolExecutor(32))
What this PR adds:
- Configurable batch_size - The other key request from issue #534 that wasn't addressed
- Per-call flexibility - Configure batch_size and max_workers for individual embed() calls
- Dynamic optimization - Adjust parameters based on document characteristics without recreating the client
Key differences:
| Aspect | PR #536 | This PR (#699) |
|---|---|---|
| Configuration level | Client-wide | Per-method call |
| Parameters | thread_pool_executor (constructor) | batch_size, max_workers (embed method) |
| Use case | Set once for all operations | Dynamic per-operation tuning |
| Batch size control | ❌ | ✅ |
Example usage showing both PRs working together:
# PR #536 - Set default thread pool for client
client = cohere.Client(thread_pool_executor=ThreadPoolExecutor(32))
# PR #699 - Override for specific operations
# Small documents: smaller batches, more workers
response = client.embed(texts=small_docs, batch_size=10, max_workers=64)
# Large documents: larger batches, fewer workers
response = client.embed(texts=large_docs, batch_size=50, max_workers=8)
# Memory constrained: very small batches
response = client.embed(texts=texts, batch_size=5)
This implementation completes the solution for issue #534 by providing both the batch size configuration and per-call flexibility that users requested for optimizing their embedding workflows.
🔄 PR Updated - Rebased on Latest Main
This PR has been rebased on the latest main branch and is ready for review.
Changes:
- ✅ Rebased on upstream/main (no conflicts)
- ✅ All 6 tests passing
- ✅ Ruff linting passes
- ✅ Mypy type checking passes
Requesting Review: @mkozakov @MusaTalluzi-cohere @andrewbcohere @daniel-cohere
This PR fixes issue #534 by adding configurable batch_size and max_workers parameters to the embed() method, giving users control over batching behavior based on their specific needs.
Key Features:
- Configurable batch size for memory optimization
- Configurable max_workers for rate limit handling
- Fully backward compatible (no breaking changes)
- Complements PR #698's streaming approach
Would appreciate your review when you have a chance!
Hi @mkozakov, @billytrend-cohere, @daniel-cohere! 👋
I hope you're all doing well! I wanted to gently follow up on this PR that adds configurable batch sizing and concurrency control to the embed() method.
Why this matters: This addresses issue #534 and gives users fine-grained control over embedding batch operations, which is crucial for:
- Memory-constrained environments (smaller batches)
- High-throughput applications (larger batches)
- Rate-limited scenarios (controlled concurrency)
What's been validated:
- ✅ All 6 unit tests passing (custom batch sizes, edge cases, async support)
- ✅ Ruff linting and Mypy type checking passed
- ✅ No merge conflicts - ready to merge
- ✅ Fully backward compatible (defaults to existing behavior)
- ✅ Complements PR #698's streaming functionality
Implementation:
Simple, clean addition of two optional parameters (batch_size and max_workers) that default to existing behavior when not specified.
Would you have a chance to review this when convenient? I'm happy to address any feedback or make adjustments!
Thanks so much for maintaining this excellent SDK! 🙏
Hi @mkozakov, @billytrend-cohere, @daniel-cohere! 👋
Dudes, come on!
Hi Federico, thank you for this PR and sorry for the delay; we have been a bit busy but will try to review it soon.
Hey @andrewbcohere, no worries at all - totally understand! Just rebased onto the latest main (now includes SDK regeneration through Nov 10th). All unit tests pass. The PR is ready for review whenever you get a chance. Really appreciate you taking the time to look at this!