feat: Add configurable batch_size and max_workers to embed method
Summary
This PR fixes #534 by making the embed() batch size and concurrency configurable through optional parameters, giving users control over batching behavior based on their specific needs.
Problem
Previously, the embed() method used a fixed batch size of 96 (from config.embed_batch_size), which could be suboptimal for various use cases:
- Users with memory constraints needed smaller batches
- Users with high-throughput needs wanted larger batches
- Rate-limited applications needed to control concurrency
Solution
Added two optional parameters to the embed() method:
- batch_size: Optional[int] = None - Controls the number of texts per batch
- max_workers: Optional[int] = None - Controls ThreadPoolExecutor concurrency (sync client only)
Implementation Details
Changes to src/cohere/client.py:
def embed(
    self,
    *,
    texts: Optional[Sequence[str]] = OMIT,
    # ... existing parameters ...
    batch_size: Optional[int] = None,  # NEW
    max_workers: Optional[int] = None,  # NEW
) -> EmbedResponse:
The implementation:
- Uses the provided batch_size or falls back to the default embed_batch_size (96)
- Creates a temporary ThreadPoolExecutor if max_workers is specified
- Maintains full backward compatibility - existing code continues to work unchanged
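For reviewers, here is a minimal sketch of the fallback and executor behavior described above. The helper names _compute_batches, _embed_batches, and DEFAULT_EMBED_BATCH_SIZE are illustrative, not the actual src/cohere/client.py internals:
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Optional, Sequence

DEFAULT_EMBED_BATCH_SIZE = 96  # mirrors config.embed_batch_size

def _compute_batches(
    texts: Sequence[str], batch_size: Optional[int]
) -> List[Sequence[str]]:
    # Fall back to the config default when batch_size is not provided
    size = batch_size or DEFAULT_EMBED_BATCH_SIZE
    return [texts[i : i + size] for i in range(0, len(texts), size)]

def _embed_batches(
    batches: List[Sequence[str]],
    embed_one: Callable[[Sequence[str]], object],
    max_workers: Optional[int],
) -> List[object]:
    # A temporary per-call executor is created only when max_workers is
    # given; otherwise the client's existing executor would be reused.
    if max_workers is not None:
        with ThreadPoolExecutor(max_workers=max_workers) as pool:
            return list(pool.map(embed_one, batches))
    return [embed_one(batch) for batch in batches]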
Testing
All tests pass:
$ python -m pytest tests/test_configurable_batch_size.py -v
============================= test session starts ==============================
collected 6 items
tests/test_configurable_batch_size.py::TestConfigurableBatchSize::test_batch_size_edge_cases PASSED [ 16%]
tests/test_configurable_batch_size.py::TestConfigurableBatchSize::test_custom_batch_size PASSED [ 33%]
tests/test_configurable_batch_size.py::TestConfigurableBatchSize::test_custom_max_workers PASSED [ 50%]
tests/test_configurable_batch_size.py::TestConfigurableBatchSize::test_default_batch_size PASSED [ 66%]
tests/test_configurable_batch_size.py::TestConfigurableBatchSize::test_no_batching_ignores_parameters PASSED [ 83%]
tests/test_configurable_batch_size.py::TestAsyncConfigurableBatchSize::test_async_custom_batch_size PASSED [100%]
============================== 6 passed in 0.40s ===============================
Test coverage includes:
- ✅ Custom batch sizes work correctly
- ✅ Default batch size (96) is used when parameter not specified
- ✅ Edge cases: batch_size=1, batch_size > total texts
- ✅ Custom max_workers creates new ThreadPoolExecutor
- ✅ Parameters are properly ignored when batching=False
- ✅ Async client batch_size support
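As a rough sketch of what the batch-size tests assert, reusing the hypothetical _compute_batches helper from the sketch above (the actual tests exercise client.embed() against a mocked HTTP layer instead):
def test_custom_batch_size_splits_texts():
    texts = [f"doc {i}" for i in range(25)]
    batches = _compute_batches(texts, batch_size=10)
    # 25 texts with batch_size=10 yield batches of 10, 10, and 5
    assert [len(b) for b in batches] == [10, 10, 5]

def test_default_batch_size_used_when_unset():
    # batch_size=None falls back to the default of 96
    batches = _compute_batches(["x"] * 100, batch_size=None)
    assert [len(b) for b in batches] == [96, 4]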
Code Quality
- ✅ Ruff linting passes
- ✅ Mypy type checking passes
- ✅ Import ordering fixed automatically by ruff
Usage Examples
Default behavior (unchanged):
response = client.embed(texts=texts, model="embed-english-v3.0")
# Uses default batch_size=96
Custom batch size for memory optimization:
response = client.embed(
texts=texts,
model="embed-english-v3.0",
batch_size=10 # Smaller batches for memory-constrained environments
)
Rate limiting with reduced concurrency:
response = client.embed(
texts=texts,
model="embed-english-v3.0",
batch_size=20,
max_workers=2 # Only 2 concurrent API calls
)
Benefits
- Memory optimization: Users can reduce batch size to limit memory usage
- Performance tuning: Users can increase batch size for fewer API calls
- Rate limit handling: Control concurrency with max_workers
- Backward compatible: No changes required to existing code
- Complements PR #698: Works well with the memory-efficient embed_stream() method
This implementation provides the flexibility requested in issue #534 while maintaining the SDK's ease of use and backward compatibility.
Context: How this PR relates to #536 and issue #534
I noticed that PR #536 was already merged, which partially addressed issue #534 by adding configuration to the Client constructor. After analyzing both implementations, I believe this PR (#699) is still valuable as it complements #536 by addressing the remaining requirements from issue #534.
What PR #536 provided:
- Client-level ThreadPoolExecutor configuration via constructor
- Example:
client = cohere.Client(thread_pool_executor=ThreadPoolExecutor(32))
What this PR adds:
- Configurable batch_size - The other key request from issue #534 that wasn't addressed
- Per-call flexibility - Configure batch_size and max_workers for individual embed() calls
- Dynamic optimization - Adjust parameters based on document characteristics without recreating the client
Key differences:
| Aspect | PR #536 | This PR (#699) |
|---|---|---|
| Configuration level | Client-wide | Per-method call |
| Parameters | thread_pool_executor (constructor) | batch_size, max_workers (embed method) |
| Use case | Set once for all operations | Dynamic per-operation tuning |
| Batch size control | ❌ | ✅ |
Example usage showing both PRs working together:
# PR #536 - Set default thread pool for client
client = cohere.Client(thread_pool_executor=ThreadPoolExecutor(32))
# PR #699 - Override for specific operations
# Small documents: smaller batches, more workers
response = client.embed(texts=small_docs, batch_size=10, max_workers=64)
# Large documents: larger batches, fewer workers
response = client.embed(texts=large_docs, batch_size=50, max_workers=8)
# Memory constrained: very small batches
response = client.embed(texts=texts, batch_size=5)
This implementation completes the solution for issue #534 by providing both the batch size configuration and per-call flexibility that users requested for optimizing their embedding workflows.
🔄 PR Updated - Rebased on Latest Main
This PR has been rebased on the latest main branch and is ready for review.
Changes:
- ✅ Rebased on upstream/main (no conflicts)
- ✅ All 6 tests passing
- ✅ Ruff linting passes
- ✅ Mypy type checking passes
Requesting Review: @mkozakov @MusaTalluzi-cohere @andrewbcohere @daniel-cohere
This PR fixes issue #534 by adding configurable batch_size and max_workers parameters to the embed() method, giving users control over batching behavior based on their specific needs.
Key Features:
- Configurable batch size for memory optimization
- Configurable max_workers for rate limit handling
- Fully backward compatible (no breaking changes)
- Complements PR #698's streaming approach
Would appreciate your review when you have a chance!
Hi @mkozakov, @billytrend-cohere, @daniel-cohere! 👋
I hope you're all doing well! I wanted to gently follow up on this PR that adds configurable batch sizing and concurrency control to the embed() method.
Why this matters: This addresses issue #534 and gives users fine-grained control over embedding batch operations, which is crucial for:
- Memory-constrained environments (smaller batches)
- High-throughput applications (larger batches)
- Rate-limited scenarios (controlled concurrency)
What's been validated:
- ✅ All 6 unit tests passing (custom batch sizes, edge cases, async support)
- ✅ Ruff linting and Mypy type checking passed
- ✅ No merge conflicts - ready to merge
- ✅ Fully backward compatible (defaults to existing behavior)
- ✅ Complements PR #698's streaming functionality
Implementation:
Simple, clean addition of two optional parameters (batch_size and max_workers) that default to existing behavior when not specified.
Would you have a chance to review this when convenient? I'm happy to address any feedback or make adjustments!
Thanks so much for maintaining this excellent SDK! 🙏
Hi @mkozakov, @billytrend-cohere, @daniel-cohere! 👋
Dudes, come on!
Hi Federico, thank you for this PR and sorry for the delay; we have been a bit busy but will try to review it soon.
Hey @andrewbcohere, no worries at all - totally understand! Just rebased onto the latest main (now includes SDK regeneration through Nov 10th). All unit tests pass. The PR is ready for review whenever you get a chance. Really appreciate you taking the time to look at this!