cohere-python

feat: Add configurable batch_size and max_workers to embed method

Open · fede-kamel opened this pull request 2 months ago · 5 comments

Add configurable batch_size and max_workers to embed method

Summary

This PR fixes #534 by making the embed batch size configurable through optional parameters, giving users control over batching behavior based on their specific needs.

Problem

Previously, the embed() method used a fixed batch size of 96 (from config.embed_batch_size), which could be suboptimal for various use cases:

  • Users with memory constraints needed smaller batches
  • Users with high-throughput needs wanted larger batches
  • Rate-limited applications needed to control concurrency

Solution

Added two optional parameters to the embed() method:

  • batch_size: Optional[int] = None - Controls the number of texts per batch
  • max_workers: Optional[int] = None - Controls ThreadPoolExecutor concurrency (sync client only)

Implementation Details

Changes to src/cohere/client.py:

def embed(
    self,
    *,
    texts: Optional[Sequence[str]] = OMIT,
    # ... existing parameters ...
    batch_size: Optional[int] = None,  # NEW
    max_workers: Optional[int] = None,  # NEW
) -> EmbedResponse:
    ...

The implementation (sketched after this list):

  1. Uses provided batch_size or falls back to the default embed_batch_size (96)
  2. Creates a temporary ThreadPoolExecutor if max_workers is specified
  3. Maintains full backward compatibility - existing code continues to work unchanged
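
In rough terms, the batching path looks like the minimal sketch below. The names embed_batched, _embed_batch, and DEFAULT_EMBED_BATCH_SIZE are illustrative stand-ins for this write-up, not the SDK's actual internals:

from concurrent.futures import ThreadPoolExecutor
from typing import List, Optional, Sequence

DEFAULT_EMBED_BATCH_SIZE = 96  # mirrors config.embed_batch_size

def _embed_batch(batch: Sequence[str]) -> List[List[float]]:
    # Stand-in for a single embed API call; returns one vector per text.
    return [[0.0] for _ in batch]

def embed_batched(
    texts: Sequence[str],
    batch_size: Optional[int] = None,
    max_workers: Optional[int] = None,
) -> List[List[List[float]]]:
    # 1. Use the provided batch_size, or fall back to the default of 96.
    size = batch_size or DEFAULT_EMBED_BATCH_SIZE
    batches = [texts[i : i + size] for i in range(0, len(texts), size)]
    if max_workers is not None:
        # 2. Create a temporary executor scoped to this call.
        with ThreadPoolExecutor(max_workers=max_workers) as pool:
            return list(pool.map(_embed_batch, batches))
    # 3. Otherwise keep the existing execution path (shown sequentially
    #    here purely to keep the sketch short).
    return [_embed_batch(b) for b in batches]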

Testing

All tests pass:

$ python -m pytest tests/test_configurable_batch_size.py -v
============================= test session starts ==============================
collected 6 items

tests/test_configurable_batch_size.py::TestConfigurableBatchSize::test_batch_size_edge_cases PASSED [ 16%]
tests/test_configurable_batch_size.py::TestConfigurableBatchSize::test_custom_batch_size PASSED [ 33%]
tests/test_configurable_batch_size.py::TestConfigurableBatchSize::test_custom_max_workers PASSED [ 50%]
tests/test_configurable_batch_size.py::TestConfigurableBatchSize::test_default_batch_size PASSED [ 66%]
tests/test_configurable_batch_size.py::TestConfigurableBatchSize::test_no_batching_ignores_parameters PASSED [ 83%]
tests/test_configurable_batch_size.py::TestAsyncConfigurableBatchSize::test_async_custom_batch_size PASSED [100%]

============================== 6 passed in 0.40s ===============================

Test coverage includes (one such check is sketched after this list):

  • ✅ Custom batch sizes work correctly
  • ✅ Default batch size (96) is used when parameter not specified
  • ✅ Edge cases: batch_size=1, batch_size > total texts
  • ✅ Custom max_workers creates new ThreadPoolExecutor
  • ✅ Parameters are properly ignored when batching=False
  • ✅ Async client batch_size support
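
As a concrete illustration, the custom-batch-size check could be expressed against the embed_batched sketch above; the real tests in tests/test_configurable_batch_size.py exercise the actual client and may differ in detail:

def test_custom_batch_size_splits_requests() -> None:
    texts = [f"doc {i}" for i in range(25)]
    # 25 texts at batch_size=10 should yield batches of 10, 10, and 5.
    results = embed_batched(texts, batch_size=10)
    assert [len(batch) for batch in results] == [10, 10, 5]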

Code Quality

  • ✅ Ruff linting passes
  • ✅ Mypy type checking passes
  • ✅ Import ordering fixed automatically by ruff

Usage Examples

Default behavior (unchanged):

response = client.embed(texts=texts, model="embed-english-v3.0")
# Uses default batch_size=96

Custom batch size for memory optimization:

response = client.embed(
    texts=texts,
    model="embed-english-v3.0", 
    batch_size=10  # Smaller batches for memory-constrained environments
)

Rate limiting with reduced concurrency:

response = client.embed(
    texts=texts,
    model="embed-english-v3.0",
    batch_size=20,
    max_workers=2  # Only 2 concurrent API calls
)

Benefits

  1. Memory optimization: Users can reduce batch size to limit memory usage
  2. Performance tuning: Users can increase batch size for fewer API calls
  3. Rate limit handling: Control concurrency with max_workers
  4. Backward compatible: No changes required to existing code
  5. Complements PR #698: Works well with the memory-efficient embed_stream() method

This implementation provides the flexibility requested in issue #534 while maintaining the SDK's ease of use and backward compatibility.

fede-kamel commented on Sep 24 '25

Context: How this PR relates to #536 and issue #534

I noticed that PR #536 was already merged, which partially addressed issue #534 by adding configuration to the Client constructor. After analyzing both implementations, I believe this PR (#699) is still valuable as it complements #536 by addressing the remaining requirements from issue #534.

What PR #536 provided:

  • Client-level ThreadPoolExecutor configuration via constructor
  • Example: client = cohere.Client(thread_pool_executor=ThreadPoolExecutor(32))

What this PR adds:

  1. Configurable batch_size - The other key request from issue #534, which #536 did not address
  2. Per-call flexibility - Configure batch_size and max_workers for individual embed() calls
  3. Dynamic optimization - Adjust parameters based on document characteristics without recreating the client

Key differences:

Aspect              | PR #536                            | This PR (#699)
--------------------|------------------------------------|----------------------------------------
Configuration level | Client-wide                        | Per-method call
Parameters          | thread_pool_executor (constructor) | batch_size, max_workers (embed method)
Use case            | Set once for all operations        | Dynamic per-operation tuning
Batch size control  | ❌                                 | ✅

Example usage showing both PRs working together:

from concurrent.futures import ThreadPoolExecutor

import cohere

# PR #536 - Set default thread pool for the client
client = cohere.Client(thread_pool_executor=ThreadPoolExecutor(32))

# PR #699 - Override for specific operations
# (small_docs, large_docs, and texts are your document lists)

# Small documents: smaller batches, more workers
response = client.embed(texts=small_docs, batch_size=10, max_workers=64)

# Large documents: larger batches, fewer workers
response = client.embed(texts=large_docs, batch_size=50, max_workers=8)

# Memory constrained: very small batches
response = client.embed(texts=texts, batch_size=5)

This implementation completes the solution for issue #534 by providing both the batch size configuration and per-call flexibility that users requested for optimizing their embedding workflows.

fede-kamel commented on Sep 24 '25

🔄 PR Updated - Rebased on Latest Main

This PR has been rebased on the latest main branch and is ready for review.

Changes:

  • ✅ Rebased on upstream/main (no conflicts)
  • ✅ All 6 tests passing
  • ✅ Ruff linting passes
  • ✅ Mypy type checking passes

Requesting Review: @mkozakov @MusaTalluzi-cohere @andrewbcohere @daniel-cohere

This PR fixes issue #534 by adding configurable batch_size and max_workers parameters to the embed() method, giving users control over batching behavior based on their specific needs.

Key Features:

  • Configurable batch size for memory optimization
  • Configurable max_workers for rate limit handling
  • Fully backward compatible (no breaking changes)
  • Complements PR #698's streaming approach

Would appreciate your review when you have a chance!

fede-kamel commented on Oct 28 '25

Hi @mkozakov, @billytrend-cohere, @daniel-cohere! 👋

I hope you're all doing well! I wanted to gently follow up on this PR that adds configurable batch sizing and concurrency control to the embed() method.

Why this matters: This addresses issue #534 and gives users fine-grained control over embedding batch operations, which is crucial for:

  • Memory-constrained environments (smaller batches)
  • High-throughput applications (larger batches)
  • Rate-limited scenarios (controlled concurrency)

What's been validated:

  • ✅ All 6 unit tests passing (custom batch sizes, edge cases, async support)
  • ✅ Ruff linting and Mypy type checking passed
  • ✅ No merge conflicts - ready to merge
  • ✅ Fully backward compatible (defaults to existing behavior)
  • ✅ Complements PR #698's streaming functionality

Implementation: Simple, clean addition of two optional parameters (batch_size and max_workers) that default to existing behavior when not specified.

Would you have a chance to review this when convenient? I'm happy to address any feedback or make adjustments!

Thanks so much for maintaining this excellent SDK! 🙏

fede-kamel commented on Nov 12 '25

Hi @mkozakov, @billytrend-cohere, @daniel-cohere! 👋

Dudes come on!

fede-kamel commented on Nov 19 '25

Hi Federico, thank you for this PR and sorry for the delay. We have been a bit busy but will try to review it soon.

andrewbcohere commented on Nov 20 '25

Hey @andrewbcohere, no worries at all - totally understand! Just rebased onto the latest main (now includes SDK regeneration through Nov 10th). All unit tests pass. The PR is ready for review whenever you get a chance. Really appreciate you taking the time to look at this!

fede-kamel commented on Nov 26 '25