Batch Processing Feature
Overview
The goal is to add support for efficient batch processing of inputs to the MLX-VLM library. This will let users process multiple images and text prompts in a single batch and receive the corresponding outputs together, improving throughput over one-at-a-time processing.
Use cases:
- Generating captions for a large dataset of images.
- Localizing objects or regions in a batch of images based on textual descriptions.
- Classifying a large number of images into predefined categories, considering accompanying text information.
- Answering questions based on a batch of images (single and multiple question prompts).
- Video processing.
Note: Tag @Blaizzy for code reviews and questions.
Requirements
Support batched inputs:
- Accept a batch of images as input, provided as a list or array of image objects.
- Accept a batch of text prompts as input, provided as a list or array of strings.
- Accept a single text prompt as input, provided as a string, to be applied to every image in the batch (see the normalization sketch below).
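A minimal sketch of how input normalization could work; the helper name `normalize_batch_inputs` and the broadcast rule (one shared prompt reused for every image) are assumptions for illustration, not existing MLX-VLM API:

```python
from typing import Any, List, Union

def normalize_batch_inputs(
    images: List[Any],                  # e.g. PIL images or file paths
    prompts: Union[str, List[str]],
) -> List[str]:
    """Hypothetical helper: align prompts with images one-to-one.

    A single string is broadcast to every image; a list must already
    match the number of images.
    """
    if isinstance(prompts, str):
        prompts = [prompts] * len(images)
    if len(prompts) != len(images):
        raise ValueError(
            f"Got {len(images)} images but {len(prompts)} prompts; "
            "pass one prompt per image or a single shared prompt."
        )
    return prompts
```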
Perform batch processing:
- Process the batch of images and text prompts together with the MLX-VLM model, rather than one input at a time.
- Utilize parallel processing or GPU acceleration to optimize batch processing performance.
- Ensure that the processing of one input in the batch does not affect the processing of other inputs (see the padding sketch below).
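One common way to keep batch items independent is to left-pad token sequences to a shared length and carry a mask so padded positions are ignored. A minimal sketch using `mlx.core`; the function name and the assumption that prompts are already tokenized are mine:

```python
import mlx.core as mx

def pad_and_stack(token_ids_per_prompt, pad_id: int):
    """Left-pad variable-length token id lists into one [batch, seq] array.

    Also returns a mask (1 = real token, 0 = padding) so attention can
    ignore the padding and one row of the batch cannot leak into another.
    """
    max_len = max(len(ids) for ids in token_ids_per_prompt)
    padded, mask = [], []
    for ids in token_ids_per_prompt:
        n_pad = max_len - len(ids)
        padded.append([pad_id] * n_pad + list(ids))
        mask.append([0] * n_pad + [1] * len(ids))
    return mx.array(padded), mx.array(mask)
```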
Generate batched outputs:
- Return the generated outputs for each input in the batch.
- Maintain the order of the outputs corresponding to the order of the inputs.
- Support different output formats such as text, embeddings, or visual representations based on the specific task.
Error handling:
- Handle errors gracefully during batch processing.
- Provide informative error messages for invalid inputs or processing failures.
- Continue processing the remaining inputs in the batch if an error occurs for a specific input (a per-item isolation sketch follows this list).
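A sketch of per-item error isolation; all names here (`BatchItemResult`, `run_with_isolation`) are illustrative, and a real implementation might first attempt the whole batch and only fall back to per-item runs on failure:

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Optional

@dataclass
class BatchItemResult:
    index: int                # position of the input in the batch
    output: Optional[Any]     # generated output, or None on failure
    error: Optional[str]      # error message, or None on success

def run_with_isolation(
    fn: Callable[[Any], Any], inputs: List[Any]
) -> List[BatchItemResult]:
    """Run `fn` on each input, recording failures instead of aborting the batch."""
    results = []
    for i, item in enumerate(inputs):
        try:
            results.append(BatchItemResult(i, fn(item), None))
        except Exception as exc:  # keep going; report the cause per item
            results.append(BatchItemResult(i, None, f"{type(exc).__name__}: {exc}"))
    return results
```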
API design:
- Provide a clear and intuitive API for users to perform batch processing.
- Allow users to specify the maximum batch size supported by their system.
- Provide options to control the batch processing behavior, such as enabling/disabling parallel processing (an API sketch follows this list).
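A hedged sketch of what the public entry point might look like; the name `batch_generate` and every parameter below are proposals, not confirmed MLX-VLM API:

```python
from typing import Any, List, Union

def batch_generate(
    model: Any,
    processor: Any,
    images: List[Any],                  # one image per prompt
    prompts: Union[str, List[str]],     # a single prompt is broadcast
    max_batch_size: int = 8,            # cap per forward pass; tune to memory
    max_tokens: int = 256,
    temperature: float = 0.0,
    parallel: bool = True,              # False falls back to a serial loop
) -> List[str]:
    """Proposed entry point: one completion per input, in input order."""
    ...

# Usage (hypothetical):
# captions = batch_generate(model, processor, images, "Describe this image.")
```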
Documentation and examples:
- Update the library documentation to include information about the batch processing feature.
- Provide code examples demonstrating how to use the batch processing API effectively.
- Include performance benchmarks and guidelines for optimal batch sizes based on system resources.
Implementation
- Modify the existing input handling logic to accept batches of images and text prompts.
- Implement batch processing functionality using parallel processing techniques or GPU acceleration libraries.
- Optimize memory usage and performance for efficient batch processing (see the chunking sketch after this list).
- Update the output generation logic to handle batched outputs and maintain the correct order.
- Implement error handling mechanisms to gracefully handle and report errors during batch processing.
- Design and expose a user-friendly API for performing batch processing.
- Write unit tests to verify the correctness and performance of the batch processing implementation.
- Update the library documentation and provide code examples for using the batch processing feature.
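One way to reconcile large input lists with limited memory is simple chunking; `run_batch` below is a stand-in for whatever function executes a single batched forward pass:

```python
from typing import Any, Callable, Iterable, List, Sequence

def chunked(items: Sequence[Any], size: int) -> Iterable[Sequence[Any]]:
    """Yield successive slices of at most `size` items, preserving order."""
    for start in range(0, len(items), size):
        yield items[start:start + size]

def process_in_chunks(
    inputs: List[Any],
    run_batch: Callable[[Sequence[Any]], List[Any]],
    max_batch_size: int = 8,
) -> List[Any]:
    """Process a long input list in memory-bounded chunks.

    Because chunks are taken in order and outputs are extended in order,
    result i always corresponds to input i.
    """
    outputs: List[Any] = []
    for chunk in chunked(inputs, max_batch_size):
        outputs.extend(run_batch(chunk))  # one output per item in the chunk
    return outputs
```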
Testing
- Prepare a comprehensive test suite to validate the batch processing functionality.
- Test with different batch sizes and input variations to ensure robustness.
- Verify that the generated outputs match the expected results for each input in the batch (see the test sketch after this list).
- Measure the performance improvement gained by batch processing compared to individual processing.
- Conduct error handling tests to ensure graceful handling of invalid inputs and processing failures.
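A sketch of the key correctness test: under greedy decoding, batched generation should reproduce the one-at-a-time outputs exactly. `batch_generate` and `generate_one` are the hypothetical APIs under test, and `model_and_processor` / `sample_images` are assumed pytest fixtures:

```python
def test_batch_matches_individual(model_and_processor, sample_images):
    """Greedy decoding should make batched and serial outputs identical."""
    model, processor = model_and_processor
    prompt = "Describe this image."

    batched = batch_generate(model, processor, sample_images, prompt,
                             temperature=0.0)
    serial = [generate_one(model, processor, img, prompt, temperature=0.0)
              for img in sample_images]

    assert batched == serial
    assert len(batched) == len(sample_images)  # order and count preserved
```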
Delivery
- Integrate the batch processing feature into the existing MLX-VLM library codebase.
- Ensure backward compatibility with previous versions of the library.
- Provide release notes highlighting the new batch processing capability and any breaking changes.
- Update the library version number following semantic versioning conventions.
- Publish the updated library package to the relevant package repositories or distribution channels.
By implementing this batch processing feature, MLX-VLM will give users the ability to process multiple inputs efficiently in a single pass, improving the performance and usability of the library across a range of vision-language tasks.
I'll take this on for implementation! Hope to meet the standards :)
Here are some extra details: @willccbb
Sorry, just saw this -- will take a swing when #53 is merged.
@willccbb done ✅
#53 is merged
Hey @willccbb, any update on this? Would be super helpful to have
@willccbb doesn't have the bandwidth.
This feature is now open and back in the backlog.
Hi @Blaizzy, I'm interested in implementing the batch processing feature for MLX-VLM.
After reviewing the issue requirements and the existing codebase, I understand this involves:
- Implementing a BatchedKVCache to enable parallel inference
- Adding batch_generate utilities for efficient parallel processing
- Creating new API endpoints for batch operations
- Updating model architectures to handle batched inputs
PR #53, which refactored the KVCache implementation, provides a good foundation for this work.
I plan to implement this in stages:
- First, the core BatchedKVCache class
- Then batch processing utilities
- API endpoints for batch operations
- Finally, updating model architectures to use batched operations (a rough cache sketch follows this list)
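For context, here is a rough sketch of what a BatchedKVCache might look like, modeled loosely on the step-allocated KVCache pattern used in MLX projects but with a leading batch dimension; the shapes, step size, and method signature are my assumptions:

```python
import mlx.core as mx

class BatchedKVCache:
    """Sketch: KV cache whose buffers are [batch, n_kv_heads, seq, head_dim]."""

    def __init__(self, head_dim: int, n_kv_heads: int, batch_size: int, step: int = 256):
        self.head_dim = head_dim
        self.n_kv_heads = n_kv_heads
        self.batch_size = batch_size
        self.step = step        # grow buffers in fixed steps, not per token
        self.keys = None
        self.values = None
        self.offset = 0

    def update_and_fetch(self, keys: mx.array, values: mx.array):
        """Append keys/values for every sequence in the batch at once."""
        prev = self.offset
        needed = prev + keys.shape[2]
        if self.keys is None or needed > self.keys.shape[2]:
            n_steps = (needed + self.step - 1) // self.step
            shape = (self.batch_size, self.n_kv_heads,
                     n_steps * self.step, self.head_dim)
            new_k = mx.zeros(shape, keys.dtype)
            new_v = mx.zeros(shape, values.dtype)
            if self.keys is not None:  # carry over what is already cached
                new_k[..., :prev, :] = self.keys[..., :prev, :]
                new_v[..., :prev, :] = self.values[..., :prev, :]
            self.keys, self.values = new_k, new_v
        self.keys[..., prev:needed, :] = keys
        self.values[..., prev:needed, :] = values
        self.offset = needed
        return self.keys[..., :needed, :], self.values[..., :needed, :]
```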
My implementation will include configurable parameters for batch generation with sensible defaults (sketched after this list):
- Default batch size with options to configure larger sizes for high-memory systems
- Memory management options to balance efficiency vs throughput
- Configuration flags to optimize for different hardware capabilities
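Concretely, I am imagining something like the following; every name and default here is a placeholder to be settled during review:

```python
from dataclasses import dataclass

@dataclass
class BatchConfig:
    """Placeholder knobs for batch generation; defaults are illustrative."""
    max_batch_size: int = 8         # raise on high-memory systems
    max_tokens: int = 256
    prefer_throughput: bool = True  # False trades speed for a smaller footprint
```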
Would you be open to my contribution?
Looking forward to your feedback!
Hi! Any updates on this? :)