
Batch Processing Feature

Open Blaizzy opened this issue 1 year ago • 8 comments

Overview

The goal is to add support for efficient batch processing of inputs to the MLX-VLM library. This will allow users to process multiple images and text prompts simultaneously to generate corresponding outputs in a single batch, improving performance.

Use cases:

  1. Generating captions for a large dataset of images.
  2. Localizing objects or regions in a batch of images based on textual descriptions.
  3. Classifying a large number of images into predefined categories, considering accompanying text information.
  4. Answering questions based on a batch of images (single and multiple question prompts).
  5. Video processing.

Note: Tag @Blaizzy for code reviews and questions.

Requirements

Support batched inputs:

  • Accept a batch of images as input, provided as a list or array of image objects.
  • Accept a batch of text prompts as input, provided as a list or array of strings.
  • Accept a single text prompt as input, provided as a string.
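A small normalization helper can reconcile these three accepted input shapes before batching. This is an illustrative sketch, not the library's actual API; the function name `normalize_prompts` is hypothetical:

```python
from typing import List, Union

def normalize_prompts(prompts: Union[str, List[str]], num_images: int) -> List[str]:
    """Return exactly one prompt per image.

    A single string is broadcast across the whole batch; a list of strings
    must already match the number of images.
    """
    if isinstance(prompts, str):
        return [prompts] * num_images
    if len(prompts) != num_images:
        raise ValueError(
            f"Got {len(prompts)} prompts for {num_images} images; "
            "pass one prompt, or one prompt per image."
        )
    return list(prompts)
```

Broadcasting a single prompt keeps the common "same question over many images" use case (e.g. captioning a dataset) a one-liner for callers.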

Perform batch processing:

  • Process the batch of images and text prompts simultaneously (asynchronously) using the MLX-VLM model.
  • Utilize parallel processing or GPU acceleration to optimize batch processing performance.
  • Ensure that the processing of one input in the batch does not affect the processing of other inputs.

Generate batched outputs:

  • Return the generated outputs for each input in the batch.
  • Maintain the order of the outputs corresponding to the order of the inputs.
  • Support different output formats such as text, embeddings, or visual representations based on the specific task.

Error handling:

  • Handle errors gracefully during batch processing.
  • Provide informative error messages for invalid inputs or processing failures.
  • Continue processing the remaining inputs in the batch if an error occurs for a specific input.
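The "continue on per-item failure" behavior above can be isolated with a per-input try/except that records the error alongside the input's position instead of aborting the batch. A minimal sketch, with hypothetical names (`BatchResult`, `process_batch`):

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

@dataclass
class BatchResult:
    index: int                    # position of the input in the original batch
    output: Optional[str] = None  # set on success
    error: Optional[str] = None   # set on failure

def process_batch(inputs: List[str], fn: Callable[[str], str]) -> List[BatchResult]:
    """Run fn on each input, recording failures instead of aborting the batch."""
    results = []
    for i, item in enumerate(inputs):
        try:
            results.append(BatchResult(index=i, output=fn(item)))
        except Exception as exc:  # isolate per-item failures
            results.append(BatchResult(index=i, error=f"{type(exc).__name__}: {exc}"))
    return results
```

Returning one result object per input, in input order, satisfies both the ordering requirement and the informative-error requirement at once.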

API design:

  • Provide a clear and intuitive API for users to perform batch processing.
  • Allow users to specify the maximum batch size supported by their system.
  • Provide options to control the batch processing behavior, such as enabling/disabling parallel processing.
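One way to honor a user-specified maximum batch size is to chunk an arbitrarily long list of inputs into consecutive sub-batches internally. The `chunked` helper below is a sketch under that assumption; the surrounding API (e.g. a `batch_generate(model, processor, images, prompts, max_batch_size=8)` entry point) is hypothetical:

```python
from typing import Iterable, List, Sequence, TypeVar

T = TypeVar("T")

def chunked(items: Sequence[T], max_batch_size: int) -> Iterable[List[T]]:
    """Yield consecutive batches of at most max_batch_size items, preserving order."""
    if max_batch_size < 1:
        raise ValueError("max_batch_size must be >= 1")
    for start in range(0, len(items), max_batch_size):
        yield list(items[start : start + max_batch_size])
```

Chunking keeps peak memory bounded by the user's chosen batch size while still letting callers pass their full dataset in one call.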

Documentation and examples:

  • Update the library documentation to include information about the batch processing feature.
  • Provide code examples demonstrating how to use the batch processing API effectively.
  • Include performance benchmarks and guidelines for optimal batch sizes based on system resources.

Implementation

  • Modify the existing input handling logic to accept batches of images and text prompts.
  • Implement batch processing functionality using parallel processing techniques or GPU acceleration libraries.
  • Optimize memory usage and performance for efficient batch processing.
  • Update the output generation logic to handle batched outputs and maintain the correct order.
  • Implement error handling mechanisms to gracefully handle and report errors during batch processing.
  • Design and expose a user-friendly API for performing batch processing.
  • Write unit tests to verify the correctness and performance of the batch processing implementation.
  • Update the library documentation and provide code examples for using the batch processing feature.
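One concrete piece of the input-handling work above is padding variable-length token sequences so they can be stacked into a single batch tensor. The sketch below uses left padding, the usual choice for autoregressive decoding because it keeps every sequence's last token aligned; it is an illustration in plain Python, not the library's implementation:

```python
def pad_batch(token_ids: list, pad_id: int = 0) -> tuple:
    """Left-pad token sequences to a common length; return (padded, attention_mask).

    The mask marks real tokens with 1 and padding with 0, so attention can
    ignore the pad positions.
    """
    max_len = max(len(seq) for seq in token_ids)
    padded, mask = [], []
    for seq in token_ids:
        pad = max_len - len(seq)
        padded.append([pad_id] * pad + seq)
        mask.append([0] * pad + [1] * len(seq))
    return padded, mask
```

In a real implementation the padded lists would be converted to `mx.array` tensors before the forward pass.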

Testing

  • Prepare a comprehensive test suite to validate the batch processing functionality.
  • Test with different batch sizes and input variations to ensure robustness.
  • Verify that the generated outputs match the expected results for each input in the batch.
  • Measure the performance improvement gained by batch processing compared to individual processing.
  • Conduct error handling tests to ensure graceful handling of invalid inputs and processing failures.
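A core assertion for the test suite is that batched generation matches individual generation, output for output and in order. A sketch of such a test, using stand-in functions (`caption_one`, `caption_batch` are hypothetical placeholders for the real single and batched entry points):

```python
def caption_one(image_id: str) -> str:
    """Stand-in for single-image generation."""
    return f"caption-for-{image_id}"

def caption_batch(image_ids: list) -> list:
    """Stand-in for batched generation; must match individual results, in order."""
    return [caption_one(i) for i in image_ids]

def test_batch_matches_individual():
    images = ["img_a", "img_b", "img_c"]
    batched = caption_batch(images)
    individual = [caption_one(i) for i in images]
    assert batched == individual           # same outputs as one-at-a-time processing
    assert batched[1] == "caption-for-img_b"  # order preserved
```

With the real model, greedy (temperature 0) decoding would be used here so the batched and individual outputs are deterministic and directly comparable.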

Delivery

  • Integrate the batch processing feature into the existing MLX-VLM library codebase.
  • Ensure backward compatibility with previous versions of the library.
  • Provide release notes highlighting the new batch processing capability and any breaking changes.
  • Update the library version number following semantic versioning conventions.
  • Publish the updated library package to the relevant package repositories or distribution channels.

By implementing this batch processing feature, MLX-VLM will provide users with the ability to efficiently process multiple inputs simultaneously, improving performance and usability of the library for various vision-language tasks.

Blaizzy avatar Jun 11 '24 12:06 Blaizzy

Will take it for implementation! Hope to meet the standards :)

eDeveloperOZ avatar Jun 12 '24 12:06 eDeveloperOZ

Here are some extra details: @willccbb

Blaizzy avatar Jul 03 '24 09:07 Blaizzy

Sorry, just saw this -- will take a swing when #53 is merged.

willccbb avatar Jul 25 '24 20:07 willccbb

@willccbb done ✅

#53 is merged

Blaizzy avatar Jul 26 '24 13:07 Blaizzy

Hey @willccbb, any update on this? Would be super helpful to have

Benjoyo avatar Nov 15 '24 21:11 Benjoyo

@willccbb doesn't have the bandwidth.

This feature is now open and back in backlog.

Blaizzy avatar Nov 16 '24 02:11 Blaizzy

Hi @Blaizzy, I'm interested in implementing the batch processing feature for MLX-VLM.

After reviewing the issue requirements and the existing codebase, I understand this involves:

  1. Implementing a BatchedKVCache to enable parallel inference
  2. Adding batch_generate utilities for efficient parallel processing
  3. Creating new API endpoints for batch operations
  4. Updating model architectures to handle batched inputs

PR #53, which refactored the KVCache implementation, provides a good foundation for this work.

I plan to implement this in stages:

  • First, the core BatchedKVCache class
  • Then batch processing utilities
  • API endpoints for batch operations
  • Finally, updating model architectures to use batched operations

My implementation will include configurable parameters for batch generation with sensible defaults:

  • Default batch size with options to configure larger sizes for high-memory systems
  • Memory management options to balance efficiency vs throughput
  • Configuration flags to optimize for different hardware capabilities

Would you be open to my contribution?

Looking forward to your feedback!

rpj09 avatar May 21 '25 14:05 rpj09

Hi! Any updates on this? :)

Isaac4real avatar Nov 10 '25 14:11 Isaac4real