
feat: Add streaming support for batched prompts

Open • XBastille opened this issue 2 months ago • 0 comments

Description

Adds streaming support for batched prompts, resolving issue #406.

Changes

  • Removed restriction that blocked batched streaming in _sampler.py
  • Fixed _stream_sample_loop to wait for ALL batch elements to complete (not just the first)
  • Updated _stream_decode_state to properly handle batch dimensions
  • Net reduction of 3 lines of code
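The core behavioral change, keeping the streaming loop alive until every batch element has finished, can be illustrated with a small toy model. This is a minimal sketch under stated assumptions, not the code in _sampler.py: `stream_batch`, `collect`, `EOS`, the per-prompt "token tables" that stand in for the model's sampling step, and the `wait_for_all` toggle are all hypothetical names introduced only for illustration. Each step yields one new token per batch element (or `None` once that element has emitted its end-of-sequence token), and each element's text is accumulated separately, so the batch dimension is preserved all the way to the decoded output.

```python
from typing import Dict, Iterator, List, Optional, Sequence

EOS = 0  # hypothetical end-of-sequence token id


def stream_batch(
    token_tables: Sequence[Sequence[int]],
    wait_for_all: bool = True,
    max_steps: int = 64,
) -> Iterator[List[Optional[int]]]:
    """Toy batched streaming loop (hypothetical, not the gemma sampler).

    token_tables[i] is the pre-determined token sequence for prompt i and
    stands in for the real per-step sampling. Each yielded list holds one
    new token per batch element, or None if that element already finished.
    """
    done = [False] * len(token_tables)
    for step in range(max_steps):
        # Fixed behavior: stop only once ALL elements are done.
        # wait_for_all=False illustrates the failure mode of stopping as
        # soon as the first element finishes, truncating the others.
        if all(done) if wait_for_all else any(done):
            return
        step_tokens: List[Optional[int]] = []
        for i, table in enumerate(token_tables):
            if done[i]:
                step_tokens.append(None)  # finished elements emit nothing
                continue
            tok = table[step] if step < len(table) else EOS
            if tok == EOS:
                done[i] = True
                step_tokens.append(None)
            else:
                step_tokens.append(tok)
        yield step_tokens


def collect(
    token_tables: Sequence[Sequence[int]],
    vocab: Dict[int, str],
    wait_for_all: bool = True,
) -> List[str]:
    """Accumulate streamed tokens into one text per batch element."""
    texts = [""] * len(token_tables)
    for step_tokens in stream_batch(token_tables, wait_for_all=wait_for_all):
        for i, tok in enumerate(step_tokens):
            if tok is not None:
                texts[i] += vocab[tok]
    return texts


VOCAB = {1: "Hel", 2: "lo", 3: "Hi", 4: "!"}
TABLES = [[3, EOS], [1, 2, 4, EOS]]  # prompt 0 finishes before prompt 1

print(collect(TABLES, VOCAB))                      # ['Hi', 'Hello!']
print(collect(TABLES, VOCAB, wait_for_all=False))  # ['Hi', 'Hello']
```

In the toy, the shorter prompt finishes two steps before the longer one; stopping on the first finished element drops the trailing "!" from the second output, while waiting for all elements yields the complete text for both.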

Testing

  • Tested with Gemma3_4B on an NVIDIA L40S GPU
  • Single prompt streaming: Works (unchanged)
  • Batched non-streaming: Works (unchanged)
  • Batched streaming: Now works (new feature!)

Backward Compatibility

Fully backward compatible: no breaking changes to the existing API.

Fixes #406

XBastille, Nov 01 '25 15:11