gemma
feat: Add streaming support for batched prompts
Description
Adds streaming support for batched prompts, resolving issue #406.
Changes
- Removed restriction that blocked batched streaming in `_sampler.py`
- Fixed `_stream_sample_loop` to wait for ALL batch elements to complete (not just first)
- Updated `_stream_decode_state` to properly handle batch dimensions
- Net reduction of 3 lines of code
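The termination fix can be illustrated with a minimal, library-free sketch. This is not the code from `_sampler.py`: the names `stream_batched`, `step_fn`, `END_TOKEN`, and `PAD_TOKEN` are hypothetical and exist only for illustration. It shows why the loop has to check that every batch element is done rather than only the first.

```python
from typing import Callable, Iterator

END_TOKEN = 0   # hypothetical end-of-sequence token id
PAD_TOKEN = -1  # hypothetical id emitted for rows that already finished


def stream_batched(
    step_fn: Callable[[int], list[int]],
    batch_size: int,
    max_steps: int,
) -> Iterator[list[int]]:
    """Yield one token per batch element per step until every element is done."""
    done = [False] * batch_size
    for step in range(max_steps):
        tokens = step_fn(step)
        # Mask rows that finished on an earlier step so no tokens leak past the end token.
        out = [PAD_TOKEN if d else t for d, t in zip(done, tokens)]
        done = [d or t == END_TOKEN for d, t in zip(done, tokens)]
        yield out
        # Stop only when ALL rows are finished; checking done[0] alone
        # would cut off rows that are still generating.
        if all(done):
            break


def demo() -> list[list[int]]:
    # Row 0 emits the end token at step 1, row 1 at step 3.
    script = [[5, 7], [0, 8], [3, 9], [4, 0], [6, 6]]
    return list(stream_batched(lambda i: script[i], batch_size=2, max_steps=len(script)))
```

In `demo()`, a check on the first row only would end the stream after step 1 and drop the last two tokens of row 1. With the `all(done)` check the stream runs through step 3 and pads row 0 in the meantime.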
Testing
- Tested with Gemma3_4B on NVIDIA L40s GPU
- Single prompt streaming: Works (unchanged)
- Batched non-streaming: Works (unchanged)
- Batched streaming: Now works (new feature!)
Backward Compatibility
Fully backward compatible - no breaking changes to existing API.
Fixes #406