
[Feature] Existing streaming latency is still high

Open kunci115 opened this issue 1 year ago • 6 comments

Streaming on a 4090 takes more than 2 seconds, depending on the number of tokens. Is there a way to yield/return audio while the engine is still generating?
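The pattern being asked for is incremental yielding: emit an audio chunk each time a few tokens are decoded, instead of waiting for the whole sequence. A minimal sketch of that pattern in plain Python (the `generate_tokens` and `decode_to_audio` names are hypothetical stand-ins, not the fish-speech API):

```python
def generate_tokens(text):
    # Stand-in for the autoregressive token loop of a TTS engine:
    # one "semantic token" per input character.
    for i, _ch in enumerate(text):
        yield i

def decode_to_audio(tokens):
    # Stand-in vocoder: turns a batch of tokens into an audio chunk.
    return bytes(len(tokens))

def stream_tts(text, chunk_size=4):
    """Yield audio chunks while token generation is still running."""
    buf = []
    for tok in generate_tokens(text):
        buf.append(tok)
        if len(buf) >= chunk_size:
            yield decode_to_audio(buf)
            buf = []
    if buf:  # flush the trailing partial chunk
        yield decode_to_audio(buf)

# The caller can start playback after the first chunk arrives,
# rather than after the full utterance is synthesized.
chunks = list(stream_tts("hello world", chunk_size=4))
```

The key design point is that `stream_tts` is a generator: the first chunk is available after `chunk_size` tokens, so perceived latency drops from the whole-utterance time to roughly one chunk's worth of decoding.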

kunci115 avatar Jul 23 '24 07:07 kunci115

PR Welcome

Stardust-minus avatar Jul 23 '24 07:07 Stardust-minus

Please compile the model, or try the quantized version.
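Both suggestions attack the decode loop: `torch.compile` removes per-step Python overhead, and quantization halves the bytes read per token. Since autoregressive decoding is largely memory-bandwidth-bound, a back-of-envelope estimate shows why int8 weights roughly halve per-token time. The parameter count below is a hypothetical example, not fish-speech's actual model size:

```python
def decode_time_per_token(n_params, bytes_per_weight, mem_bw_bytes_per_s):
    """Rough lower bound: each decoded token reads every weight once,
    so per-token latency ~ model_bytes / memory_bandwidth."""
    return n_params * bytes_per_weight / mem_bw_bytes_per_s

GB = 1e9
bw_4090 = 1008 * GB   # RTX 4090 peak memory bandwidth (~1 TB/s)
n_params = 500e6      # hypothetical ~0.5B-parameter decoder

fp16_latency = decode_time_per_token(n_params, 2, bw_4090)  # 16-bit weights
int8_latency = decode_time_per_token(n_params, 1, bw_4090)  # 8-bit weights
```

This is only a lower bound on compute time; quantization does not by itself stream partial audio, so end-to-end latency for long utterances still scales with token count unless output is yielded incrementally.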

PoTaTo-Mika avatar Jul 23 '24 11:07 PoTaTo-Mika

@PoTaTo-Mika What do you mean by compiling the model? And how do I create a quantized version? I have only followed the inference steps in the English documentation: https://speech.fish.audio/en/inference/#2-create-a-directory-structure-similar-to-the-following-within-the-ref_data-folder

kunci115 avatar Jul 24 '24 04:07 kunci115

There's a Python file called quantize.py; you can view the file and choose how to quantize.

PoTaTo-Mika avatar Jul 24 '24 04:07 PoTaTo-Mika

> There's a Python file called quantize.py; you can view the file and choose how to quantize.

It created a folder with a quantized version of the model. Do I just run it like the previous run, pointing at that checkpoint? I still get the same latency.
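Seeing "the same latency" is expected if what is measured is total synthesis time: quantization speeds up each decode step, but the caller still waits for the whole utterance unless the server yields audio incrementally. Measuring time-to-first-chunk separately makes the difference visible. A self-contained sketch (with `time.sleep` standing in for per-chunk decode work; numbers are illustrative, not fish-speech measurements):

```python
import time

def synth_blocking(n_chunks, per_chunk_s):
    # Synthesizes the whole utterance before returning anything.
    time.sleep(n_chunks * per_chunk_s)
    return [b"chunk"] * n_chunks

def synth_streaming(n_chunks, per_chunk_s):
    # Yields each chunk as soon as it is decoded.
    for _ in range(n_chunks):
        time.sleep(per_chunk_s)
        yield b"chunk"

def time_to_first(make_iter):
    """Time until the first audio chunk is available."""
    t0 = time.perf_counter()
    next(iter(make_iter()))
    return time.perf_counter() - t0

ttf_blocking = time_to_first(lambda: synth_blocking(10, 0.01))
ttf_streaming = time_to_first(lambda: synth_streaming(10, 0.01))
```

With these toy numbers the blocking path waits for all ten chunks (~0.1 s) before the first byte, while the streaming path returns its first chunk after ~0.01 s, even though total synthesis time is identical.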

kunci115 avatar Jul 24 '24 05:07 kunci115

This issue is stale because it has been open for 30 days with no activity.

github-actions[bot] avatar Sep 16 '24 00:09 github-actions[bot]