
[Feature] Existing streaming latency is still high

Open kunci115 opened this issue 1 year ago • 6 comments

Streaming on a 4090 takes more than 2 seconds, depending on the number of tokens. Is there a way to yield/return audio while the engine is still generating?
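The pattern being asked for is incremental yielding: emit an audio chunk each time a few tokens are decoded, instead of waiting for the whole sequence. A minimal sketch of that pattern in plain Python (the `generate_tokens` and `decode_to_audio` names are hypothetical stand-ins, not the fish-speech API):

```python
def generate_tokens(text):
    # Stand-in for the autoregressive token loop of a TTS engine:
    # one "semantic token" per input character.
    for i, _ch in enumerate(text):
        yield i

def decode_to_audio(tokens):
    # Stand-in vocoder: turns a batch of tokens into an audio chunk.
    return bytes(len(tokens))

def stream_tts(text, chunk_size=4):
    """Yield audio chunks while token generation is still running."""
    buf = []
    for tok in generate_tokens(text):
        buf.append(tok)
        if len(buf) >= chunk_size:
            yield decode_to_audio(buf)
            buf = []
    if buf:  # flush the trailing partial chunk
        yield decode_to_audio(buf)

# The caller can start playback after the first chunk arrives,
# rather than after the full utterance is synthesized.
chunks = list(stream_tts("hello world", chunk_size=4))
```

The key design point is that `stream_tts` is a generator: the first chunk is available after `chunk_size` tokens, so perceived latency drops from the whole-utterance time to roughly one chunk's worth of decoding.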

kunci115 avatar Jul 23 '24 07:07 kunci115

PR Welcome

Stardust-minus avatar Jul 23 '24 07:07 Stardust-minus

Please compile the model, or try the quantized version.
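Both suggestions attack the decode loop: `torch.compile` removes per-step Python overhead, and quantization halves the bytes read per token. Since autoregressive decoding is largely memory-bandwidth-bound, a back-of-envelope estimate shows why int8 weights roughly halve per-token time. The parameter count below is a hypothetical example, not fish-speech's actual model size:

```python
def decode_time_per_token(n_params, bytes_per_weight, mem_bw_bytes_per_s):
    """Rough lower bound: each decoded token reads every weight once,
    so per-token latency ~ model_bytes / memory_bandwidth."""
    return n_params * bytes_per_weight / mem_bw_bytes_per_s

GB = 1e9
bw_4090 = 1008 * GB   # RTX 4090 peak memory bandwidth (~1 TB/s)
n_params = 500e6      # hypothetical ~0.5B-parameter decoder

fp16_latency = decode_time_per_token(n_params, 2, bw_4090)  # 16-bit weights
int8_latency = decode_time_per_token(n_params, 1, bw_4090)  # 8-bit weights
```

This is only a lower bound on compute time; quantization does not by itself stream partial audio, so end-to-end latency for long utterances still scales with token count unless output is yielded incrementally.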

PoTaTo-Mika avatar Jul 23 '24 11:07 PoTaTo-Mika

@PoTaTo-Mika What do you mean by compiling the model? And how do I create a quantized version? I have only followed the inference steps in the English documentation: https://speech.fish.audio/en/inference/#2-create-a-directory-structure-similar-to-the-following-within-the-ref_data-folder

kunci115 avatar Jul 24 '24 04:07 kunci115

There's a Python file called quantize.py; you can view the file and choose how to quantize.

PoTaTo-Mika avatar Jul 24 '24 04:07 PoTaTo-Mika

> There's a Python file called quantize.py; you can view the file and choose how to quantize.

It created a folder with a quantized version of the model. Do I just run it like the previous run, pointing at that checkpoint? I still get the same latency.
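Seeing "the same latency" is expected if what is measured is total synthesis time: quantization speeds up each decode step, but the caller still waits for the whole utterance unless the server yields audio incrementally. Measuring time-to-first-chunk separately makes the difference visible. A self-contained sketch (with `time.sleep` standing in for per-chunk decode work; numbers are illustrative, not fish-speech measurements):

```python
import time

def synth_blocking(n_chunks, per_chunk_s):
    # Synthesizes the whole utterance before returning anything.
    time.sleep(n_chunks * per_chunk_s)
    return [b"chunk"] * n_chunks

def synth_streaming(n_chunks, per_chunk_s):
    # Yields each chunk as soon as it is decoded.
    for _ in range(n_chunks):
        time.sleep(per_chunk_s)
        yield b"chunk"

def time_to_first(make_iter):
    """Time until the first audio chunk is available."""
    t0 = time.perf_counter()
    next(iter(make_iter()))
    return time.perf_counter() - t0

ttf_blocking = time_to_first(lambda: synth_blocking(10, 0.01))
ttf_streaming = time_to_first(lambda: synth_streaming(10, 0.01))
```

With these toy numbers the blocking path waits for all ten chunks (~0.1 s) before the first byte, while the streaming path returns its first chunk after ~0.01 s, even though total synthesis time is identical.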

kunci115 avatar Jul 24 '24 05:07 kunci115

This issue is stale because it has been open for 30 days with no activity.

github-actions[bot] avatar Sep 16 '24 00:09 github-actions[bot]