
int8 quantized TTS model slower than fp32

Open martinshkreli opened this issue 1 year ago • 10 comments

fp32 model:

(myenv) ubuntu@152:~/sherpa-onnx/python_api_examples$ python3 test.py
Elapsed: 0.080
Saved sentence_0.wav.
Elapsed: 0.085
Saved sentence_1.wav.
Elapsed: 0.080
Saved sentence_2.wav.
Elapsed: 0.074
Saved sentence_3.wav.
Elapsed: 0.054
Saved sentence_4.wav.
Elapsed: 0.081
Saved sentence_5.wav.
Elapsed: 0.067

int8 model:

(myenv) ubuntu@152-69-195-75:~/sherpa-onnx/python_api_examples$ python3 test.py
Elapsed: 19.561
Saved sentence_0.wav.
Elapsed: 26.432
Saved sentence_1.wav.
Elapsed: 27.989
Saved sentence_2.wav.
Elapsed: 23.956
Saved sentence_3.wav.
Elapsed: 11.361
Saved sentence_4.wav.
Elapsed: 27.825
Saved sentence_5.wav.
Elapsed: 19.567

Is there any special flag to set to use int8?
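
For context, a minimal sketch of the kind of timing loop that produces output like the above, assuming the standard sherpa-onnx Python TTS API; the model paths and test sentences are placeholders, not the contents of the actual test.py:

```python
import time

import sherpa_onnx
import soundfile as sf

# Build an offline TTS engine from the vits-ljs files (paths are placeholders).
config = sherpa_onnx.OfflineTtsConfig(
    model=sherpa_onnx.OfflineTtsModelConfig(
        vits=sherpa_onnx.OfflineTtsVitsModelConfig(
            model="./vits-ljs/vits-ljs.onnx",  # or ./vits-ljs/vits-ljs.int8.onnx
            lexicon="./vits-ljs/lexicon.txt",
            tokens="./vits-ljs/tokens.txt",
        ),
        num_threads=1,
    ),
)
tts = sherpa_onnx.OfflineTts(config)

sentences = ["The first test sentence.", "The second test sentence."]
for i, text in enumerate(sentences):
    start = time.time()
    audio = tts.generate(text)  # synthesize; returns samples and a sample rate
    print(f"Elapsed: {time.time() - start:.3f}")
    sf.write(f"sentence_{i}.wav", audio.samples, samplerate=audio.sample_rate)
    print(f"Saved sentence_{i}.wav.")
```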

martinshkreli avatar Feb 07 '24 00:02 martinshkreli

Hi, Martin Shkreli! Fangjun will get back to you about this, but we might need more hardware info and details about what differed between those two runs.

danpovey avatar Feb 07 '24 02:02 danpovey

@martinshkreli

Could you describe how you get the int8 models?

csukuangfj avatar Feb 07 '24 02:02 csukuangfj

Hi guys, thanks again for the wonderful repo. I followed this link to download the model: https://k2-fsa.github.io/sherpa/onnx/tts/pretrained_models/vits.html#download-the-model

Then, I used that file (vits-ljs.int8.onnx) for inference in the Python script (offline-tts.py). This was on an 8xA100 instance.

martinshkreli avatar Feb 12 '24 14:02 martinshkreli

> @martinshkreli
>
> Could you describe how you get the int8 models?

Hi Fangjun, I just wanted to try to get your attention one more time. Sorry if I am being annoying!

martinshkreli avatar Feb 16 '24 01:02 martinshkreli

The int8 model is obtained via the following code https://github.com/k2-fsa/sherpa-onnx/blob/d7717628689b051b4c9bffd8d43f3e074388e2d7/scripts/vits/export-onnx-ljs.py#L204-L208

Note that it uses https://github.com/k2-fsa/sherpa-onnx/blob/d7717628689b051b4c9bffd8d43f3e074388e2d7/scripts/vits/export-onnx-ljs.py#L207
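
The referenced lines amount to a dynamic-quantization call along these lines (a sketch of the linked code, with file names as placeholders):

```python
from onnxruntime.quantization import QuantType, quantize_dynamic

# Weight-only dynamic quantization of the fp32 export,
# using unsigned int8 (quint8) weights as in the linked script.
quantize_dynamic(
    model_input="vits-ljs.onnx",
    model_output="vits-ljs.int8.onnx",
    weight_type=QuantType.QUInt8,
)
```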

It is a known issue with onnxruntime that quint8 is slower.

For instance, if you search on Google, you can find similar issues:

  • https://github.com/microsoft/onnxruntime/issues/12854
  • https://github.com/microsoft/onnxruntime/issues/6732
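
One workaround discussed in threads like those is to re-quantize with signed int8 weights instead of quint8; a hypothetical sketch (file names are placeholders, and any speedup depends on the CPU):

```python
from onnxruntime.quantization import QuantType, quantize_dynamic

# Hypothetical alternative: signed int8 (qint8) weights instead of quint8.
quantize_dynamic(
    model_input="vits-ljs.onnx",
    model_output="vits-ljs.qint8.onnx",
    weight_type=QuantType.QInt8,
)
```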

csukuangfj avatar Feb 16 '24 12:02 csukuangfj

Fangjun, is the int8 model intended for different applications or devices, then?


danpovey avatar Feb 17 '24 05:02 danpovey

The int8 model mentioned in this issue is about 4x smaller in file size than the float32 model.

If memory matters, the int8 model is preferred.
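
A quick way to check the size difference locally (a sketch; the paths assume the vits-ljs download linked above):

```python
import os

# Print the on-disk sizes of the fp32 and int8 models in MB.
for name in ["vits-ljs/vits-ljs.onnx", "vits-ljs/vits-ljs.int8.onnx"]:
    print(f"{name}: {os.path.getsize(name) / 2**20:.1f} MB")
```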

csukuangfj avatar Feb 17 '24 05:02 csukuangfj

Hi @csukuangfj, do you know how to optimize the speed of an int8 model? I was experimenting with it several months ago, but I was not able to convert to qint8, and quint8 is really slow on CPU.

beqabeqa473 avatar Apr 03 '24 07:04 beqabeqa473

You don't need to optimize speed; you need to pick an MB-iSTFT VITS model. They are an order of magnitude faster than raw VITS with the same quality.

nshmyrev avatar Apr 09 '24 19:04 nshmyrev

> You don't need to optimize speed; you need to pick an MB-iSTFT VITS model. They are an order of magnitude faster than raw VITS with the same quality.

Where can we find these models?

smallbraineng avatar Jul 08 '24 19:07 smallbraineng