Yuekai Zhang
> I followed the latest tutorial to run [build_wenetspeech_zipformer_offline_trt.sh](https://github.com/k2-fsa/sherpa/blob/master/triton/scripts/build_wenetspeech_zipformer_offline_trt.sh). It fails due to OOM, where a tactic device requests 34024MB (my 4090ti has 24217MB available). Do you use another GPU with...
@Vergissmeinicht Sorry for the late reply, I have been OOO for the past few days. Would you mind trying https://github.com/NVIDIA/trt-samples-for-hackathon-cn/blob/master/cookbook/07-Tool/trtexec/Help.txt#L37? Alternatively, you could set smaller opt and max shapes, with a shorter seq_len and...
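To make the second suggestion concrete, here is a minimal sketch using the TensorRT Python API directly, assuming an ONNX encoder; the input name `x` and all shape/workspace values are hypothetical, not the ones from the actual build script:

```python
# Hedged sketch: cap the builder workspace and shrink the optimization
# profile so the tactic search fits in a 24 GB GPU. Names and dims are
# placeholders, not the actual zipformer encoder's.
import tensorrt as trt

logger = trt.Logger(trt.Logger.INFO)
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))

parser = trt.OnnxParser(network, logger)
with open("encoder.onnx", "rb") as f:
    assert parser.parse(f.read())

config = builder.create_builder_config()
# Tactics that would request more workspace than this limit are skipped,
# which avoids the 34 GB tactic-device allocation.
config.set_memory_pool_limit(trt.MemoryPoolType.WORKSPACE, 20 << 30)

profile = builder.create_optimization_profile()
# Smaller opt/max batch and seq_len than the defaults.
profile.set_shape("x", min=(1, 16, 80), opt=(4, 512, 80), max=(8, 1024, 80))
config.add_optimization_profile(profile)

engine = builder.build_serialized_network(network, config)
with open("encoder.plan", "wb") as f:
    f.write(engine)
```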
@MahmoudAshraf97 Hi, I suggest using the custom-defined executor as shown here: https://github.com/NVIDIA/TensorRT-LLM/blob/main/examples/whisper/run.py#L149. I'm not sure if trtllm.Executor is compatible with the Whisper encoder. The trtllm.ModelType.ENCODER_ONLY may have hardcoded logic for...
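In case it helps, a rough sketch of what a custom executor looks like: run the encoder engine through `tensorrt_llm.runtime.Session` instead of `trtllm.Executor`. The engine path, tensor names, dtypes, and shapes below are illustrative assumptions, not the actual values from the whisper example:

```python
# Hedged sketch: a hand-rolled encoder executor on top of
# tensorrt_llm.runtime.Session. All names and shapes are placeholders.
import tensorrt_llm
import torch

with open("whisper_encoder/rank0.engine", "rb") as f:
    session = tensorrt_llm.runtime.Session.from_serialized_engine(f.read())

# Fake batch of mel features; real code would come from the frontend.
mel = torch.zeros(1, 80, 3000, dtype=torch.float16, device="cuda")
outputs = {
    "encoder_output": torch.empty(1, 1500, 1280,
                                  dtype=torch.float16, device="cuda")
}
stream = torch.cuda.current_stream().cuda_stream
ok = session.run(inputs={"mel": mel}, outputs=outputs, stream=stream)
torch.cuda.synchronize()
assert ok, "encoder execution failed"
```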
@haiderasad See https://github.com/triton-inference-server/tensorrtllm_backend/blob/main/docs/whisper.md.
@lightbooster Hi, Whisper does not support this option yet. I will update here once it works or once we can remove the 30s restriction.
> if this is supported now? Currently, for distil-whisper or fine-tuned Whisper models, it is possible to use audio other than 30 seconds. The --remove-input-padding option is also supported,...
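On the feature side, a hedged sketch of extracting features for audio shorter than 30 seconds with Hugging Face's `WhisperFeatureExtractor`; the model name is a placeholder, and the padding/truncation behaviour is worth verifying against your transformers version:

```python
# Sketch: extract variable-length (not 30 s padded) log-mel features.
# Model name is illustrative; check that padding=False behaves this way
# in your transformers version.
import numpy as np
from transformers import WhisperFeatureExtractor

fe = WhisperFeatureExtractor.from_pretrained("distil-whisper/distil-large-v2")
audio = np.random.randn(16000 * 8).astype(np.float32)  # 8 s of fake 16 kHz audio

# The default pads/truncates to 30 s (3000 frames); disabling both keeps
# the native length (~800 frames for 8 s at hop_length=160).
feats = fe(audio, sampling_rate=16000, padding=False, truncation=False,
           return_tensors="np")
print(feats.input_features.shape)  # (1, 80, ~800) instead of (1, 80, 3000)
```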
> We are using Whisper for streaming speech recognition. Will this padding increase the amount of computation at the beginning of the audio stream, and will the inference affect the...
@lionsheep24 https://github.com/k2-fsa/sherpa/issues/597#issuecomment-2146719866, check this. You may need to align the prompt, beam_size, and other hyper-parameters to get the same outputs. There are several successful integrations of Whisper TRT-LLM you may...
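As an illustration of what aligning hyper-parameters means, a sketch on the openai-whisper reference side; the concrete values (and the matching flags on the TRT-LLM side) are assumptions:

```python
# Hedged sketch: pin the same decoding hyper-parameters on the reference
# side as on the TRT-LLM side before comparing transcripts.
import whisper  # pip install openai-whisper

model = whisper.load_model("large-v2")
result = model.transcribe(
    "test.wav",
    beam_size=4,                        # match the beam width used in TRT-LLM
    temperature=0.0,                    # disable the sampling fallback
    initial_prompt=None,                # keep the prompt identical on both sides
    condition_on_previous_text=False,   # avoid state carried across segments
)
print(result["text"])
```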
> Huggingface library compared to the method provided in this repository. Theoretically, the minor difference in feature values should not have an effect on the transcription results. We actually support...
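To check that claim empirically, a hedged sketch comparing the two feature pipelines on the same audio; the model name and audio file are placeholders:

```python
# Sketch: compare openai-whisper's log-mel features with Hugging Face's
# WhisperFeatureExtractor on the same 30 s-padded audio.
import numpy as np
import torch
import whisper  # pip install openai-whisper
from transformers import WhisperFeatureExtractor

audio = whisper.load_audio("test.wav")      # mono float32 at 16 kHz
audio = whisper.pad_or_trim(audio)          # pad/trim to 30 s

ref = whisper.log_mel_spectrogram(torch.from_numpy(audio))  # (80, 3000)

fe = WhisperFeatureExtractor.from_pretrained("openai/whisper-small")
hf = fe(audio, sampling_rate=16000, return_tensors="np").input_features[0]

# Differences are expected to sit at float rounding level.
print(np.abs(ref.numpy() - hf).max())
```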