Results: 129 comments of Yuekai Zhang

@MahmoudAshraf97 Hi, I just tried the conv1d solution with more than one dynamic shape by setting the code below:

```python
x = Tensor(name="x", dtype=self._dtype,
           shape=[-1, self.config.n_mels, -1],
           dim_range=OrderedDict([
               ("batch_size", [bs_range]),
               ("feature_dim", [self.config.n_mels]),
               ("feature_len_range",...
```
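For completeness, here is a sketch of what the full declaration could look like inside the encoder's `prepare_inputs()`; the `[min, opt, max]` profile values for `bs_range` and `feature_len_range` are placeholders I made up for illustration, not the values from an actual run:

```python
from collections import OrderedDict
from tensorrt_llm.functional import Tensor

# Assumed [min, opt, max] optimization profiles for the two dynamic axes.
bs_range = [1, 4, 8]                    # placeholder batch sizes
feature_len_range = [100, 1500, 3000]   # placeholder mel-frame counts

# This runs inside the encoder's prepare_inputs(), where self._dtype and
# self.config come from the whisper example; both the batch and the
# feature-length dimensions are marked dynamic with -1.
x = Tensor(
    name="x",
    dtype=self._dtype,
    shape=[-1, self.config.n_mels, -1],
    dim_range=OrderedDict([
        ("batch_size", [bs_range]),
        ("feature_dim", [self.config.n_mels]),
        ("feature_len_range", [feature_len_range]),
    ]),
)
```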

> as I mentioned in my trials in the PR, this was a step to make it work but I couldn't complete it because of the slice operator or other...

Hi @SongLi89, thank you for raising this issue. I will help check if there are any performance bottlenecks. Will reply here with any updates.

> However, we found that training speeds were not significantly improved with the more expensive one (H100).

@SongLi89 Could you tell me the specific comparison results of the training speed in...

> (Of course, the H100 has more memory than the A100, and we can use a larger number for “maximum duration”. This test is just to compare the performance of...

> Which torch/CUDA version did you use for the test? For the training settings above (WenetSpeech L), one step takes around 0.5 s for both; the H100 is slightly faster, but 0.36 is never...

> Hi yuekai, I tried with your environment and we got a similar acceleration ratio. Thanks a lot. But it would still be great if the performance could be further improved....

@teith https://github.com/NVIDIA/TensorRT-LLM/blob/main/examples/whisper/run.py#L296 For a single-file test, you may first replace that line with your ground-truth text. Or you could test with a Hugging Face dataset, the way run.py does.
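For reference, a minimal sketch of the dataset-based approach; the dataset name is the small LibriSpeech subset commonly used for whisper smoke tests and is an assumption here, not necessarily the one run.py loads:

```python
from datasets import load_dataset

# Small ASR dataset that ships reference transcripts, so WER can be
# computed without hand-editing run.py.
dataset = load_dataset("hf-internal-testing/librispeech_asr_dummy",
                       "clean", split="validation")

for sample in dataset:
    audio = sample["audio"]["array"]   # raw waveform
    ground_truth = sample["text"]      # reference transcript for WER
    # ... run the TensorRT-LLM whisper engine on `audio` and compare
    #     its hypothesis against `ground_truth` ...
```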

@tianchengcheng-cn Would you mind trying to increase max_new_tokens here? https://github.com/NVIDIA/TensorRT-LLM/blob/main/examples/whisper/run.py#L261 I think the transcription is longer than 96 tokens. Alternatively, you may try a shorter wav file.
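As a quick sanity check, you can count how many tokens the expected transcript actually needs; a sketch using the Hugging Face Whisper tokenizer (the model name and the 96-token cap are assumptions based on this thread, not run.py's own tokenizer setup):

```python
from transformers import WhisperTokenizer

# Hypothetical check: does the reference transcript fit under the
# max_new_tokens cap at the linked line?
tokenizer = WhisperTokenizer.from_pretrained("openai/whisper-large-v2")

reference = "your expected transcript here"
n_tokens = len(tokenizer.encode(reference))

if n_tokens > 96:  # cap assumed from the linked line
    # Whisper's decoder context allows up to 448 text tokens, so the
    # cap can be raised substantially before hitting a hard limit.
    print(f"Transcript needs {n_tokens} tokens; raise max_new_tokens.")
```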

> Hi, I'm facing a similar issue with WER degradation whilst running _batched_ transcriptions of ~20-30 second audio clips from the Common Voice 16_1 dataset. WER seems to be...