Results: 129 comments of Yuekai Zhang

@MahmoudAshraf97 Hi, I just tried the conv1d solution with more than one dynamic shape by setting the code below:

```python
x = Tensor(name="x", dtype=self._dtype,
           shape=[-1, self.config.n_mels, -1],
           dim_range=OrderedDict([
               ("batch_size", [bs_range]),
               ("feature_dim", [self.config.n_mels]),
               ("feature_len_range",...
```
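For completeness, here is a sketch of what the full declaration could look like inside the encoder's `prepare_inputs()`; the `[min, opt, max]` profile values for `bs_range` and `feature_len_range` are placeholders I made up for illustration, not the values from an actual run:

```python
from collections import OrderedDict
from tensorrt_llm.functional import Tensor

# Assumed [min, opt, max] optimization profiles for the two dynamic axes.
bs_range = [1, 4, 8]                    # placeholder batch sizes
feature_len_range = [100, 1500, 3000]   # placeholder mel-frame counts

# This runs inside the encoder's prepare_inputs(), where self._dtype and
# self.config come from the whisper example; both the batch and the
# feature-length dimensions are marked dynamic with -1.
x = Tensor(
    name="x",
    dtype=self._dtype,
    shape=[-1, self.config.n_mels, -1],
    dim_range=OrderedDict([
        ("batch_size", [bs_range]),
        ("feature_dim", [self.config.n_mels]),
        ("feature_len_range", [feature_len_range]),
    ]),
)
```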

> as I mentioned in my trials in the PR, this was a step to make it work but I couldn't complete it because of the slice operator or other...

Hi @SongLi89, thank you for raising this issue. I will help check if there are any performance bottlenecks. Will reply here with any updates.

> However, we found that training speeds were not significantly improved with the more expensive one (H100).

@SongLi89 Could you tell me the specific comparison results of the training speed in...

> (Of course, the H100 has more memory than the A100, and we can use a larger number for “maximum duration”. This test is just to compare the performance of...

> Which torch/CUDA version did you use for the test? For the training settings above (WenetSpeech L), one step takes around 0.5 s for both; the H100 is slightly faster, but 0.36 is never...

> Hi yuekai, I tried with your environment and we got a similar acceleration ratio. Thanks a lot. But it would still be great if the performance could be further improved....

@teith https://github.com/NVIDIA/TensorRT-LLM/blob/main/examples/whisper/run.py#L296 For a single-file test, you may first replace that line with your ground-truth text. Or you could test with a Hugging Face dataset, the way run.py does.
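For reference, a minimal sketch of the dataset-based approach; the dataset name is the small LibriSpeech subset commonly used for whisper smoke tests and is an assumption here, not necessarily the one run.py loads:

```python
from datasets import load_dataset

# Small ASR dataset that ships reference transcripts, so WER can be
# computed without hand-editing run.py.
dataset = load_dataset("hf-internal-testing/librispeech_asr_dummy",
                       "clean", split="validation")

for sample in dataset:
    audio = sample["audio"]["array"]   # raw waveform
    ground_truth = sample["text"]      # reference transcript for WER
    # ... run the TensorRT-LLM whisper engine on `audio` and compare
    #     its hypothesis against `ground_truth` ...
```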

@tianchengcheng-cn Would you mind trying to increase max_new_tokens here? https://github.com/NVIDIA/TensorRT-LLM/blob/main/examples/whisper/run.py#L261 I think the transcription is longer than 96 tokens. Alternatively, you may try a shorter wav file.
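As a quick sanity check, you can count how many tokens the expected transcript actually needs; a sketch using the Hugging Face Whisper tokenizer (the model name and the 96-token cap are assumptions based on this thread, not run.py's own tokenizer setup):

```python
from transformers import WhisperTokenizer

# Hypothetical check: does the reference transcript fit under the
# max_new_tokens cap at the linked line?
tokenizer = WhisperTokenizer.from_pretrained("openai/whisper-large-v2")

reference = "your expected transcript here"
n_tokens = len(tokenizer.encode(reference))

if n_tokens > 96:  # cap assumed from the linked line
    # Whisper's decoder context allows up to 448 text tokens, so the
    # cap can be raised substantially before hitting a hard limit.
    print(f"Transcript needs {n_tokens} tokens; raise max_new_tokens.")
```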

> Hi, I'm facing a similar issue with WER degradation whilst running _batched_ transcriptions of ~20-30 second audio clips from the Common Voice 16_1 dataset. WER seems to be...