Yuekai Zhang
> 1. Throughput is not RTFx; computing throughput is a bit more involved. 2. Difference between perf_analyzer and https://github.com/yuekaizhang/Triton-ASR-Client/blob/main/client.py: perf_analyzer --streaming uses a single wav file, whereas client.py could use...
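As a rough illustration of the distinction being made above (not the exact formulas used by perf_analyzer or client.py; all names are illustrative):

```python
# Sketch of RTF vs. throughput; illustrative only, not the benchmark scripts' code.

def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """RTF: processing time per second of audio for one request (lower is better)."""
    return processing_seconds / audio_seconds


def throughput(total_audio_seconds: float, wall_clock_seconds: float) -> float:
    """Throughput: seconds of audio decoded per wall-clock second across all
    concurrent streams, so it depends on concurrency, batching and server-side
    queueing, not only on per-request latency."""
    return total_audio_seconds / wall_clock_seconds


# Example: 100 clients each send 60 s of audio and the whole run takes 30 s of
# wall-clock time -> throughput is 200 "audio seconds" per second, even though
# each individual request may have a much higher RTF.
print(throughput(100 * 60, 30))  # 200.0
```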
> Understood, I am sharing the stats_summary here: This summary is for a run with num_workers set to 100. Looking at this, it seems like the initial inferences are taking more time....
For warmup, see https://github.com/triton-inference-server/server/blob/main/docs/user_guide/model_configuration.md#model-warmup. For stats.json, see https://github.com/triton-inference-server/server/blob/main/docs/user_guide/metrics.md. For stats_summary.txt, I just converted it from stats.json, e.g. "batch_size 19, 18 times, infer 7875.14 ms, avg 437.51 ms, 23.03 ms input 47.86 ms,...
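A minimal sketch of that conversion, assuming the statistics-extension layout of stats.json (model_stats → batch_stats with {count, ns} counters); field names can differ between Triton versions, so treat this as illustrative only:

```python
# Turn Triton's stats.json into per-batch-size summary lines like the ones above.
# Assumes the statistics-extension layout; adjust field names to your Triton version.
import json

with open("stats.json") as f:
    stats = json.load(f)

for model in stats.get("model_stats", []):
    for b in model.get("batch_stats", []):
        infer = b["compute_infer"]
        inp = b["compute_input"]
        count = int(infer["count"])
        if count == 0:
            continue
        infer_ms = int(infer["ns"]) / 1e6
        input_ms = int(inp["ns"]) / 1e6
        print(f"batch_size {b['batch_size']}, {count} times, "
              f"infer {infer_ms:.2f} ms, avg {infer_ms / count:.2f} ms, "
              f"input {input_ms:.2f} ms")
```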
We have not started benchmarking and profiling yet. How do you configure your warmup setting? Also, we will later support the TensorRT backend, which should take less time to warm up compared...
> @yuekaizhang Since the zipformer streaming model is sequential, I just warmed it up with some dry runs. Also, I wanted to check about the logs on the GitHub client repo where RTF...
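For reference, a warmup "dry run" can be as simple as sending a few dummy requests through the same client path before measuring. The model and tensor names below ("zipformer", "WAV", "WAV_LENS") are placeholders, not the actual names from the repo; adjust them to your config.pbtxt:

```python
# Hedged sketch of warming up a Triton model with a few dummy requests.
import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

dummy_audio = np.zeros((1, 16000), dtype=np.float32)  # 1 s of silence at 16 kHz
dummy_len = np.array([[16000]], dtype=np.int32)

inputs = [
    httpclient.InferInput("WAV", list(dummy_audio.shape), "FP32"),
    httpclient.InferInput("WAV_LENS", list(dummy_len.shape), "INT32"),
]
inputs[0].set_data_from_numpy(dummy_audio)
inputs[1].set_data_from_numpy(dummy_len)

# A handful of dry runs is usually enough to trigger lazy initialization
# (memory pools, kernel autotuning, etc.) before real traffic arrives.
for _ in range(5):
    client.infer(model_name="zipformer", inputs=inputs)
```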
@haiderasad We have no plan to integrate faster-whisper. I recommend trying Whisper with TensorRT-LLM (https://github.com/NVIDIA/TensorRT-LLM/tree/main/examples/whisper), which is currently the fastest implementation according to https://github.com/shashikg/WhisperS2T?tab=readme-ov-file#benchmark-and-technical-report.
See #551. @haiderasad
> I ran into the same problem as well. How did you solve it in the end? https://github.com/yanqiangmiffy/InstructGLM/issues/1#issuecomment-1482778224
> Hi @csukuangfj @yuekaizhang > > Here are some notes based on my understanding: > > * These _cache tensors are actually implicit states defined in NVIDIA Triton, which are used...
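For context, this is roughly what implicit state looks like from the client side: the _cache tensors never appear in the request, the client only marks sequence boundaries, and Triton's sequence batcher carries the state between requests that share the same sequence_id. The model and tensor names below are placeholders, not the actual names from the repo:

```python
# Hedged sketch of a stateful (sequence) request flow; the server keeps the
# implicit state, the client only sends audio chunks and sequence flags.
import numpy as np
import tritonclient.grpc as grpcclient

client = grpcclient.InferenceServerClient(url="localhost:8001")
seq_id = 1001

chunks = [np.zeros((1, 3200), dtype=np.float32) for _ in range(4)]  # fake audio chunks
for i, chunk in enumerate(chunks):
    inp = grpcclient.InferInput("WAV_CHUNK", list(chunk.shape), "FP32")
    inp.set_data_from_numpy(chunk)
    client.infer(
        model_name="streaming_zipformer",
        inputs=[inp],
        sequence_id=seq_id,
        sequence_start=(i == 0),                 # reset implicit state at sequence start
        sequence_end=(i == len(chunks) - 1),     # release state at sequence end
    )
```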
> I started working on it, but I am a bit confused about one thing. > > In https://github.com/k2-fsa/icefall/blob/master/egs/librispeech/ASR/pruned_transducer_stateless7_streaming/export.py#L291 I see you already have an ONNX export script for the streaming zipformer, right?...