Dear developers, I am trying to reproduce the [BERT benchmarking result](https://github.com/Tencent/TurboTransformers/blob/master/docs/bert.md) on my machine. I just ran `bash run_gpu_benchmark.sh`, but the QPS is much lower than the declared value....
Hi all, I have been using DALI with PyTorch recently and am impressed by its excellent performance. Currently, all of my training data has to be placed on local SSD storage...
The dynamic shared memory of the [GEMV](https://github.com/mit-han-lab/llm-awq/blob/main/awq/kernels/csrc/quantization_new/gemv/gemv_cuda.cu#L103) kernel is not allocated when the kernel is called, which causes an illegal memory access error. This pull request fixes the issue by specifying the shared memory size...
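
For readers unfamiliar with this class of bug, here is a minimal sketch of how dynamic shared memory is sized at launch. It is not the AWQ GEMV kernel: `block_sum` and `sum_with_dynamic_smem` are hypothetical names made up for illustration, and the block size of 256 is an assumption. A kernel that declares an `extern __shared__` array gets its size from the third launch parameter. If that parameter is omitted, the size defaults to zero bytes, so any access to the array is out of bounds and can surface as an illegal memory access.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel (NOT the AWQ GEMV kernel): each block sums its slice of
// `in` using a dynamically sized shared-memory buffer.
__global__ void block_sum(const float* in, float* out, int n) {
    extern __shared__ float buf[];  // size comes from the 3rd launch parameter
    int tid = threadIdx.x;
    int idx = blockIdx.x * blockDim.x + tid;
    buf[tid] = (idx < n) ? in[idx] : 0.0f;
    __syncthreads();
    // Tree reduction; assumes blockDim.x is a power of two.
    for (int s = blockDim.x / 2; s > 0; s >>= 1) {
        if (tid < s) buf[tid] += buf[tid + s];
        __syncthreads();
    }
    if (tid == 0) out[blockIdx.x] = buf[0];
}

// Hypothetical host helper: returns the sum of a non-empty `host_in`,
// or -1.0f if any CUDA call fails.
float sum_with_dynamic_smem(const std::vector<float>& host_in) {
    const int n = static_cast<int>(host_in.size());
    const int threads = 256;  // assumed block size (power of two)
    const int blocks = (n + threads - 1) / threads;

    float* d_in = nullptr;
    float* d_out = nullptr;
    if (cudaMalloc(&d_in, n * sizeof(float)) != cudaSuccess) return -1.0f;
    if (cudaMalloc(&d_out, blocks * sizeof(float)) != cudaSuccess) {
        cudaFree(d_in);
        return -1.0f;
    }
    cudaError_t err = cudaMemcpy(d_in, host_in.data(), n * sizeof(float),
                                 cudaMemcpyHostToDevice);

    // One float of shared memory per thread. Leaving out `smem_bytes` here
    // would launch the kernel with zero bytes of dynamic shared memory.
    const size_t smem_bytes = threads * sizeof(float);
    if (err == cudaSuccess) {
        block_sum<<<blocks, threads, smem_bytes>>>(d_in, d_out, n);
        err = cudaGetLastError();
    }

    std::vector<float> partial(blocks, 0.0f);
    if (err == cudaSuccess) {
        // This copy also synchronizes with the kernel and reports its errors.
        err = cudaMemcpy(partial.data(), d_out, blocks * sizeof(float),
                         cudaMemcpyDeviceToHost);
    }
    cudaFree(d_in);
    cudaFree(d_out);
    if (err != cudaSuccess) return -1.0f;

    float total = 0.0f;
    for (float p : partial) total += p;  // combine per-block partial sums
    return total;
}
```

Calling `sum_with_dynamic_smem` on a vector of 1000 ones should return 1000 on a machine with a working CUDA device. Computing the byte count next to the launch keeps the allocation in step with the block size, which is the kind of mismatch the fix described above addresses.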