danielhua23 comments

Results: 11 comments by danielhua23

a_sh_rd_delta_o

Same question: I can't figure out why `a_sh_rd_delta_o` depends on `thread_n_blocks`. @efrantar, could you please explain this when you have time? Many thanks!

[Bug]: TimeoutError During Benchmark Profiling with Torch Profiler on vLLM v0.6.0

`INFO: 127.0.0.1:45346 - "POST /start_profile HTTP/1.1" 500 Internal Server Error` I encountered the same issue on an NVIDIA GPU.

[Bug]: TimeoutError During Benchmark Profiling with Torch Profiler on vLLM v0.6.0

@robertgshaw2-neuralmagic Hi, adding `--disable-frontend-multiprocessing` did not work for me. I think the information above is incomplete; below is my full output. The real cause is ` AttributeError: 'GPUExecutorAsync' object has...

[QST] why the implementation of f16xs8 mixed gemm is different between TRT-LLM and native cutlass mixed gemm example?

@alexsamardzic Thanks for your helpful response. I want to confirm: is it the case that `mixed data-types GEMM on Ampere generation GPUs requires re-arranging of elements of tensor having smaller data-type. CUTLASS is...

[QST] why the implementation of f16xs8 mixed gemm is different between TRT-LLM and native cutlass mixed gemm example?

Thanks for your detailed information, @manishucsd; it is very useful to me. One question remains: Marlin seems to implement the mixed GEMM by preprocessing the weights ahead of time (AOT), which is 1st...

[Bug]: vllm v0.6.0 profiler report GPUExecutorAsync object has no attribute '_run_workers' on ROCm and NV H20

Thanks, @SolitaryThinker, it works for me now. Should I treat this as a workaround? If so, could you please notify me when the problem is fixed?

【llm_perf issue】using byte_infer_perf/llm_perf/launch.py to test chatglm, but meet multi-process competing

@suisiyuan Hello, could you help take a look at this when you have time?

【llm_perf issue】using byte_infer_perf/llm_perf/launch.py to test chatglm, but meet multi-process competing

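> > @suisiyuan Hello, could you help take a look at this when you have time?
>
> OK, I'll look into it on my side; it is probably a process-management issue.

Thank you for your time.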

[QST] How to implement a fused mixed precision matrix multiplication such as w4a4 + w16a16?

[Bug]: Error in benchmark model with vllm backend for endpoint /v1/chat/completions

@ywang96 @DarkLight1337 Hello, if I need to install vLLM from source inside a Docker container on an NVIDIA GPU, which Docker image would you recommend?