danielhua23

Results 11 comments of danielhua23

Same question: I can't figure out why `a_sh_rd_delta_o` has anything to do with `thread_n_blocks`. @efrantar, could you please explain that when you have time? Many thanks!

`INFO: 127.0.0.1:45346 - "POST /start_profile HTTP/1.1" 500 Internal Server Error` I encounter the same issue on an NVIDIA GPU.

@robertgshaw2-neuralmagic Hi, adding `--disable-frontend-multiprocessing` did not work for me. I think the information above is incomplete; my full output is below. The real cause is ` AttributeError: 'GPUExecutorAsync' object has...

@alexsamardzic Thanks for the helpful response. I'd like to confirm one point: `mixed data-types GEMM on Ampere generation GPUs requires re-arranging of elements of tensor having smaller data-type. CUTLASS is...

Thanks for the detailed information, @manishucsd; it is very useful to me. One question remains: Marlin seems to implement the mixed GEMM by preprocessing the weights ahead of time (AOT), that is 1st...

Thanks @SolitaryThinker, it works for me now. Can I treat this as a workaround? If so, could you please notify me when the problem is fixed?

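> > @suisiyuan Hello, could you take a look when you have time? > > OK, I'll look into it on my side; it is probably a process management issue. Thank you for your time.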

@ywang96 @DarkLight1337 Hello, if I need to install vLLM from source inside a Docker container on an NVIDIA GPU, which Docker image would you recommend?