Cody Yu
Sorry, we're busy with the company event (Ray Summit) until this week. Will try to find some time after the event to look into it. @SolitaryThinker could you also take...
The CI is already triggered.
> @comaniac how can I trigger the CI? I have no dev env for vllm currently

Does that mean you cannot verify this PR locally? We should avoid using CI...
> > > Can not work on NVIDIA Ampere GPU, for example 3090.
> >
> > Unfortunate limit of Triton
>
> Does [#5975](https://github.com/vllm-project/vllm/pull/5975) help for this?...
btw did you test on H100?
@robertgshaw2-neuralmagic we are also suffering from the illegal memory access even before this refactoring. It's weird because I didn't find this issue in v0.5.0 and it's still working for me...
> @robertgshaw2-neuralmagic @comaniac There is a potential risk of illegal memory access, I have made changes but have not yet submitted them. Please refer to: [add_device_gurad](https://github.com/jeejeelee/vllm/blob/fix-moe-kernel/csrc/moe_align_block_size_kernels.cu#L115)

Interesting. Do you think the...
Thanks for the detailed steps, which are helpful. In the e2e case I believe vllm would make sure all tensors are on the right device, so this shouldn't be an...
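(For context: a minimal sketch, assuming the linked `add_device_gurad` change follows the common PyTorch extension pattern of guarding the kernel launch with the input tensor's device. The `touch` kernel/function names below are placeholders, not vLLM's actual `moe_align_block_size` code; the point is just that without such a guard, a caller whose current CUDA device differs from the tensor's device can trigger an illegal memory access.)

```cpp
// Sketch of the RAII device-guard pattern for a PyTorch CUDA extension op.
#include <torch/extension.h>
#include <ATen/cuda/CUDAContext.h>
#include <c10/cuda/CUDAGuard.h>

namespace {
// Placeholder kernel standing in for the real MoE alignment kernel.
__global__ void touch_kernel(int32_t* data, int64_t n) {
  int64_t idx = static_cast<int64_t>(blockIdx.x) * blockDim.x + threadIdx.x;
  if (idx < n) data[idx] += 1;
}
}  // namespace

void touch(torch::Tensor t) {
  TORCH_CHECK(t.is_cuda(), "expected a CUDA tensor");
  TORCH_CHECK(t.scalar_type() == torch::kInt32, "expected int32");
  // RAII guard: sets the current CUDA device to t's device for this scope,
  // then restores the previous device. Without it, the launch below would
  // run on whatever device the caller happens to have current.
  const at::cuda::OptionalCUDAGuard device_guard(device_of(t));
  const cudaStream_t stream = at::cuda::getCurrentCUDAStream();
  const int64_t n = t.numel();
  const int threads = 256;
  const int blocks = static_cast<int>((n + threads - 1) / threads);
  touch_kernel<<<blocks, threads, 0, stream>>>(t.data_ptr<int32_t>(), n);
}
```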
Some points per offline discussion with @ruisearch42:

- This is expected and a normal termination process in Ray. The "error" log is more for debugging purposes.
- To hide...
Hmm, I'm not sure we want to have benchmarks/evals. For correctness checking in the CI, we should be able to just test 2-3 cases to keep it stable.