Chen Yuwen issues

Results: 4 issues of Chen Yuwen

How to support different models with different tensor_para_size?

I have 4 GPUs and 3 models called small, medium, and large. I want to deploy the small model on GPU 0, the medium model on GPU 1, and the large model on...

[Bug]: Qwen2 Moe FP8 not supported on L40

### Your current environment ```text Collecting environment information... PyTorch version: 2.3.0+cu121 Is debug build: False CUDA used to build PyTorch: 12.1 ROCM used to build PyTorch: N/A OS: Ubuntu 22.04.3...

bug

[Bugfix] Fix none seed sampling in rejection_sampler

This PR fixes https://github.com/vllm-project/vllm/issues/11123. When some of the seeds in a batch are None, indexing a tensor with a list returns a new tensor rather than a view of the original tensor....

needs-rebase
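
The copy-versus-view behavior mentioned in the rejection_sampler bugfix can be reproduced in isolation. The following is a minimal sketch, not the code from the PR: the helper names `fill_rows_buggy` and `fill_rows_fixed` are hypothetical, and filling with ones stands in for whatever in-place sampling the real code performs. It only assumes standard PyTorch indexing semantics, where indexing with a Python list produces a copy and a basic slice produces a view.

```python
import torch


def fill_rows_buggy(buf: torch.Tensor, rows: list) -> torch.Tensor:
    # Hypothetical helper: indexing with a Python list is advanced indexing,
    # which returns a NEW tensor. The in-place fill_ therefore writes into a
    # temporary copy and `buf` is left unchanged.
    buf[rows].fill_(1.0)
    return buf


def fill_rows_fixed(buf: torch.Tensor, rows: list) -> torch.Tensor:
    # Hypothetical helper: index assignment writes into `buf` itself, so the
    # selected rows are actually updated.
    buf[rows] = 1.0
    return buf


buggy = fill_rows_buggy(torch.zeros(4, 3), [0, 2])
fixed = fill_rows_fixed(torch.zeros(4, 3), [0, 2])

# For contrast: a basic slice is a view, so an in-place op on it
# does modify the original tensor.
sliced = torch.zeros(4, 3)
sliced[0:2].fill_(1.0)
```

In this sketch `buggy` stays all zeros, while `fixed` and `sliced` each have two rows of ones. Any in-place operation applied to a list-indexed tensor is silently lost in the same way, which matches the symptom the PR description reports.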

[QST] Gemm got 'incomplete type is not allowed' when use Sm90

When I change https://github.com/efeslab/Nanoflow/blob/main/pipeline/include/cutlassGemmWrapperImpl.cuh#L89 SmArch to Sm90, got ``` /root/Nanoflow/3rdparty/cutlass/include/cutlass/gemm/device/gemm.h(264): error: incomplete type is not allowed using GemmKernel = typename kernel::DefaultGemm< ^ detected during: instantiation of class "cutlass::gemm::device::Gemm [with ElementA_=BaseGEMMWrapper::ElementI...

question

? - Needs Triage

inactive-30d