Chen Yuwen

4 issues by Chen Yuwen

I have 4 GPUs and 3 models called small, medium, and large. I want to deploy the small model on GPU 0, the medium model on GPU 1, and the large model on...
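
One common way to pin separate models to separate GPUs is to run each model in its own process and set `CUDA_VISIBLE_DEVICES` for that process before CUDA is initialized. The sketch below only builds the per-process environments. The function name `gpu_envs` is hypothetical, and how each model is actually launched (which serving command, which ports) is left out because the original question does not say. The large model's assignment is truncated in the question, so it is not filled in here.

```python
import os


def gpu_envs(assignment: dict[str, list[int]]) -> dict[str, dict[str, str]]:
    """Build one environment per model that exposes only its assigned GPUs.

    Hypothetical helper: assumes each model runs in its own process, and that
    CUDA_VISIBLE_DEVICES is applied before CUDA is initialized in that process.
    """
    envs: dict[str, dict[str, str]] = {}
    for model, gpu_ids in assignment.items():
        if not gpu_ids:
            raise ValueError(f"model {model!r} has no GPUs assigned")
        env = dict(os.environ)  # inherit everything else from the parent
        env["CUDA_VISIBLE_DEVICES"] = ",".join(str(i) for i in gpu_ids)
        envs[model] = env
    return envs


# Usage: pass each dict as `env=` to subprocess.Popen for that model's process.
envs = gpu_envs({"small": [0], "medium": [1]})
```

Inside each process the visible GPUs are renumbered from 0, so the medium model's process sees physical GPU 1 as `cuda:0`.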

Your current environment:

```text
Collecting environment information...
PyTorch version: 2.3.0+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A
OS: Ubuntu 22.04.3...
```

Labels: bug

This PR fixes https://github.com/vllm-project/vllm/issues/11123. When only some of the seeds in a batch are None, indexing a tensor with a list returns a new tensor rather than a view of the original tensor....

Labels: needs-rebase
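
The copy-versus-view distinction described in this PR can be reproduced in a few lines of PyTorch. This is a minimal sketch of general PyTorch indexing semantics, not the PR's actual change; the function names are hypothetical. Basic slicing returns a view, while indexing with a Python list (advanced indexing) returns a new tensor, so in-place writes to it never reach the original. Indexed assignment is one standard way to write results back.

```python
import torch


def list_index_is_copy() -> tuple[float, float]:
    base = torch.zeros(4)

    # Basic slicing returns a view: in-place writes through it modify `base`.
    view = base[1:3]
    view.fill_(1.0)

    # Indexing with a Python list is advanced indexing: it returns a new
    # tensor, so this in-place write does NOT reach `base`.
    picked = base[[0, 3]]
    picked.fill_(2.0)

    return base[1].item(), base[0].item()


def write_back(base: torch.Tensor, idx: list[int], values: torch.Tensor) -> None:
    # Indexed assignment updates `base` in place at the listed positions.
    base[idx] = values
```

`list_index_is_copy()` returns `(1.0, 0.0)`: the write through the slice is visible in `base`, the write through the list-indexed tensor is not.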

When I change `SmArch` to `Sm90` at https://github.com/efeslab/Nanoflow/blob/main/pipeline/include/cutlassGemmWrapperImpl.cuh#L89, I get:

```
/root/Nanoflow/3rdparty/cutlass/include/cutlass/gemm/device/gemm.h(264): error: incomplete type is not allowed
    using GemmKernel = typename kernel::DefaultGemm<
    ^
    detected during:
      instantiation of class "cutlass::gemm::device::Gemm [with ElementA_=BaseGEMMWrapper::ElementI...
```

Labels: question, ? - Needs Triage, inactive-30d