danielhua23
### Your current environment

The output of `python collect_env.py` on ROCm:

Collecting environment information... WARNING 09-11 03:28:33 rocm.py:17] `fork` method is not supported by ROCm. VLLM_WORKER_MULTIPROC_METHOD is overridden to `spawn`...
### Checklist

- [X] 1. If the issue you raised is not a feature but a question, please raise a discussion at https://github.com/sgl-project/sglang/discussions/new/choose Otherwise, it will be closed.
- [X] ...
### Bug description

Machine: H100-80G-HBM3.

Testing with the chatglm-6b-xxx.json configuration below, the run OOMs at tp1, bs24, inputlen1024.

After changing the JSON configuration to the following and starting directly from tp1, bs24, inputlen1024, that same tp1, bs24, inputlen1024 configuration runs fine.

From the code at https://github.com/bytedance/ByteMLPerf/blob/main/byte_infer_perf/llm_perf/launch.py#L260, my guess is that the script does not, as intended, wait for each configuration's subprocess to finish before launching the next one. The subprocesses then contend for the GPU, so a configuration that would otherwise run fine OOMs under contention.

### Steps to reproduce

Step 1: launch the container

```
docker run --net=host --pid=host --ipc=host --shm-size 64g --privileged...
```
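If the hypothesis above is right, the fix is simply to `join()` each configuration's subprocess before spawning the next. A minimal sketch of that sequencing with Python's `multiprocessing` (the names `run_config` and `launch_sequentially` are hypothetical, not ByteMLPerf's actual API):

```python
import multiprocessing as mp

def run_config(cfg):
    # Hypothetical per-configuration benchmark body; in the real
    # launcher this would run one (tp, bs, inputlen) case on the GPU.
    assert cfg["bs"] > 0

def launch_sequentially(configs):
    # Start one subprocess per config and join() it before launching
    # the next, so two configs never contend for the same GPU memory.
    exit_codes = []
    for cfg in configs:
        p = mp.Process(target=run_config, args=(cfg,))
        p.start()
        p.join()  # block until this config's process has fully exited
        exit_codes.append(p.exitcode)
    return exit_codes
```

With this pattern each configuration sees a freshly released GPU, which matches the observation that tp1, bs24, inputlen1024 succeeds when it is the only process running.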
**What is your question?**

Dear cutlass team, let's consider sm80 and f16s8. The example of the f16s8 TN mixed GEMM shown [here](https://github.com/NVIDIA/cutlass/pull/1084/files#diff-48de2b167ad3cf3321f972270331653a199001283f2c59fc8f5a70f2d14f7082R66) differs from the TRT-LLM [implementation](https://github.com/NVIDIA/TensorRT-LLM/blob/main/cpp/tensorrt_llm/kernels/cutlass_kernels/fpA_intB_gemm/fpA_intB_gemm_template.h#L112); specifically, to my knowledge,...
**What is your question?**

Hi, I am very interested in how to write any custom layout I want using CuTe, as https://github.com/NVIDIA/cutlass/blob/main/tools/util/include/cutlass/util/mixed_dtype_utils.hpp#L362 shows, but it is very difficult for me to...
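For intuition, a CuTe layout is essentially a (shape, stride) pair that maps a logical coordinate to a linear offset by a dot product with the strides. A plain-Python sketch of that mapping (not CuTe itself; `layout_offset` is a hypothetical helper for illustration):

```python
def layout_offset(coord, stride):
    # A layout maps coordinate (i, j, ...) to offset i*s0 + j*s1 + ...
    return sum(c * s for c, s in zip(coord, stride))

# For a 4x8 shape, a column-major layout uses stride (1, 4),
# while a row-major layout of the same shape uses stride (8, 1).
col_major_stride = (1, 4)
row_major_stride = (8, 1)
```

Custom layouts in CuTe are built the same way: you pick strides (possibly nested, via hierarchical shapes) that place each coordinate at the offset you want.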
Hey, nice work! I am wondering whether you would like individual contributors to join you in developing new features for nano-vllm. If so, could you please list some urgent...
As the title says. Thanks!