danielhua23
### Your current environment

The output of `python collect_env.py` on ROCm:

Collecting environment information... WARNING 09-11 03:28:33 rocm.py:17] `fork` method is not supported by ROCm. VLLM_WORKER_MULTIPROC_METHOD is overridden to `spawn`...
### Checklist

- [X] 1. If the issue you raised is not a feature but a question, please raise a discussion at https://github.com/sgl-project/sglang/discussions/new/choose Otherwise, it will be closed.
- [X] ...
### Bug description

Machine: H100-80G-HBM3.

Testing with the chatglm-6b-xxx.json configuration below, the run OOMs at tp1, bs24, inputlen1024.

After changing the JSON configuration to the following and starting directly from tp1, bs24, inputlen1024, that same tp1, bs24, inputlen1024 configuration runs fine.

From the code at https://github.com/bytedance/ByteMLPerf/blob/main/byte_infer_perf/llm_perf/launch.py#L260, my guess is that the script does not, as intended, wait for each configuration's subprocess to finish before launching the next one. The subprocesses then contend for the GPU, so a configuration that would otherwise run fine OOMs under contention.

### Steps to reproduce

Step 1: launch the container

```
docker run --net=host --pid=host --ipc=host --shm-size 64g --privileged...
```
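If the hypothesis above is right, the fix is simply to `join()` each configuration's subprocess before spawning the next. A minimal sketch of that sequencing with Python's `multiprocessing` (the names `run_config` and `launch_sequentially` are hypothetical, not ByteMLPerf's actual API):

```python
import multiprocessing as mp

def run_config(cfg):
    # Hypothetical per-configuration benchmark body; in the real
    # launcher this would run one (tp, bs, inputlen) case on the GPU.
    assert cfg["bs"] > 0

def launch_sequentially(configs):
    # Start one subprocess per config and join() it before launching
    # the next, so two configs never contend for the same GPU memory.
    exit_codes = []
    for cfg in configs:
        p = mp.Process(target=run_config, args=(cfg,))
        p.start()
        p.join()  # block until this config's process has fully exited
        exit_codes.append(p.exitcode)
    return exit_codes
```

With this pattern each configuration sees a freshly released GPU, which matches the observation that tp1, bs24, inputlen1024 succeeds when it is the only process running.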
**What is your question?**

Dear cutlass team, let's consider sm80 and f16s8. The example of the f16s8 TN mixed GEMM shown [here](https://github.com/NVIDIA/cutlass/pull/1084/files#diff-48de2b167ad3cf3321f972270331653a199001283f2c59fc8f5a70f2d14f7082R66) differs from the TRT-LLM [implementation](https://github.com/NVIDIA/TensorRT-LLM/blob/main/cpp/tensorrt_llm/kernels/cutlass_kernels/fpA_intB_gemm/fpA_intB_gemm_template.h#L112); specifically, to my knowledge,...
**What is your question?**

Hi, I am very interested in how to write any custom layout I want using CuTe, as https://github.com/NVIDIA/cutlass/blob/main/tools/util/include/cutlass/util/mixed_dtype_utils.hpp#L362 shows, but it is very difficult for me to...
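For intuition, a CuTe layout is essentially a (shape, stride) pair that maps a logical coordinate to a linear offset by a dot product with the strides. A plain-Python sketch of that mapping (not CuTe itself; `layout_offset` is a hypothetical helper for illustration):

```python
def layout_offset(coord, stride):
    # A layout maps coordinate (i, j, ...) to offset i*s0 + j*s1 + ...
    return sum(c * s for c, s in zip(coord, stride))

# For a 4x8 shape, a column-major layout uses stride (1, 4),
# while a row-major layout of the same shape uses stride (8, 1).
col_major_stride = (1, 4)
row_major_stride = (8, 1)
```

Custom layouts in CuTe are built the same way: you pick strides (possibly nested, via hierarchical shapes) that place each coordinate at the offset you want.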
Hey, nice work! I am wondering whether you would like individual contributors to join you in developing new features for nano-vllm. If so, could you please list some urgent...
As the title says. Thanks!