DefTruth
I'm using vllm==0.7.4.dev145+g73e0225ee.
Enabling chunked prefill and CUDA graph may lead to unbalanced VRAM usage.
> Please submit a minimal reproducible code or command.

Run DeepSeek-R1-Distill-Qwen-32B on L20x4:

```bash
nohup python3 -m vllm.entrypoints.openai.api_server \
  --model /workspace/hf_models/DeepSeek-R1-Distill-Qwen-32B \
  --tensor-parallel-size 4 \
  --max-model-len 32768 \
  --max-num-batched-tokens 2048 ...
```
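For reference, a rough offline-API equivalent of the serving command above (a sketch only: the model path, tensor-parallel size, and token limits mirror the flags shown, while `enable_chunked_prefill=True` is assumed from the issue description rather than taken from the truncated command):

```python
# Sketch: offline-API equivalent of the serving command above.
# enable_chunked_prefill is assumed from the report, not from the
# (truncated) command line; enforce_eager=False keeps CUDA graphs on.
from vllm import LLM, SamplingParams

llm = LLM(
    model="/workspace/hf_models/DeepSeek-R1-Distill-Qwen-32B",
    tensor_parallel_size=4,
    max_model_len=32768,
    max_num_batched_tokens=2048,
    enable_chunked_prefill=True,
    enforce_eager=False,
)

print(llm.generate(["Hello"], SamplingParams(max_tokens=16)))
```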
```bash
INFO 03-03 20:36:02 [loader.py:422] Loading weights took 8.80 seconds (VllmWorkerProcess pid=1005428)
INFO 03-03 20:36:02 [loader.py:422] Loading weights took 8.80 seconds (VllmWorkerProcess pid=1005426)
INFO 03-03 20:36:02 [loader.py:422] Loading weights took ...
```
VRAM usage is balanced at the very beginning.
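For anyone trying to reproduce the imbalance, here is a small monitoring sketch (assuming the nvidia-ml-py / pynvml package is installed) that polls per-GPU memory while the server is running, so the drift away from balanced usage can be seen over time:

```python
# Sketch: poll per-GPU memory to watch for imbalance over time.
# Requires nvidia-ml-py (pynvml); run alongside the vLLM server.
import time
import pynvml

pynvml.nvmlInit()
num_gpus = pynvml.nvmlDeviceGetCount()
try:
    while True:
        used = []
        for i in range(num_gpus):
            handle = pynvml.nvmlDeviceGetHandleByIndex(i)
            mem = pynvml.nvmlDeviceGetMemoryInfo(handle)
            used.append(mem.used / 2**30)
        print(" | ".join(f"GPU{i}: {u:6.2f} GiB" for i, u in enumerate(used)))
        time.sleep(5)
finally:
    pynvml.nvmlShutdown()
```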
@LucasWilkinson PTAL, thanks~
@LucasWilkinson some tests failed, but they seem unrelated to this PR.
Weird, I don't know why the mamba kernel tests failed:

```bash
FAILED kernels/test_mamba_ssm_ssd.py::test_mamba_chunk_scan_cont_batch[seq_len_chunk_size_cases0-5-8-itype0]
AssertionError: chunk_indices and chunk_offsets should have been set
```
The mamba SSD kernel test failure is related to PR https://github.com/vllm-project/vllm/pull/16623.