Simon Mo
@khluu maybe we can set RUN_WHEEL_CHECK to false by default and turn it on in CI only.
^ @DarkLight1337 this might be related to the refactoring?
/gemini review
For more ephemeral conversations, please join the vLLM Slack and the #sig-extensible-hardware channel for discussion!
You should save the model to disk in Hugging Face format, and vLLM can load it from disk.
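A minimal sketch of that flow, assuming a small public model (`facebook/opt-125m` is just a placeholder; any Hugging Face causal LM works) and a hypothetical local directory. The vLLM load step needs a GPU, so it is shown as a comment:

```python
# Sketch: save a Hugging Face model to disk, then point vLLM at the directory.
from transformers import AutoModelForCausalLM, AutoTokenizer

src = "facebook/opt-125m"      # placeholder source model
out_dir = "./opt-125m-local"   # hypothetical local directory vLLM will read

# save_pretrained writes config.json, weights, and tokenizer files to out_dir
AutoModelForCausalLM.from_pretrained(src).save_pretrained(out_dir)
AutoTokenizer.from_pretrained(src).save_pretrained(out_dir)

# Then load from disk by passing the directory instead of the hub name:
#   from vllm import LLM
#   llm = LLM(model=out_dir)
# or from the CLI:
#   vllm serve ./opt-125m-local
```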
Q3 Roadmap has been published #20336
@july8023 It should work on 4090; generally the model takes about 600GB of memory, then you want about 100-300GB for KV cache, so feel free to plan around that. @fsaudm A100s...
The model currently does not support --dtype bfloat16 because it is natively trained in fp8. Can you point me to the bf16 version?
vLLM does support this bf16 model on A100. It looks like the config.json properly removed `quantization_config`, so it should already work.
Hmm, can you please update the documentation with a GPU example? That's the primary reader's use case.