Harry Mellor comments

Results: 298 comments by Harry Mellor

[Feature]: Application support for the InternVL2.5-78B series models.

I believe this is an old warning about AWQ in general. The fastest (and definitely optimised!) AWQ we have is `--quantization awq_marlin`.

[Feature]: Application support for the InternVL2.5-78B series models.

I believe this should be automatically set though.

[Core] Fix sharing of stateful logits processors

Hi @maxdebayser, do you plan to continue this work?

[Core] Fix sharing of stateful logits processors

Closing as stale. If you plan to continue this work, feel free to re-open.

[Hardware][Intel] fp8 kv cache support for CPU

@jikunshang do you plan to continue this work?

[Hardware][Intel] fp8 kv cache support for CPU

Great! In that case, could you remove the TODO from the docs regarding this feature?

[BugFix] Spec Decode error:No available block found in 60 Second.

Closing as stale.

[Bug]: deepseek-r1 on A800

It is expected that the first and last rank will have higher memory usage because:

- The first rank contains the input embeddings
- The last rank contains the output...
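As a rough illustration of the size of this extra memory, the sketch below estimates one weight matrix of shape `vocab_size × hidden_size`, which is the shape of an input embedding table. The sizes are hypothetical values chosen only for illustration (not DeepSeek R1's configuration), `matrix_mib` is a hypothetical helper, and bf16 weights (2 bytes per parameter) are assumed.

```python
def matrix_mib(vocab_size: int, hidden_size: int, bytes_per_param: int = 2) -> float:
    """Memory in MiB of one vocab_size x hidden_size weight matrix.

    Assumes bf16 (2 bytes per parameter) unless told otherwise.
    """
    return vocab_size * hidden_size * bytes_per_param / 2**20


# Hypothetical model sizes, for illustration only.
extra = matrix_mib(vocab_size=32_000, hidden_size=4_096)
print(f"{extra:.1f} MiB")  # 250.0 MiB
```

Assuming the output layer has the same shape, each of the first and last ranks carries a matrix like this on top of its share of decoder layers, so those ranks peak higher than the middle ranks even when every rank holds the same number of layers.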

[Bug]: deepseek-r1 on A800

The other half of this issue is that DeepSeek R1 has 61 hidden layers. Currently, if the number of hidden layers is not divisible by the pipeline world size, the...
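To make the divisibility problem concrete, the sketch below splits 61 layers over 8 pipeline ranks. It assumes a naive policy in which every rank gets `61 // 8 = 7` layers and the last rank absorbs the remainder. The comment above is truncated, so this policy is an illustrative assumption, and `naive_pp_split` is a hypothetical helper rather than vLLM's code.

```python
def naive_pp_split(num_layers: int, pp_size: int) -> list[tuple[int, int]]:
    """Hypothetical naive split returning [start, end) layer ranges per rank.

    Every rank gets num_layers // pp_size layers; the last rank also takes
    whatever is left over.
    """
    per_rank = num_layers // pp_size
    ranges = []
    for rank in range(pp_size):
        start = rank * per_rank
        end = start + per_rank
        if rank == pp_size - 1:
            end = num_layers  # leftover layers land on the last rank
        ranges.append((start, end))
    return ranges


sizes = [end - start for start, end in naive_pp_split(61, 8)]
print(sizes)  # [7, 7, 7, 7, 7, 7, 7, 12]
```

Under this assumed policy the last rank, which already holds the output layers, would also receive 5 extra decoder layers, compounding the imbalance described in the previous comment.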

[Bug]: deepseek-r1 on A800

I'll try and make a PR that handles this automatically, but in the meantime could you try setting `VLLM_PP_LAYER_PARTITION=7,8,8,8,8,8,7,7`?
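The suggested value has eight entries (one per pipeline rank) that sum to 61, DeepSeek R1's layer count. The sketch below checks those two properties and turns the value into per-rank layer ranges. `partition_to_ranges` is a hypothetical helper written for illustration, not vLLM's internal code, and it assumes the variable holds a comma-separated list of per-rank layer counts in rank order.

```python
def partition_to_ranges(partition: str, num_layers: int, pp_size: int) -> list[tuple[int, int]]:
    """Turn a comma-separated per-rank layer count into [start, end) ranges."""
    sizes = [int(part) for part in partition.split(",")]
    if len(sizes) != pp_size:
        raise ValueError(f"expected {pp_size} entries, got {len(sizes)}")
    if sum(sizes) != num_layers:
        raise ValueError(f"partition sums to {sum(sizes)}, expected {num_layers}")
    ranges = []
    start = 0
    for size in sizes:
        ranges.append((start, start + size))
        start += size
    return ranges


ranges = partition_to_ranges("7,8,8,8,8,8,7,7", num_layers=61, pp_size=8)
for rank, (start, end) in enumerate(ranges):
    print(f"rank {rank}: layers {start}-{end - 1} ({end - start} layers)")
```

Because 61 = 8 × 8 - 3, three ranks must get 7 layers instead of 8; the suggested value gives 7 to the first and last ranks, which hold the embedding and output layers, plus one more rank to make the total come out to 61.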