Kero Liang
This PR continues the idea of #17617. Thanks @vadiklyutiy. Could you please take a look, @ywang96?
Does V1 support FP8 (W8A8) quantization? I tried [nm-testing/Qwen2-VL-7B-Instruct-FP8-dynamic](https://huggingface.co/nm-testing/Qwen2-VL-7B-Instruct-FP8-dynamic) on the v0.7.1 V1 arch: no error was thrown, but the output was gibberish. The same code and model work properly on the v0.7.1 V0 arch.
If `len(logit_bias)` is large, maybe we can keep a copy of `logit_bias["index"]` and `logit_bias["value"]` in device memory ahead of time (or on the first sample step) and reuse it on subsequent steps, as in the sketch below.
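A minimal sketch of the idea, assuming PyTorch; the `LogitBiasCache` helper and its constructor/`apply` signatures are hypothetical illustrations, not vLLM's actual API:

```python
# Sketch (not vLLM's real implementation): pay the host->device copy of
# the logit_bias indices/values once, then reuse the cached tensors on
# every sampling step instead of re-copying them each time.
import torch


class LogitBiasCache:
    """Hypothetical helper that caches logit_bias tensors on the device."""

    def __init__(self, logit_bias: dict[int, float], device: torch.device):
        # One-time host->device transfer, amortized over all sample steps.
        self.index = torch.tensor(
            list(logit_bias.keys()), dtype=torch.long, device=device)
        self.value = torch.tensor(
            list(logit_bias.values()), dtype=torch.float32, device=device)

    def apply(self, logits: torch.Tensor) -> torch.Tensor:
        # In-place add of the cached bias values at the cached token ids.
        logits.index_add_(-1, self.index, self.value.to(logits.dtype))
        return logits


# Usage: build on the first sample step, reuse on every later step.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
cache = LogitBiasCache({42: 5.0, 7: -100.0}, device)
logits = torch.randn(32000, device=device)
logits = cache.apply(logits)
```

The design choice is just trading a small amount of device memory for avoiding a per-step host-to-device transfer, which should matter only when the bias dictionary is large.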