gyou2021
Hi, could you please provide the LitePose-Auto-XS model for the COCO dataset? The LitePose-Auto-XS CrowdPose model is much faster than the LitePose-Auto-S CrowdPose model. I wonder how...
# What does this PR do?
Added the Qwen2-MoE model to optimum-habana, optimizing its performance on Gaudi with kv_cache, reuse_cache, flash attention, internal bucket, etc.

Test command:
cd optimum-habana
python -m pytest tests/test_text_generation_example.py -v -s
cd examples/text-generation/...
Auto TP in auto_tp.py needs to handle linear-type modules in emerging complex models. 1) The output of some linear modules in a model should go through an all-reduce operation after...
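To illustrate why those linear outputs need an all-reduce, here is a minimal NumPy sketch (not DeepSpeed code) of a row-parallel linear layer: the input dimension of the weight is sharded across two simulated tensor-parallel ranks, each rank computes a partial matmul, and summing the partials, which is what the all-reduce does, recovers the unsharded result.

```python
import numpy as np

# Toy simulation of a row-parallel linear under tensor parallelism.
# Dimensions and the 2-rank split are illustrative, not from auto_tp.py.
rng = np.random.default_rng(0)
x = rng.standard_normal((1, 8))   # full activation
W = rng.standard_normal((8, 4))   # full weight (in_features x out_features)

# Shard along the input dimension across 2 "ranks".
x_shards = np.split(x, 2, axis=1)
W_shards = np.split(W, 2, axis=0)

# Each rank computes a partial output from its shard.
partials = [xs @ ws for xs, ws in zip(x_shards, W_shards)]

# The all-reduce: summing partial outputs across ranks
# yields the same result as the unsharded matmul.
y_allreduce = sum(partials)
assert np.allclose(y_allreduce, x @ W)
```

If the all-reduce is skipped after such a module, each rank is left holding only a partial sum, which is why Auto TP must recognize these linear modules and insert the collective.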
Replacing MLA with MHA (without cache compression) for the prefill stage of DeepSeek-V2 significantly reduces computation cost, especially for long sequence inputs during inference. The time to the first token...
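For context on the trade-off, a back-of-the-envelope comparison of per-token KV-cache size under standard MHA versus an MLA-style compressed latent. The dimensions below are hypothetical placeholders, not DeepSeek-V2's actual configuration; the point is only that MLA trades extra (de)compression compute for a much smaller cache, so skipping the compression at prefill saves compute at the cost of cache size.

```python
# Hypothetical dimensions for illustration only.
n_heads, head_dim = 32, 128
latent_dim = 512                 # assumed compressed KV latent size

# Standard MHA caches full K and V per token.
mha_cache_per_token = 2 * n_heads * head_dim   # 8192 values
# MLA-style compression caches one latent vector per token.
mla_cache_per_token = latent_dim               # 512 values

ratio = mha_cache_per_token / mla_cache_per_token
print(ratio)  # 16.0 under these assumed dimensions
```

Under these assumed numbers the compressed cache is 16x smaller, which is why MLA still pays off for decoding even though uncompressed MHA is cheaper to compute during prefill.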