gyou2021

Results 5 issues of gyou2021

Hi, could you please provide the LitePose-Auto-XS model for the COCO dataset? The LitePose-Auto-XS CrowdPose model is much faster than the LitePose-Auto-S CrowdPose model. I wonder how...

… with kv_cache, reuse_cache, flash attention, internal bucket, etc. # What does this PR do? Added Qwen2-MoE model into optimum-habana, optimizing its performance on Gaudi with kv_cache, reuse_cache, flash attention,...

Added Qwen2-MoE model into optimum-habana, optimizing its performance on Gaudi with kv_cache, reuse_cache, flash attention, internal bucket, etc. Test command:
cd optimum-habana
python -m pytest tests/test_text_generation_example.py -v -s
cd examples/text-generation/...

run-test

Auto TP in auto_tp.py needs to handle linear-type modules in emerging complex models. 1) The results of some linear modules in a model should go through an all-reduce operation after...
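As general background on why a linear module's result may need an all-reduce under tensor parallelism, here is a minimal sketch in plain Python. It is not the code in auto_tp.py: the helper names (`matvec`, `shard_input_dim`, `all_reduce_sum`) and the toy values are assumptions made for illustration, and the "ranks" are simulated in a single process. When a weight is split along its input dimension, each rank computes only a partial sum, and the partial results must be summed across ranks to recover the full output.

```python
# Illustrative only: simulates two tensor-parallel "ranks" in one process.
# All helper names here are hypothetical and not taken from auto_tp.py.

def matvec(weight, x):
    """Compute y = W @ x for a weight given as a list of rows."""
    return [sum(w * v for w, v in zip(row, x)) for row in weight]

def shard_input_dim(weight, x, num_ranks):
    """Split W along its input dimension, and x to match, one shard per rank."""
    step = len(x) // num_ranks
    shards = []
    for rank in range(num_ranks):
        lo, hi = rank * step, (rank + 1) * step
        shards.append(([row[lo:hi] for row in weight], x[lo:hi]))
    return shards

def all_reduce_sum(partials):
    """Stand-in for a collective all-reduce: element-wise sum over ranks."""
    return [sum(values) for values in zip(*partials)]

weight = [[1, 2, 3, 4], [5, 6, 7, 8]]
x = [1, 0, 2, 1]

# Each simulated rank multiplies its weight shard by its slice of the input.
partials = [matvec(w, xs) for w, xs in shard_input_dim(weight, x, num_ranks=2)]

full = matvec(weight, x)            # unsharded reference result
reduced = all_reduce_sum(partials)  # partial results combined across ranks
```

In this toy setup no single rank's partial output equals the reference result; only the summed output does, which is the role the all-reduce plays after such a module.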

The MHA without cache compression replaces MLA for the prefill stage of DeepSeek-V2, significantly reducing computation costs, especially for long sequence inputs during inference. The time to the first token...

run-test