gyou2021
Hi, could you please provide the LitePose-Auto-XS model for the COCO dataset? The LitePose-Auto-XS CrowdPose model is much faster than the LitePose-Auto-S CrowdPose model. I wonder how...
# What does this PR do?
Added the Qwen2-MoE model to optimum-habana, optimizing its performance on Gaudi with kv_cache, reuse_cache, flash attention, internal bucket, etc.

Test command:
cd optimum-habana
python -m pytest tests/test_text_generation_example.py -v -s
cd examples/text-generation/...
Auto TP in auto_tp.py needs to handle linear-type modules in emerging complex models. 1) The output of some linear modules in a model should go through an all-reduce operation after...
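To illustrate why those linear outputs need an all-reduce, here is a minimal NumPy sketch (not DeepSpeed code) of a row-parallel linear layer: the input dimension of the weight is sharded across two simulated tensor-parallel ranks, each rank computes a partial matmul, and summing the partials, which is what the all-reduce does, recovers the unsharded result.

```python
import numpy as np

# Toy simulation of a row-parallel linear under tensor parallelism.
# Dimensions and the 2-rank split are illustrative, not from auto_tp.py.
rng = np.random.default_rng(0)
x = rng.standard_normal((1, 8))   # full activation
W = rng.standard_normal((8, 4))   # full weight (in_features x out_features)

# Shard along the input dimension across 2 "ranks".
x_shards = np.split(x, 2, axis=1)
W_shards = np.split(W, 2, axis=0)

# Each rank computes a partial output from its shard.
partials = [xs @ ws for xs, ws in zip(x_shards, W_shards)]

# The all-reduce: summing partial outputs across ranks
# yields the same result as the unsharded matmul.
y_allreduce = sum(partials)
assert np.allclose(y_allreduce, x @ W)
```

If the all-reduce is skipped after such a module, each rank is left holding only a partial sum, which is why Auto TP must recognize these linear modules and insert the collective.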
Replacing MLA with MHA (without cache compression) for the prefill stage of DeepSeek-V2 significantly reduces computation cost, especially for long sequence inputs during inference. The time to the first token...
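For context on the trade-off, a back-of-the-envelope comparison of per-token KV-cache size under standard MHA versus an MLA-style compressed latent. The dimensions below are hypothetical placeholders, not DeepSeek-V2's actual configuration; the point is only that MLA trades extra (de)compression compute for a much smaller cache, so skipping the compression at prefill saves compute at the cost of cache size.

```python
# Hypothetical dimensions for illustration only.
n_heads, head_dim = 32, 128
latent_dim = 512                 # assumed compressed KV latent size

# Standard MHA caches full K and V per token.
mha_cache_per_token = 2 * n_heads * head_dim   # 8192 values
# MLA-style compression caches one latent vector per token.
mla_cache_per_token = latent_dim               # 512 values

ratio = mha_cache_per_token / mla_cache_per_token
print(ratio)  # 16.0 under these assumed dimensions
```

Under these assumed numbers the compressed cache is 16x smaller, which is why MLA still pays off for decoding even though uncompressed MHA is cheaper to compute during prefill.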