Libin Tang
# What does this PR do? Mixtral-8x22B model loading needs to be done on the meta device due to host memory limitations.
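A minimal sketch of the meta-device idea in plain PyTorch: modules built under the `meta` device carry only shape/dtype metadata, so no host RAM is consumed for the weights until they are materialized on the accelerator. The PR wires this into the actual model-loading path; the snippet below is only illustrative.

```python
import torch

# Construct the module on the "meta" device: parameters hold metadata
# only, so a huge model (e.g. Mixtral-8x22B) does not exhaust host RAM.
# Real weights can later be loaded shard-by-shard onto the device.
with torch.device("meta"):
    layer = torch.nn.Linear(8, 8)

print(layer.weight.is_meta)  # True: no storage allocated
```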
# What does this PR do? Initial enablement of FP8 training with the Intel Gaudi Transformer Engine (porting from OHF #91). Only the linear layer is replaced with FP8.
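The "replace only the linear layers" step can be sketched as a recursive module swap. `FP8LinearStub` below is a hypothetical stand-in; in the PR the replacement class would come from the Gaudi Transformer Engine.

```python
import torch.nn as nn

class FP8LinearStub(nn.Linear):
    """Stand-in for an FP8 linear layer (hypothetical name; the real
    replacement class comes from the Gaudi Transformer Engine)."""

def swap_linears(module: nn.Module) -> None:
    # Recursively replace every nn.Linear with the FP8 variant,
    # copying over the existing weights.
    for name, child in module.named_children():
        if isinstance(child, nn.Linear) and not isinstance(child, FP8LinearStub):
            fp8 = FP8LinearStub(child.in_features, child.out_features,
                                bias=child.bias is not None)
            fp8.load_state_dict(child.state_dict())
            setattr(module, name, fp8)
        else:
            swap_linears(child)

model = nn.Sequential(nn.Linear(4, 4), nn.ReLU(), nn.Linear(4, 2))
swap_linears(model)
print(type(model[0]).__name__)  # FP8LinearStub
```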
# What does this PR do? 1. Add `use_flash_attention`, `flash_attention_recompute`, and `flash_attention_causal_mask` flags. 2. Add a mark step per decoder layer. 3. Add FusedSDPA FP8.
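For reference, the causal-mask behavior these flags control corresponds to stock PyTorch scaled-dot-product attention with `is_causal=True`; on Gaudi the flags instead route to the FusedSDPA kernel. A minimal sketch:

```python
import torch
import torch.nn.functional as F

# Causal scaled-dot-product attention. The PR's flags select the
# equivalent FusedSDPA path on Gaudi rather than this CPU/GPU call.
q = torch.randn(1, 2, 5, 16)  # (batch, heads, seq_len, head_dim)
k = torch.randn(1, 2, 5, 16)
v = torch.randn(1, 2, 5, 16)
out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
print(out.shape)  # torch.Size([1, 2, 5, 16])
```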
# What does this PR do? Initial enablement of Mamba with static shapes.
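Static shapes typically mean padding inputs up to a fixed bucket length so the compiled graph is reused instead of being recompiled for every sequence length. A small illustrative sketch (`pad_to_bucket` and `PAD_ID` are made-up names, not the PR's API):

```python
# Pad a token-id list up to a fixed bucket size so downstream kernels
# always see the same (static) shape. Names here are illustrative.
PAD_ID = 0

def pad_to_bucket(token_ids, bucket_size, pad_id=PAD_ID):
    if len(token_ids) > bucket_size:
        raise ValueError("prompt longer than bucket")
    return token_ids + [pad_id] * (bucket_size - len(token_ids))

print(pad_to_bucket([5, 6, 7], 8))  # [5, 6, 7, 0, 0, 0, 0, 0]
```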