optimum-habana
Easy and lightning fast training of 🤗 Transformers on Habana Gaudi processor (HPU)
# What does this PR do? Fixes # (issue) ## Before submitting - [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks...
# What does this PR do? Supports long sequences (32k tokens) at batch size 4 with Flash Attention and FusedSDPA. ## Before submitting Previously, any 32k-token input ran out of memory. ## Use...
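The fused-attention idea behind FusedSDPA can be illustrated with stock PyTorch: `F.scaled_dot_product_attention` computes attention in one fused call instead of materializing the full attention matrix, which is what makes long sequences fit in memory. This is a minimal sketch using the standard PyTorch API, not Habana's FusedSDPA kernel itself:

```python
import torch
import torch.nn.functional as F

# (batch, heads, seq_len, head_dim) -- tiny shapes for illustration
q = torch.randn(1, 4, 8, 16)
k = torch.randn(1, 4, 8, 16)
v = torch.randn(1, 4, 8, 16)

# Fused path: no (seq_len x seq_len) attention matrix is kept around.
# On Gaudi, FusedSDPA plays the analogous role.
fused = F.scaled_dot_product_attention(q, k, v)

# Reference path: explicitly materialize softmax(QK^T / sqrt(d)) V
scale = q.shape[-1] ** -0.5
attn = torch.softmax((q @ k.transpose(-2, -1)) * scale, dim=-1)
ref = attn @ v

print(torch.allclose(fused, ref, atol=1e-4))
```

The two paths are numerically equivalent; the fused kernel simply avoids storing the quadratic attention matrix, which is the dominant memory cost at 32k tokens.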
# What does this PR do? Initial Mistral FP8 change # Command Lines: 1) 128x128xbs4 ` QUANT_CONFIG=./quantization_config/maxabs_quant.json python run_generation.py --model_name_or_path mistralai/Mistral-7B-Instruct-v0.2 --attn_softmax_bf16 --use_hpu_graphs --trim_logits --use_kv_cache --reuse_cache --bf16 --batch_size 896 --fp8...
# What does this PR do? Mixtral-8x22b model loading needs to be on the meta device due to host memory limitations. Fixes # (issue) ## Before submitting - [ ] This PR fixes...
The internal RoPE implementation changed so that if any parameter's data type is FP32, the op is performed in FP32. To force the op to bf16,...
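The dtype-promotion behavior described above follows standard PyTorch type-promotion rules and can be sketched with a plain rotary-embedding application. This is an illustrative RoPE formulation, not the Habana kernel; the point is that FP32 `cos`/`sin` tables promote the whole op to FP32, and casting them to bf16 keeps it in bf16:

```python
import torch

def apply_rope(x, cos, sin):
    # Standard RoPE application: rotate_half splits the head dim,
    # swaps the halves with a sign flip, then mixes via cos/sin.
    x1, x2 = x.chunk(2, dim=-1)
    rotated = torch.cat((-x2, x1), dim=-1)
    # If cos/sin are FP32, bf16 * fp32 promotes the result to FP32.
    return x * cos + rotated * sin

q = torch.randn(1, 8, 16, 64, dtype=torch.bfloat16)
cos = torch.randn(1, 1, 16, 64)  # FP32 by default
sin = torch.randn(1, 1, 16, 64)

promoted = apply_rope(q, cos, sin)
forced = apply_rope(q, cos.to(torch.bfloat16), sin.to(torch.bfloat16))

print(promoted.dtype)  # torch.float32
print(forced.dtype)    # torch.bfloat16
```

Casting the `cos`/`sin` buffers (rather than the activations) is the cheap way to pin the op's dtype, since they are small and precomputed.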
# What does this PR do? Fixes multiple issues found in decoder-only generation. ``` root@idc705326-7:/optimum-habana# GAUDI2_CI=1 RUN_SLOW=1 python -m pytest tests/transformers/tests/models/ -k test_generate_from_inputs_embeds_decoder_only ============================================================================================================== test session...
# What does this PR do? The ClipSeg model has been validated using graph mode on Gaudi 2. Adding this as an example for users. Performance number: Without graph mode...
# What does this PR do? Adds support for contrastive search for static and dynamic inputs, and for low-memory configs. Also adds support for `GPT2DoubleHeadsModel`. Fixes the tests below: ```...
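Contrastive search re-ranks the model's top-k candidates by trading model probability against a degeneration penalty (the candidate's maximum cosine similarity to the context's hidden states). The scoring rule can be sketched in NumPy; this is an illustration of the decoding criterion, not this PR's HPU implementation, and the function name is hypothetical:

```python
import numpy as np

def contrastive_rerank(probs, cand_hidden, ctx_hidden, alpha=0.6):
    """Pick a candidate by score = (1 - alpha) * p(v) - alpha * max_sim.

    probs:       (k,)   model probabilities of the top-k candidates
    cand_hidden: (k, d) hidden state each candidate token would produce
    ctx_hidden:  (t, d) hidden states of the context generated so far
    """
    cand = cand_hidden / np.linalg.norm(cand_hidden, axis=1, keepdims=True)
    ctx = ctx_hidden / np.linalg.norm(ctx_hidden, axis=1, keepdims=True)
    sims = cand @ ctx.T                # (k, t) cosine similarities
    penalty = sims.max(axis=1)         # degeneration penalty per candidate
    scores = (1 - alpha) * probs - alpha * penalty
    return int(np.argmax(scores))

probs = np.array([0.9, 0.1])
cand_hidden = np.array([[1.0, 0.0], [0.0, 1.0]])
ctx_hidden = np.array([[1.0, 0.0]])   # context already "looks like" candidate 0
best = contrastive_rerank(probs, cand_hidden, ctx_hidden, alpha=0.6)
print(best)  # 1: the repetitive candidate is penalized despite its higher probability
```

In Transformers, the same behavior is triggered through `model.generate(..., penalty_alpha=0.6, top_k=4)`.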
# What does this PR do? This PR provides options to work around failures seen in the image-classification example when invoking the HF Transformers auto-class pretrained functions while running with certain...
Add CI for Mistral performance