optimum-habana
Quantization for FSDPA
- Added `use_flash_attention`, `flash_attention_causal_mask`, and `flash_attention_recompute` to `run_lm_eval` (see the sketch after this list)
- Enforce the recompute flag on FSDPA quantization
- Allow quantization using HQT
- Document `FusedScaledDotProductAttention` quantization
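As a rough sketch of how these pieces fit together: HQT quantization in optimum-habana is typically driven by a `QUANT_CONFIG` environment variable pointing to a JSON config, with a measurement pass followed by a quantization pass. The three `flash_attention_*` flags come from this PR; everything else below (config file names, model name, auxiliary flags) is an assumption based on the usual text-generation examples, not part of this PR.

```bash
# Sketch only -- file names, model, and auxiliary flags are illustrative.

# Step 1: measurement pass to collect calibration statistics.
QUANT_CONFIG=./quantization_config/maxabs_measure.json python run_lm_eval.py \
    --model_name_or_path meta-llama/Llama-2-7b-hf \
    --use_hpu_graphs \
    --use_kv_cache \
    --bf16 \
    --use_flash_attention \
    --flash_attention_recompute \
    --flash_attention_causal_mask

# Step 2: quantization pass using the collected statistics; with
# use_flash_attention enabled, the fused SDPA op is quantized as well.
QUANT_CONFIG=./quantization_config/maxabs_quant.json python run_lm_eval.py \
    --model_name_or_path meta-llama/Llama-2-7b-hf \
    --use_hpu_graphs \
    --use_kv_cache \
    --bf16 \
    --use_flash_attention \
    --flash_attention_recompute \
    --flash_attention_causal_mask
```

Note that `--flash_attention_recompute` is passed in both steps, consistent with this PR enforcing the recompute flag when FSDPA is quantized.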
Added a commit documenting the FSDPA quantization changes. This PR includes the commits from https://github.com/huggingface/optimum-habana/pull/967 plus the documentation commit. @libinta - this PR should be labeled synapse_1.16_dependency.