
Quantization for FSDPA

Open · dudilester opened this issue 1 year ago · 1 comment

- Added `use_flash_attention`, `flash_attention_causal_mask` and `flash_attention_recompute` to `run_lm_eval` (see the example invocation below)
- Enforce the recompute flag on fsdpa quantization
- Allow quantization using HQT
- Document FusedScaledDotProductAttention quantization
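As a reference, a minimal sketch of how the new flags could be passed to the `run_lm_eval` script. The three flash-attention flags come from this PR; the remaining arguments (model name, precision, output file) follow the usual `examples/text-generation` conventions and are illustrative placeholders that may differ by version:

```bash
# Hypothetical invocation; only the three flash-attention flags are
# taken from this PR, everything else is a placeholder.
python run_lm_eval.py \
  --model_name_or_path meta-llama/Llama-2-7b-hf \
  --bf16 \
  --use_flash_attention \
  --flash_attention_causal_mask \
  --flash_attention_recompute \
  -o results.json
```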

dudilester · May 13 '24 10:05

Added a commit documenting the fsdpa quantization changes. This PR includes the commits from https://github.com/huggingface/optimum-habana/pull/967 plus the doc commit. @libinta - the PR should be labeled synapse_1.16_dependency.

dudilester · May 13 '24 10:05

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Should the regression tests used for Llama fp8 be updated? Here and there, for instance?

@regisss I see that SDPA is not tested in bf16 either; it can be added. Can you or @libinta take care of it?

MrGeva · May 30 '24 14:05