flash-attention
Is there a way to use flash attention and selectively finetune only the q projection layer, leaving the k and v projection layers frozen?
You can just change the Python interface (https://github.com/Dao-AILab/flash-attention/blob/main/flash_attn/flash_attn_interface.py) to set k_grad and v_grad to None and see if that works.
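If the goal is only to keep the k/v projection weights frozen (rather than to skip computing their gradients inside the kernel), a minimal sketch of the standard PyTorch alternative is shown below: set `requires_grad=False` on those projections and let autograd discard the unused gradients. The `SelfAttention` module and the `q_proj`/`k_proj`/`v_proj` names are illustrative, not taken from the flash-attention codebase; only `flash_attn_func` is the library's actual API.

```python
# Sketch: freeze k/v projections at the module level instead of editing
# flash_attn_interface.py. Module/layer names here are illustrative.
import torch
import torch.nn as nn
from flash_attn import flash_attn_func


class SelfAttention(nn.Module):
    def __init__(self, dim: int, n_heads: int):
        super().__init__()
        self.n_heads = n_heads
        self.q_proj = nn.Linear(dim, dim)
        self.k_proj = nn.Linear(dim, dim)
        self.v_proj = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, s, d = x.shape
        h = self.n_heads
        # flash_attn_func expects (batch, seqlen, nheads, headdim) fp16/bf16 tensors
        q = self.q_proj(x).view(b, s, h, d // h)
        k = self.k_proj(x).view(b, s, h, d // h)
        v = self.v_proj(x).view(b, s, h, d // h)
        out = flash_attn_func(q, k, v, causal=True)
        return out.reshape(b, s, d)


attn = SelfAttention(dim=64, n_heads=4).cuda().half()

# Freeze the k/v projections; only q_proj stays trainable.
for p in attn.k_proj.parameters():
    p.requires_grad_(False)
for p in attn.v_proj.parameters():
    p.requires_grad_(False)

optimizer = torch.optim.AdamW(
    (p for p in attn.parameters() if p.requires_grad), lr=1e-4
)

x = torch.randn(2, 128, 64, device="cuda", dtype=torch.float16)
out = attn(x)
out.sum().backward()  # dk/dv are still computed by the kernel but never applied
optimizer.step()      # updates q_proj only
```

Note this does not save the backward-pass compute for dk/dv, which is what the suggested change to the interface would address; it only prevents the frozen weights from being updated.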