unify-parameter-efficient-tuning
Does this unified view take the attention mask into consideration?
I am not familiar with the theoretical derivation, but I am interested in how broadly the formula applies. Thank you.
Yes, the derivation holds for masked attention.
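For anyone who wants to check this numerically, here is a minimal sketch, not code from the repository. It assumes the derivation in question is the rewriting of prefix-tuning attention as a gated sum of the original attention and the attention over the prefix, that the prefix positions are never masked, and that the usual scaling factor is omitted. All names (`softmax_attention`, `exp_mass`, `lam`, and so on) are made up for illustration. Masked positions get a score of `-inf`, so they contribute zero to every sum, and the same algebra goes through with the sums restricted to the visible positions.

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def softmax_attention(q, keys, values, mask=None):
    """Single-query attention; positions where mask is False are excluded."""
    if mask is None:
        mask = [True] * len(keys)
    # A masked score of -inf gives an attention weight of exactly 0.
    scores = [dot(q, k) if keep else -math.inf for k, keep in zip(keys, mask)]
    top = max(scores)  # assumes at least one visible position
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [
        sum(w * v[j] for w, v in zip(weights, values)) / total
        for j in range(len(values[0]))
    ]


def exp_mass(q, keys, mask=None):
    """Sum of exp(q . k) over the visible positions only."""
    if mask is None:
        mask = [True] * len(keys)
    return sum(math.exp(dot(q, k)) for k, keep in zip(keys, mask) if keep)


def rand_matrix(rows, cols):
    return [[random.gauss(0, 1) for _ in range(cols)] for _ in range(rows)]


random.seed(0)
d, n, l = 4, 5, 2  # hidden size, context length, prefix length (arbitrary)
q = [random.gauss(0, 1) for _ in range(d)]
K, V = rand_matrix(n, d), rand_matrix(n, d)
Pk, Pv = rand_matrix(l, d), rand_matrix(l, d)
mask = [True, True, True, False, False]  # e.g. a causal mask for the third token

# Left-hand side: masked attention over [prefix; context].
full = softmax_attention(q, Pk + K, Pv + V, [True] * l + mask)

# Right-hand side: gated sum of masked context attention and prefix attention.
prefix_mass = exp_mass(q, Pk)
context_mass = exp_mass(q, K, mask)
lam = prefix_mass / (prefix_mass + context_mass)
ctx = softmax_attention(q, K, V, mask)
pre = softmax_attention(q, Pk, Pv)
gated = [(1 - lam) * c + lam * p for c, p in zip(ctx, pre)]

max_abs_diff = max(abs(a - b) for a, b in zip(full, gated))
print(max_abs_diff)  # should be at floating-point noise level
```

The only thing the mask changes is which context positions enter `context_mass` and `ctx`; the gate `lam` is still the share of attention mass that falls on the prefix.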