tft-torch
Usage of flash attention
Consider wrapping the call to self.attention in InterpretableMultiHeadAttention with

with torch.backends.cuda.sdp_kernel(enable_flash=True, enable_math=False, enable_mem_efficient=True):

in order to improve speed and memory efficiency. On PyTorch 2.x this restricts scaled-dot-product attention to the FlashAttention and memory-efficient kernels instead of the plain math implementation.
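
A minimal sketch of what that wrapping could look like, assuming a module that holds a standard nn.MultiheadAttention under self.attention (the actual internals of InterpretableMultiHeadAttention may differ):

```python
import torch
import torch.nn as nn


class AttentionBlock(nn.Module):
    """Hypothetical stand-in for the module that owns self.attention."""

    def __init__(self, embed_dim: int, num_heads: int):
        super().__init__()
        self.attention = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)

    def forward(self, q, k, v):
        # Restrict SDPA to the flash and memory-efficient backends and
        # disable the slower math fallback for the duration of this call.
        with torch.backends.cuda.sdp_kernel(
            enable_flash=True,
            enable_math=False,
            enable_mem_efficient=True,
        ):
            # need_weights=False is required for nn.MultiheadAttention to
            # take the fused fast path; requesting the attention weight
            # matrix forces the unfused implementation regardless of the
            # context manager.
            out, _ = self.attention(q, k, v, need_weights=False)
        return out
```

Two caveats: the fused kernels never materialize the attention weight matrix, so if InterpretableMultiHeadAttention returns those weights for interpretability, the context manager would only help on code paths that can skip them, and the FlashAttention backend additionally requires CUDA tensors in fp16/bf16. On recent PyTorch releases the same control is exposed as torch.nn.attention.sdpa_kernel, which supersedes torch.backends.cuda.sdp_kernel.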