tft-torch
Usage of flash attention
Consider wrapping the call to self.attention in InterpretableMultiHeadAttention with

with torch.backends.cuda.sdp_kernel(enable_flash=True, enable_math=False, enable_mem_efficient=True):

in order to improve speed and memory efficiency. On PyTorch 2.x this restricts scaled-dot-product attention to the FlashAttention and memory-efficient kernels instead of the plain math implementation.
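
A minimal sketch of what that wrapping could look like, assuming a module that holds a standard nn.MultiheadAttention under self.attention (the actual internals of InterpretableMultiHeadAttention may differ):

```python
import torch
import torch.nn as nn


class AttentionBlock(nn.Module):
    """Hypothetical stand-in for the module that owns self.attention."""

    def __init__(self, embed_dim: int, num_heads: int):
        super().__init__()
        self.attention = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)

    def forward(self, q, k, v):
        # Restrict SDPA to the flash and memory-efficient backends and
        # disable the slower math fallback for the duration of this call.
        with torch.backends.cuda.sdp_kernel(
            enable_flash=True,
            enable_math=False,
            enable_mem_efficient=True,
        ):
            # need_weights=False is required for nn.MultiheadAttention to
            # take the fused fast path; requesting the attention weight
            # matrix forces the unfused implementation regardless of the
            # context manager.
            out, _ = self.attention(q, k, v, need_weights=False)
        return out
```

Two caveats: the fused kernels never materialize the attention weight matrix, so if InterpretableMultiHeadAttention returns those weights for interpretability, the context manager would only help on code paths that can skip them, and the FlashAttention backend additionally requires CUDA tensors in fp16/bf16. On recent PyTorch releases the same control is exposed as torch.nn.attention.sdpa_kernel, which supersedes torch.backends.cuda.sdp_kernel.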