Sana
About kernel fusion
You mentioned in the report that Triton was used for kernel fusion, but the corresponding functions for attn and ffn in the config were not called. Could you tell me why?
```yaml
attn_type: linear
ffn_type: glumbconv
```
Some of our users on Windows systems are not comfortable with Triton, so we set the default to the original linear attention implementation for a better experience for everyone.
If I want to use Triton, attn_type can be set to triton_linear, but what should ffn_type be set to? Will loading pretrained weights be affected?
> Some of our users on Windows systems are not comfortable with Triton, so we set the default to the original linear attention implementation for a better experience for everyone.
Triton now runs on Windows via this repository, and it is working:
https://github.com/woct0rdho/triton-windows/releases
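A minimal sketch (not from the Sana repo) for verifying the install before switching the config; it assumes only that the triton package exposes __version__:

```python
# Check that Triton is importable before switching attn_type to
# triton_linear; fall back to the default linear attention otherwise.
try:
    import triton
    print(f"Triton {triton.__version__} detected; triton_linear should be usable")
except ImportError:
    print("Triton not installed; keep the default attn_type: linear")
```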
> If I want to use Triton, attn_type can be set to triton_linear, but what should ffn_type be set to? Will loading pretrained weights be affected?
I have the same question. Could you let me know if you've resolved this issue, or do you have any recommended settings for the ffn_type configuration?
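Pending an official answer, here is a hedged sketch of the switch discussed above. The attn_type value is the one named in this thread; ffn_type is left at the known default (glumbconv), since whether a Triton-fused FFN variant exists, and what it is called, is exactly the open question:

```yaml
# Sketch only: attn_type: triton_linear is confirmed in this thread.
# ffn_type stays at the documented default because the name of a
# Triton-fused FFN variant (if any) is unverified.
attn_type: triton_linear
ffn_type: glumbconv
```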