Jack Gallagher

Results: 32 comments by Jack Gallagher

hmm really? the scales are just a pointwise op between the dot product and logits in a normal implementation. why does flash attention make that harder?
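
For reference, a minimal sketch of what "pointwise op between the dot product and logits" means in a naive (non-fused) attention. This is plain Python over lists, not code from any particular library; the `attention` function and its `scale` parameter (either a single float or a hypothetical per-entry matrix of scales) are assumptions made up for illustration.

```python
import math

def attention(q, k, v, scale=None):
    """Naive single-head attention over plain Python lists.

    q: [n_q][d], k: [n_k][d], v: [n_k][d_v]
    scale: hypothetical parameter for illustration. Either None
           (defaults to 1/sqrt(d)), a float, or an [n_q][n_k] list
           of per-entry scales.
    """
    d = len(q[0])
    out = []
    for i, qi in enumerate(q):
        # 1. raw dot products between this query and every key
        dots = [sum(a * b for a, b in zip(qi, kj)) for kj in k]

        # 2. the scaling step: a pointwise multiply on the dot products
        if scale is None:
            logits = [s / math.sqrt(d) for s in dots]
        elif isinstance(scale, (int, float)):
            logits = [s * scale for s in dots]
        else:
            logits = [s * c for s, c in zip(dots, scale[i])]

        # 3. numerically stable softmax over the logits
        m = max(logits)
        exps = [math.exp(x - m) for x in logits]
        z = sum(exps)
        weights = [e / z for e in exps]

        # 4. weighted sum of the values
        out.append([sum(w * vj[c] for w, vj in zip(weights, v))
                    for c in range(len(v[0]))])
    return out
```

In this unfused form the scale is just step 2, so swapping a scalar for a per-entry matrix only touches one line. Whether that carries over to a fused kernel depends on how that kernel is written, which this sketch does not model.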

i went ahead and implemented it anyway - probably makes sense to put async support behind a feature flag and open a pr? https://github.com/GallagherCommaJack/sonic-channel/commit/5f942565b814c49e67a0b35d45485698f2f2c377