Jack Gallagher

38 comments by Jack Gallagher

hmm really? the scales are just a pointwise op between the dot product and logits in a normal implementation. why does flash attention make that harder?
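A minimal NumPy sketch of the point above. It assumes the "scales" are a hypothetical `(n, m)` tensor `scale` multiplied elementwise into the query-key scores before softmax. The tiled function is a simplified flash-attention-style loop (online softmax over key blocks), not any particular kernel. It applies the same pointwise op to each tile of scores as the tile is computed, and it matches the naive version.

```python
import numpy as np

def reference_attention(q, k, v, scale):
    # q: (n, d), k: (m, d), v: (m, dv), scale: (n, m) hypothetical pointwise scales
    scores = q @ k.T                       # dot products
    logits = scores * scale                # pointwise op between dot product and logits
    logits = logits - logits.max(axis=1, keepdims=True)
    p = np.exp(logits)
    return (p / p.sum(axis=1, keepdims=True)) @ v

def tiled_attention(q, k, v, scale, block=4):
    # Stream over key blocks with an online softmax, never forming the full (n, m) probs.
    n = q.shape[0]
    out = np.zeros((n, v.shape[1]))
    row_max = np.full((n, 1), -np.inf)
    row_sum = np.zeros((n, 1))
    for start in range(0, k.shape[0], block):
        stop = start + block
        # Same pointwise op, applied only to the tile of scores just computed.
        logits = (q @ k[start:stop].T) * scale[:, start:stop]
        new_max = np.maximum(row_max, logits.max(axis=1, keepdims=True))
        correction = np.exp(row_max - new_max)  # rescale earlier partial sums
        p = np.exp(logits - new_max)
        row_sum = row_sum * correction + p.sum(axis=1, keepdims=True)
        out = out * correction + p @ v[start:stop]
        row_max = new_max
    return out / row_sum

rng = np.random.default_rng(0)
q = rng.normal(size=(5, 8))
k = rng.normal(size=(10, 8))
v = rng.normal(size=(10, 3))
scale = rng.uniform(0.5, 1.5, size=(5, 10))
naive = reference_attention(q, k, v, scale)
tiled = tiled_attention(q, k, v, scale)
```

The usual `1/sqrt(d)` factor is the special case of a constant `scale`. The practical cost in a fused kernel is mostly reading a full `(n, m)` scale tensor from memory, which is exactly what flash attention tries to avoid, so scales that can be computed on the fly per tile are the cheap case.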

i went ahead and implemented it anyway - probably makes sense to put async support behind a feature flag and open a pr? https://github.com/GallagherCommaJack/sonic-channel/commit/5f942565b814c49e67a0b35d45485698f2f2c377

Where should it get TLS cert from?

Yeah this is going to be more important for iOS/Android than desktop because the expectations are different. I'm probably fine with Qt's default solution for Linux. For Android we'll probably...

tagged as good first issue because I think it's not too hard but will end up giving a pretty thorough tour of the codebase

a reference re jax rng mechanics https://jax.readthedocs.io/en/latest/notebooks/Common_Gotchas_in_JAX.html#jax-prng
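A short sketch of the mechanics that page describes: JAX keys are explicit and stateless, so reusing a key reproduces the same samples, and fresh randomness comes from splitting.

```python
import jax

# The same key always produces the same numbers.
key = jax.random.PRNGKey(0)
a = jax.random.normal(key, (3,))
b = jax.random.normal(key, (3,))  # identical to `a`, because the key was reused

# For fresh randomness, split the key and don't reuse a key once it's consumed.
key, subkey = jax.random.split(key)
c = jax.random.normal(subkey, (3,))  # differs from `a`
```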

> Is this a gotcha specifically in the seed=None case, or the more general seed=int case? I'm guessing the latter.

both, the int needs to come from somewhere

> If...

> Anyone writing custom training loops is going to need to know about this API. If they forget it -- well, back to the current status quo, which is that...
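A sketch of why the int seed "needs to come from somewhere" in a custom training loop. It assumes the thread is about a stateless, JAX-style seeding API. The `dropout` helper is hypothetical. Passing the same fixed seed every step gives the same mask every step. Deriving a per-step key with `jax.random.fold_in` is one option that stays reproducible.

```python
import jax
import jax.numpy as jnp

def dropout(key, x, rate):
    # Hypothetical helper: keep each element with probability (1 - rate).
    keep = jax.random.bernoulli(key, 1.0 - rate, x.shape)
    return jnp.where(keep, x / (1.0 - rate), 0.0)

seed = 42
base_key = jax.random.PRNGKey(seed)
x = jnp.ones((64,))

# Gotcha: reusing the key from a fixed int seed gives the same mask every step.
masks_reused = [dropout(base_key, x, 0.5) for step in range(3)]

# One option: fold the step counter into the key, so each step differs but reruns match.
masks_folded = [dropout(jax.random.fold_in(base_key, step), x, 0.5) for step in range(3)]
```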