Jack Gallagher
hmm really? the scales are just a pointwise op between the dot product and logits in a normal implementation. why does flash attention make that harder?
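A pure-JAX sketch of the point above. It assumes "the scales" means a per-logit multiplicative factor with the same shape as the logits; the thread doesn't say this. The function names `reference_attention` and `blockwise_attention` are hypothetical. The sketch compares plain attention with the blockwise online-softmax recurrence that flash attention is built on. Each logit tile is scaled pointwise right after its dot product and before the running max and normalizer update, so tiling doesn't change where the scale goes. This is not a fused kernel. In a real kernel the extra cost would be loading the scale tiles from memory, which the sketch doesn't model.

```python
import jax
import jax.numpy as jnp


def reference_attention(q, k, v, scales):
    # Plain attention: the scales are a pointwise op on the dot-product logits.
    logits = (q @ k.T) * scales
    return jax.nn.softmax(logits, axis=-1) @ v


def blockwise_attention(q, k, v, scales, block_size):
    # Flash-attention-style pass over key/value blocks with an online softmax.
    # `scales` is assumed to have shape (n_q, n_k), the same as the logits.
    n_q = q.shape[0]
    m = jnp.full((n_q,), -jnp.inf)       # running row max of the logits
    l = jnp.zeros((n_q,))                # running softmax denominator
    acc = jnp.zeros((n_q, v.shape[-1]))  # running unnormalized output
    for start in range(0, k.shape[0], block_size):
        k_blk = k[start:start + block_size]
        v_blk = v[start:start + block_size]
        # Same pointwise scale as the reference, applied to this tile only.
        s = (q @ k_blk.T) * scales[:, start:start + block_size]
        m_new = jnp.maximum(m, s.max(axis=-1))
        p = jnp.exp(s - m_new[:, None])
        correction = jnp.exp(m - m_new)  # rescale earlier partial sums
        l = l * correction + p.sum(axis=-1)
        acc = acc * correction[:, None] + p @ v_blk
        m = m_new
    return acc / l[:, None]
```

Both versions produce the same output up to float error for any block size.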
i went ahead and implemented it anyway; probably makes sense to put async support behind a feature flag and open a PR? https://github.com/GallagherCommaJack/sonic-channel/commit/5f942565b814c49e67a0b35d45485698f2f2c377
Where should it get the TLS cert from?
Yeah, this is going to be more important for iOS/Android than for desktop because the expectations are different. I'm probably fine with Qt's default solution for Linux. For Android we'll probably...
tagged as good first issue because I think it's not too hard, but it will end up giving a pretty thorough tour of the codebase
a reference on JAX RNG mechanics: https://jax.readthedocs.io/en/latest/notebooks/Common_Gotchas_in_JAX.html#jax-prng
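A minimal sketch of the mechanics the linked page covers. Keys are explicit values, so reusing a key reproduces the same draw. There is no hidden global state, and fresh randomness comes from splitting the key with `jax.random.split`.

```python
import jax

key = jax.random.PRNGKey(42)

# Reusing the same key gives the same numbers: nothing advances behind the scenes.
a = jax.random.normal(key, (3,))
b = jax.random.normal(key, (3,))

# To get new numbers, split the key and consume each subkey at most once.
key, subkey = jax.random.split(key)
c = jax.random.normal(subkey, (3,))
```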
> Is this a gotcha specifically in the seed=None case, or the more general seed=int case? I'm guessing the latter.

both: the int needs to come from somewhere

> If...
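A sketch of why both cases need care in JAX. `SeedStream` is a hypothetical helper, not an API from the thread or from any library. With `seed=None` the int still has to come from somewhere, and here it is drawn from OS entropy via `secrets`, which is an assumption. With `seed=int`, something still has to own the key and advance it between calls. Otherwise every call made with that int returns the same numbers.

```python
import secrets

import jax


class SeedStream:
    """Hypothetical helper: owns a JAX PRNG key and hands out fresh subkeys."""

    def __init__(self, seed=None):
        # seed=None: the int still has to come from somewhere. Here it is
        # drawn from OS entropy (31 bits so it fits in int32), so runs are
        # not reproducible unless the caller records `self.seed`.
        if seed is None:
            seed = secrets.randbits(31)
        self.seed = seed
        self._key = jax.random.PRNGKey(seed)

    def next_key(self):
        # seed=int: the stored key must be advanced on every call,
        # otherwise each draw would reuse the same key and repeat itself.
        self._key, subkey = jax.random.split(self._key)
        return subkey
```

Two streams built from the same int yield the same sequence of subkeys, while successive subkeys from one stream differ.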
> Anyone writing custom training loops is going to need to know about this API. If they forget it -- well, back to the current status quo, which is that...
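A sketch of what a custom training loop has to do when the RNG state is explicit. It assumes a toy linear model with dropout, and `train_step` and `train` are hypothetical names. The jitted step takes a key and returns the advanced key, and the loop has to thread it back in. Dropping that reassignment silently reuses the same dropout mask on every step.

```python
import jax
import jax.numpy as jnp


@jax.jit
def train_step(params, x, key):
    # Split off a subkey for this step's dropout and hand back the rest.
    key, dropout_key = jax.random.split(key)
    keep = jax.random.bernoulli(dropout_key, 0.9, x.shape)
    x = jnp.where(keep, x / 0.9, 0.0)  # inverted dropout with keep prob 0.9

    def loss_fn(p):
        return jnp.mean((x @ p) ** 2)

    loss, grads = jax.value_and_grad(loss_fn)(params)
    return params - 0.1 * grads, loss, key


def train(params, x, key, steps):
    losses = []
    for _ in range(steps):
        # The key must be reassigned here; otherwise every step sees the same mask.
        params, loss, key = train_step(params, x, key)
        losses.append(loss)
    return params, losses
```

Because the key is an ordinary input, rerunning `train` with the same starting key reproduces the run exactly.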