Bryan

3 issues by Bryan

Hi Alex, this is very impressive work! My use case is for an environment where:
- each env requires its own process
- on a 16-core machine, maximum sample...

Hi, great work! I'm curious how much compute (number of CPU cores, number and type of GPUs) it takes to run the example on Breakout. Thanks!

Quick question: given the attention mask `jnp.tril(jnp.ones(window_size, window_size * 2), window_size)` this means that in this implementation, for a given head & window, the `i`th query ends up attending not...
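For anyone following along, here is a minimal sketch that builds the mask for a small window and counts how many key positions each query row is allowed to see. Two assumptions on my part: the shape is passed to `jnp.ones` as a tuple (which I take to be the intended shape, since `jnp.ones` expects the shape as its first argument), and `local_attention_mask` is a hypothetical helper name, not something from the repository.

```python
import jax.numpy as jnp


def local_attention_mask(window_size: int) -> jnp.ndarray:
    """Hypothetical helper: rebuilds the mask quoted in the question.

    Assumption: the shape is (window_size, 2 * window_size), passed as a tuple.
    """
    ones = jnp.ones((window_size, window_size * 2))
    # tril with offset k keeps entries where column <= row + k.
    return jnp.tril(ones, window_size)


window_size = 3
mask = local_attention_mask(window_size)

# Number of unmasked key positions for each query row.
allowed_per_query = mask.sum(axis=1)

print(mask)
print(allowed_per_query)
```

With `window_size = 3`, row `i` keeps columns `0` through `i + window_size`, so the rows allow 4, 5, and 6 key positions respectively. How those columns map onto actual token positions depends on how the keys are laid out in the implementation, which this sketch does not assume.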