Karan Shah
@psfoley @mansishr @shreyasnc1 @igor-davidyuk @acharyasunil
@sky712345678 This looks like a serialization issue. TensorFlow 2.7 had trouble with native serialization, requiring overrides of `__reduce__`. This was fixed in later TensorFlow releases: https://github.com/keras-team/keras/pull/14748. Can you quickly...
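For context, a minimal standalone sketch (toy code, not OpenFL or Keras internals) of how a `__reduce__` override restores picklability for an object whose default serialization fails:

```python
import pickle
import threading


class Model:
    """Toy stand-in for an object whose default pickling fails."""

    def __init__(self, weights):
        self.weights = weights
        self._lock = threading.Lock()  # locks are not picklable

    def __reduce__(self):
        # Tell pickle how to rebuild the object: a callable plus
        # the picklable state to pass it; the lock is recreated fresh.
        return (self.__class__, (self.weights,))


restored = pickle.loads(pickle.dumps(Model([1.0, 2.0])))
print(restored.weights)  # [1.0, 2.0]
```

Without the override, `pickle.dumps` raises `TypeError` on the lock attribute; with it, only the reconstructible state crosses the wire.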
Fisher information is the variance of the score, estimated as an expectation over *per-sample* scores. PyTorch currently doesn't support computing gradients *per sample* within a batch, hence the batch size of 1. You can...
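As a standalone illustration (plain NumPy, not the PyTorch code in question): for a Gaussian with known variance, the per-sample score w.r.t. the mean is `(x - mu) / sigma**2`, and the Fisher information, the expectation of its square, is analytically `1 / sigma**2`:

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma = 0.0, 2.0
x = rng.normal(mu, sigma, size=200_000)

# Per-sample score of the Gaussian log-likelihood w.r.t. mu.
scores = (x - mu) / sigma**2

# Fisher information = E[score^2]; analytically 1 / sigma^2 = 0.25 here.
fisher = np.mean(scores**2)
print(round(fisher, 2))  # ~0.25
```

The batched PyTorch analogue needs one backward pass per sample precisely because the squared scores must be averaged *before* the gradients are summed.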
We do not have this tutorial in `openfl-workspace`, closing this PR.
@MostafaDehghani Are you accepting community PR contributions for this? Edit: @rdilip I have created #1062 addressing all these. I understand this thread is stale, but you may want to give it a try...
Flash attention is now available in `jax-nightly` with a `cudnn` implementation: [jax.nn.dot_product_attention](https://github.com/google/jax/blob/main/jax/_src/nn/functions.py#L825-L922). It only supports Ampere architecture and later. Note that the default is `xla`.
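For reference, a standalone NumPy sketch of the math the function computes (the `cudnn` path fuses these steps; the `[batch, seq, heads, head_dim]` layout mirrors the JAX API, but treat the details here as an assumption, not the library's implementation):

```python
import numpy as np


def dot_product_attention(q, k, v):
    """Reference scaled dot-product attention, [batch, seq, heads, dim]."""
    d = q.shape[-1]
    # Attention logits [batch, heads, q_len, k_len], scaled by sqrt(dim).
    logits = np.einsum("bqhd,bkhd->bhqk", q, k) / np.sqrt(d)
    # Numerically stable softmax over the key axis.
    weights = np.exp(logits - logits.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return np.einsum("bhqk,bkhd->bqhd", weights, v)


rng = np.random.default_rng(0)
q = rng.normal(size=(1, 4, 2, 8))
k = rng.normal(size=(1, 6, 2, 8))
v = rng.normal(size=(1, 6, 2, 8))
out = dot_product_attention(q, k, v)
print(out.shape)  # (1, 4, 2, 8)
```

Flash attention computes the same result without ever materializing the full `weights` tensor, which is where the memory savings come from.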
> As discussed offline: let's land the simplest version first, without dropout or other complications. Then progressively add features. Thanks for adding FA! Is there a timeline to add `dropout`...
> Yes, this is on our radar to be implemented. Could you tell us what type of model you are working on that needs the dropout? Attention dropout would help for...
Most of my comments are nitpicks around naming/formatting and/or structure. Please address them as you see fit. Also, do we need the TF v1 (`tf.Session` API) task runners? These are ancient APIs...