Marc van Zee
+1 for InvertPermutation.
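For context on what such an op would compute, here is a minimal sketch in plain Python. The helper name `invert_permutation` is hypothetical and is not taken from Flax or JAX; it only illustrates the definition: if `perm` sends index `i` to `perm[i]`, the inverse sends `perm[i]` back to `i`.

```python
def invert_permutation(perm):
    """Return inv such that inv[perm[i]] == i for every index i.

    Hypothetical helper for illustration only; assumes `perm` is a
    sequence containing each integer in range(len(perm)) exactly once.
    """
    inv = [0] * len(perm)
    for i, p in enumerate(perm):
        inv[p] = i  # position p of the inverse points back to i
    return inv


perm = [2, 0, 1]
inv = invert_permutation(perm)
# Applying the permutation and then its inverse gives the identity.
roundtrip = [inv[p] for p in perm]
```

For an array that is a valid permutation, `jnp.argsort(perm)` produces the same result, so a dedicated op is mostly a matter of convenience and readability.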
Thanks for the input @n2cholas! After chatting with @jheek offline, the consensus is that it is indeed useful to return regular dicts, but that we block implementing this on merging...
Sorry for the delay -- I was on parental leave. @jheek could you tell us whether any progress has been made on merging the chex and flax dataclasses?
@Dsantra92 sure, feel free to give it a try. We would have to run it against all our internal tests as well, but we can do that once all our...
@cgarciae who is working on a transfer learning example
#1980 improves some docstrings and also adds a variation of the example above to the `nn.scan` docstring. In that PR I added `#1977` placeholders where docstrings should be improved further.
Thanks for the feedback @rongcuid, I have renamed this issue to clarify that we should improve the documentation of the causal mask.
Hi @billmark, thanks for your issue! Could you tell me in a bit more detail what you were missing from the description of the `kernel_init` and `bias_init` arguments?
Thanks for the feedback, I've updated the issue description. Indeed, we don't seem to have our initializers on RTD, while we do have the activation functions. We can simply add...
Discussed this offline with @jheek and @cgarciae. We agreed that the current behavior is not desirable since we are assuming that padding tokens for avg_pool are 0's and we include...
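To show why zero padding matters for average pooling, here is a small sketch in plain Python. It does not call Flax's `avg_pool`; the function `avg_pool_1d` and its `count_include_pad` flag are hypothetical names used only for illustration, and the sketch assumes stride 1 and `pad < window`. It compares dividing by the full window size (padding zeros counted) with dividing by the number of real elements only.

```python
def avg_pool_1d(xs, window, pad, count_include_pad=True):
    """1D average pooling with stride 1 and zero padding on both sides.

    Hypothetical illustration, not the Flax implementation.
    Assumes pad < window so every window covers at least one real element.
    """
    padded = [0.0] * pad + list(xs) + [0.0] * pad
    is_real = [False] * pad + [True] * len(xs) + [False] * pad
    out = []
    for start in range(len(padded) - window + 1):
        total = sum(padded[start:start + window])
        if count_include_pad:
            count = window  # padding zeros are counted in the divisor
        else:
            count = sum(is_real[start:start + window])  # real elements only
        out.append(total / count)
    return out


xs = [2.0, 4.0, 6.0]
with_pad = avg_pool_1d(xs, window=3, pad=1, count_include_pad=True)
without_pad = avg_pool_1d(xs, window=3, pad=1, count_include_pad=False)
```

In this sketch the two variants agree on the interior window and differ only at the borders, where counting the padding zeros pulls the average toward 0.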