Kyle Daruwalla
cuDNN has its own extension that triggers on CUDA + cuDNN. I don't see what would hang in the extension itself, so maybe the partial overlap of CUDA is causing...
We should probably change that line in the Adam code to use Adapt.jl to get the correct type instead of hard-typing the return of `get!`.
I don't think we need the fix in Optimisers.jl because the state is initialized separately (and correctly). This appears to be a bug only for IdDict optimizers. Agreed that we...
> taking 2 steps of adam with separate gradients instead of a single step with the accumulated one

Yeah, with ADAM this will certainly be wrong. Referencing https://github.com/FluxML/Zygote.jl/issues/991#issuecomment-864411649, it's not...
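Here is a minimal scalar sketch of why Adam differs here. It uses the textbook bias-corrected Adam update, written in Python for illustration. It is not Flux's implementation. The helper names `adam_step` and `sgd_step`, the gradients, and the learning rate of 0.1 (chosen so the gap is visible) are hypothetical. The sketch compares two Adam steps taken with separate gradients against one step taken with their sum. It then runs the same comparison with plain SGD.

```python
import math

def adam_step(x, g, state, lr=0.1, b1=0.9, b2=0.999, eps=1e-8):
    """One scalar Adam update (illustrative sketch). `state` is (m, v, t)."""
    m, v, t = state
    t += 1
    m = b1 * m + (1 - b1) * g          # first-moment (mean) estimate
    v = b2 * v + (1 - b2) * g * g      # second-moment (uncentered variance) estimate
    m_hat = m / (1 - b1 ** t)          # bias corrections for zero-initialized moments
    v_hat = v / (1 - b2 ** t)
    return x - lr * m_hat / (math.sqrt(v_hat) + eps), (m, v, t)

def sgd_step(x, g, lr=0.1):
    """Plain gradient descent, which is linear in g."""
    return x - lr * g

x0, g1, g2 = 1.0, 0.5, 0.25
fresh = (0.0, 0.0, 0)

# Adam: two steps with the separate gradients...
x_mid, state_mid = adam_step(x0, g1, fresh)
x_two, state_two = adam_step(x_mid, g2, state_mid)
# ...versus one step with the accumulated gradient g1 + g2.
x_one, state_one = adam_step(x0, g1 + g2, fresh)

# SGD for contrast: both orderings agree (up to floating-point rounding).
x_sgd_two = sgd_step(sgd_step(x0, g1), g2)
x_sgd_one = sgd_step(x0, g1 + g2)

print(f"Adam: two steps -> {x_two:.4f}, one accumulated step -> {x_one:.4f}")
print(f"SGD:  two steps -> {x_sgd_two:.4f}, one accumulated step -> {x_sgd_one:.4f}")
```

With fresh state, Adam's first step has a magnitude of roughly `lr` regardless of the gradient's scale. As a result, the single accumulated step lands near 0.900, while the two separate steps land near 0.807. The step counter and moment estimates also end up different. SGD, by contrast, is linear in the gradient, so both orderings give the same result.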
Related to #2293 as well
FWIW, most of the current maintainers dislike `train!` and the difficulty of finding information in the docs. We discussed both topics in the most recent ML call, and...
A NEWS entry for this feature would be good too
I think it might be confusing for users. Someone used to `show` on the CPU might interpret the absence of NaN messages as meaning there are no NaNs. Is it possible to make...
That subset of letters would also be awesome for programming languages with Unicode support. They are the most commonly used letters for indexing, so having super and subscripts would be...
Yeah, that seems to avoid scalar indexing