Kyle Daruwalla
I think the main benefit of external packages over code duplication is maintainability and testing. If it is a relatively lightweight dep and we use it for only a few...
I'm still in favor overall
> I included it specifically for feature parity with PyTorch. I agree that it is cumbersome compared to the vector of vectors input, but I think it has utility in...
It seems that a lot of the complexity here comes from taking the storage cuDNN wants (packed array block) and de-sugaring it into what we like (separate blocks). Would it...
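For concreteness, here is a minimal sketch of the de-sugaring I have in mind (assuming the packed block is a dense features × batch × time array; `unpack`/`pack` are just illustrative names, not anything we have today):

```julia
# Illustrative only: split a packed (features × batch × time) block into
# the per-timestep matrices we prefer to work with, and re-pack them.
unpack(packed::AbstractArray{T,3}) where {T} =
    [view(packed, :, :, t) for t in axes(packed, 3)]

pack(blocks::AbstractVector{<:AbstractMatrix}) = cat(blocks...; dims = 3)
```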
I think some paper references or (even better) code examples from other frameworks are necessary here so that we can help answer your request.
So what purpose does `bors` serve without the merging part? None, right? I'm in favor of deleting.
Thanks for this contribution! If I understand this correctly, this specifically uses ProgressMeter.jl as the front end for progress logging? Our current approach uses ProgressLogging.jl, which integrates with VS Code...
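To show the distinction, here is a rough sketch of what the ProgressLogging.jl side looks like from a training loop (illustrative only, not the exact code we use):

```julia
using ProgressLogging

nepochs = 10  # stand-in for the real training configuration

# Progress is emitted as log events; the actual display (VS Code,
# TerminalLoggers.jl, a plain REPL logger, ...) is picked by whichever
# logger is active, not by this loop.
@withprogress name="training" begin
    for epoch in 1:nepochs
        # ... run one epoch of training here ...
        @logprogress epoch / nepochs
    end
end
```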
Mutating variants sound like a good idea. For applying them to initialized models, we could make some wrapper interface like `init!(method, model)`, but that would require defining an `init!(..., ::Dense)`,...
Yeah my concern was relying on a particular field. We could always make `reinit!(method, model)` default to `fmap` on all trainables, then allow custom overrides. Similar approach to #1875.
> Most layers call bias `bias`, so filtering based on that might not be terrible. Maybe the `reweight!` function takes field names to ignore (default `(:b, :bias)`) and to act...
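Roughly the default I'm picturing, whether we call it `reinit!` or `reweight!` (a rough sketch assuming Functors.jl; the names, the traversal, and treating `method` as a dims-taking initializer like `glorot_uniform` are all hypothetical here):

```julia
using Functors: functor, isleaf

# Rough sketch: re-initialize trainable arrays in place, skipping any field
# whose name is in `ignore` (e.g. biases). Custom layers could still
# overload this for special behavior.
function reinit!(method, model; ignore = (:b, :bias))
    children, _ = functor(model)
    for (name, child) in pairs(children)
        name in ignore && continue
        if child isa AbstractArray{<:Number}
            child .= method(size(child)...)   # overwrite parameters in place
        elseif !isleaf(child)
            reinit!(method, child; ignore)    # recurse into nested layers
        end
    end
    return model
end
```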