Kyle Daruwalla
On ImageNet? I used the MobileNets in this package a while back and trained them on CIFAR-100. Looking at that code, I used Adam and logit CE loss. Also a...
Yeah, if something like CIFAR-10 doesn't work, then at least someone here can try to reproduce it.
Okay, I'll make my own script over the weekend and sanity check.
Sorry for the late reply. I did start writing a script, but I never got around to starting my testing. I kept meaning to reply as soon as I did...
Based on the current runs that finished, it looks like all the EfficientNetv2 models have some bug. Only the `:small` variant trains to completion. The rest all drop to NaN...
I will say though that the ResNet loss curves don't look as bad as I remember them. Perhaps in this case, a different learning rate would fix things.
I'll modify the script to log gradient norms by layer and also do some local debugging just to sanity check the output isn't obviously wrong. I'll also add MobileNet to...
Closing as stale
Is there a specific graph we should look at? Just looking for apples-to-apples, I think `Conv((3, 3), 1 => 1)` might be the closest, and I'm not seeing a difference...
@theabhirath if you have bandwidth to rebase, that would be great. Just merging this PR will be a good starting point for me or someone else to finish up getting...