Ross Wightman
@gau-nernst there is a test that loads all pretrained checkpoints when run outside of the GitHub CI, but it cannot verify correctness / changes in output, only that the weights...
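For reference, a rough sketch of what such a bulk load check looks like against timm's public API (illustrative only, not the actual test): it only confirms that the checkpoints load, not that outputs are unchanged.

```python
import timm

# Instantiate every model that has pretrained weights and confirm the
# checkpoint loads without error. This exercises weight loading only;
# it cannot detect changes in the model's outputs.
for name in timm.list_models(pretrained=True):
    try:
        timm.create_model(name, pretrained=True)
    except Exception as e:
        print(f"{name}: FAILED ({e})")
    else:
        print(f"{name}: ok")
```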
So I ran through some models; many of them work: resnet, vit, etc. But as soon as you get into models with non-persistent buffers, there are lots of issues: swin, maxvit crash,...
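A minimal, self-contained illustration of why non-persistent buffers break a meta-device / assign-style load (the toy module here is made up; it stands in for the buffers in models like swin or maxvit):

```python
import torch
import torch.nn as nn

class Block(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(4, 4)
        # Non-persistent buffer: deliberately excluded from the state_dict.
        self.register_buffer("idx", torch.arange(4), persistent=False)

    def forward(self, x):
        return self.linear(x)[:, self.idx]

# Build the skeleton on the meta device, then load a real checkpoint into it.
with torch.device("meta"):
    m = Block()

sd = Block().state_dict()           # the checkpoint has no 'idx' entry
m.load_state_dict(sd, assign=True)  # parameters get real tensors ...
print(m.idx.device)                 # ... but the buffer stays on 'meta' and
                                    # will crash the first forward pass
```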
I was thinking about this recently; the faster load time is important, but also related are things like better init compatibility with advanced param sharding, etc. In many cases you...
@gau-nernst the FSDP1 way did, but FSDP2 doesn't rely on reset_parameters... I was thinking that might be a model to follow. If I do add FSDP2 support into timm it...
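For context, a simplified sketch of the deferred-init pattern being referenced (assumption: plain nn modules, no process group or sharding call shown). FSDP1 materialized meta-device modules by calling reset_parameters(); the FSDP2-style flow just allocates empty storage and lets the checkpoint (or an explicit init pass) supply every value:

```python
import torch
import torch.nn as nn

# Build the module structure on the meta device: no memory, no random init.
with torch.device("meta"):
    model = nn.Sequential(nn.Linear(8, 8), nn.LayerNorm(8))

# Materialize uninitialized storage, then fill it from a checkpoint.
model.to_empty(device="cpu")
reference = nn.Sequential(nn.Linear(8, 8), nn.LayerNorm(8))
model.load_state_dict(reference.state_dict())
```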
If you do find a clean approach to the current buffers issue, you can proceed as is of course. Ultimately I feel this will need to move in a direction where there...
@gau-nernst hmm, yeah it's not just models without the classifier, any use case where num_classes doesn't match and a new, different classifier gets created (old one removed from state_dict)...
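To make that case concrete (illustrative only): when num_classes differs from the pretrained head, timm discards the old classifier weights and creates a fresh, randomly initialized head, so not every parameter in the built model has a counterpart in the checkpoint:

```python
import timm

# Pretrained head has 1000 classes; asking for 10 makes timm drop the old
# classifier weights and create a new randomly initialized head.
model = timm.create_model("resnet50", pretrained=True, num_classes=10)
print(model.get_classifier())    # fresh Linear with 10 outputs

# num_classes=0 removes the head entirely (feature-extractor use case).
backbone = timm.create_model("resnet50", pretrained=True, num_classes=0)
print(backbone.get_classifier()) # Identity
```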
FYI, I'm running some bulk evals to compare against existing numbers.
@gau-nernst hmm, so the idea with the num_classes ultimately coming from the model was that it is the ultimate source of truth wrt what config/args will result in the...
Something else came up in the larger test: models using BlurPool are failing. Also, thinking about this a bit more, this solution is going to cause issues with some use...
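One way to flag which architectures are affected (a hypothetical helper, not something in timm): non-persistent buffers are exactly the buffers that appear in named_buffers() but not in the state_dict:

```python
import timm

def non_persistent_buffer_names(model):
    # Buffers registered with persistent=False: present in named_buffers()
    # but absent from the state_dict.
    buffer_names = {name for name, _ in model.named_buffers()}
    return sorted(buffer_names - set(model.state_dict().keys()))

# Scan a few architectures without downloading any weights.
for name in ["resnet50", "swin_tiny_patch4_window7_224", "maxvit_tiny_rw_224"]:
    m = timm.create_model(name, pretrained=False)
    print(name, non_persistent_buffer_names(m))
```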
@gau-nernst yeah, it's at the layer level that I was getting more concerned... there is no way to reliably track down what the uses of various layers are in the wild,...