Ross Wightman
@gau-nernst there is a test that loads all pretrained checkpoints when run outside of the GitHub CI, but it cannot verify correctness / changes in output, only that the weights...
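For reference, a rough sketch of what such a bulk load check looks like against timm's public API (illustrative only, not the actual test): it only confirms that the checkpoints load, not that outputs are unchanged.

```python
import timm

# Instantiate every model that has pretrained weights and confirm the
# checkpoint loads without error. This exercises weight loading only;
# it cannot detect changes in the model's outputs.
for name in timm.list_models(pretrained=True):
    try:
        timm.create_model(name, pretrained=True)
    except Exception as e:
        print(f"{name}: FAILED ({e})")
    else:
        print(f"{name}: ok")
```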
So I ran through some models; many of them work: resnet, vit, etc. But as soon as you get into models with non-persistent buffers, there are lots of issues: swin, maxvit crash,...
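A minimal, self-contained illustration of why non-persistent buffers break a meta-device / assign-style load (the toy module here is made up; it stands in for the buffers in models like swin or maxvit):

```python
import torch
import torch.nn as nn

class Block(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(4, 4)
        # Non-persistent buffer: deliberately excluded from the state_dict.
        self.register_buffer("idx", torch.arange(4), persistent=False)

    def forward(self, x):
        return self.linear(x)[:, self.idx]

# Build the skeleton on the meta device, then load a real checkpoint into it.
with torch.device("meta"):
    m = Block()

sd = Block().state_dict()           # the checkpoint has no 'idx' entry
m.load_state_dict(sd, assign=True)  # parameters get real tensors ...
print(m.idx.device)                 # ... but the buffer stays on 'meta' and
                                    # will crash the first forward pass
```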
I was thinking about this recently; the faster load time is important, but also related are things like better init compatibility with advanced param sharding, etc. In many cases you...
@gau-nernst the FSDP1 way did, but FSDP2 doesn't rely on reset_parameters... I was thinking that might be a model to follow. If I do add FSDP2 support into timm it...
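For context, a simplified sketch of the deferred-init pattern being referenced (assumption: plain nn modules, no process group or sharding call shown). FSDP1 materialized meta-device modules by calling reset_parameters(); the FSDP2-style flow just allocates empty storage and lets the checkpoint (or an explicit init pass) supply every value:

```python
import torch
import torch.nn as nn

# Build the module structure on the meta device: no memory, no random init.
with torch.device("meta"):
    model = nn.Sequential(nn.Linear(8, 8), nn.LayerNorm(8))

# Materialize uninitialized storage, then fill it from a checkpoint.
model.to_empty(device="cpu")
reference = nn.Sequential(nn.Linear(8, 8), nn.LayerNorm(8))
model.load_state_dict(reference.state_dict())
```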
If you do find a clean approach to the current buffers issue, you can proceed as is of course. Ultimately I feel this will need to move in a direction where there...
@gau-nernst hmm, yeah it's not just models without the classifier, any use case where num_classes doesn't match and a new, different classifier gets created (old one removed from state_dict)...
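To make that case concrete (illustrative only): when num_classes differs from the pretrained head, timm discards the old classifier weights and creates a fresh, randomly initialized head, so not every parameter in the built model has a counterpart in the checkpoint:

```python
import timm

# Pretrained head has 1000 classes; asking for 10 makes timm drop the old
# classifier weights and create a new randomly initialized head.
model = timm.create_model("resnet50", pretrained=True, num_classes=10)
print(model.get_classifier())    # fresh Linear with 10 outputs

# num_classes=0 removes the head entirely (feature-extractor use case).
backbone = timm.create_model("resnet50", pretrained=True, num_classes=0)
print(backbone.get_classifier()) # Identity
```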
FYI, I'm running some bulk evals to compare against existing numbers.
@gau-nernst hmm, so the idea with the num_classes ultimately coming from the model was that it is the ultimate source of truth wrt what config/args will result in the...
Something else came up in the larger test: models using BlurPool are failing. Also, thinking about this a bit more, this solution is going to cause issues with some use...
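One way to flag which architectures are affected (a hypothetical helper, not something in timm): non-persistent buffers are exactly the buffers that appear in named_buffers() but not in the state_dict:

```python
import timm

def non_persistent_buffer_names(model):
    # Buffers registered with persistent=False: present in named_buffers()
    # but absent from the state_dict.
    buffer_names = {name for name, _ in model.named_buffers()}
    return sorted(buffer_names - set(model.state_dict().keys()))

# Scan a few architectures without downloading any weights.
for name in ["resnet50", "swin_tiny_patch4_window7_224", "maxvit_tiny_rw_224"]:
    m = timm.create_model(name, pretrained=False)
    print(name, non_persistent_buffer_names(m))
```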
@gau-nernst yeah, it's at the layer level that I was getting more concerned... there is no way to reliably track down what the uses of various layers are in the wild,...