Ross Wightman
@etetteh unlike the majority of convnets, changing the resolution of a ViT or MLP-Mixer model essentially gives you a different model. The sequence length changes and with it the position embeddings...
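To make the resolution point concrete, here is a minimal sketch of how the token sequence length (and so the number of position embeddings) depends on input size. It assumes square inputs, non-overlapping 16x16 patches, and one class token; `vit_seq_len` and its parameters are hypothetical names for illustration, not part of any library.

```python
def vit_seq_len(img_size: int, patch_size: int = 16, num_prefix_tokens: int = 1) -> int:
    """Number of tokens a ViT-style model sees for a square input.

    Assumes non-overlapping square patches and `num_prefix_tokens`
    extra tokens (e.g. a class token) prepended to the patch tokens.
    """
    if img_size % patch_size != 0:
        raise ValueError("img_size must be divisible by patch_size")
    grid = img_size // patch_size          # patches per side
    return grid * grid + num_prefix_tokens  # patch tokens + prefix tokens


# The same architecture needs a different number of position embeddings
# at each resolution under these assumptions.
print(vit_seq_len(224))  # 197
print(vit_seq_len(384))  # 577
```

A position embedding table learned for 197 tokens does not fit a 577-token input as-is, which is why a resolution change is more than a preprocessing tweak for these models.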
@etetteh yes, I will leave this open until I have a chance to add support to dataset and also will push some better handling into the transforms fn so that...
@MichaelMonashev thanks, I'll tackle it at some point but low priority. It uses different blocks from the others so need to define those, and working with their model definitions is...
@alexander-soare that still won't be deterministic, no? `torch.use_deterministic_algorithms(True)` needs to be set as it has broader scope than the cudnn flag. Also not sure if the benchmark mode needs to be...
@alexander-soare deterministic is always such a bother :) so in your trials just the cudnn flag seemed to result in reproducible runs, whereas without it the runs differed? It's definitely...
@belerico unfortunately in PyTorch 1.9 replacing that with `//` or floor_divide will result in a pretty ugly warning when it's used (as it does in the official repo), in future...
@rsomani95 yes, that's fairly normal, EMA will race ahead for the middle part of training, large gains in early part, then it's painfully slow and sometimes goes down for quite...
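For readers unfamiliar with the EMA being discussed, a minimal sketch of a weight exponential moving average follows. It assumes the weights are plain lists of floats and uses a hypothetical `ema_update` helper; it illustrates the general technique only and is not the training script's actual implementation.

```python
from typing import List


def ema_update(ema_weights: List[float], model_weights: List[float],
               decay: float = 0.999) -> List[float]:
    """One EMA step: ema <- decay * ema + (1 - decay) * model."""
    return [decay * e + (1.0 - decay) * w
            for e, w in zip(ema_weights, model_weights)]


# Toy usage: the EMA copy trails the raw weights and smooths out their noise.
ema = [0.0, 0.0]
for step_weights in ([1.0, 2.0], [1.0, 2.0], [1.0, 2.0]):
    ema = ema_update(ema, step_weights, decay=0.5)
print(ema)
```

With a decay close to 1 the averaged copy changes slowly, so its validation curve can lead or lag the raw model at different stages of training.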
@rsomani95 reasonable chance you'll see the peak result in the 400-500 epoch timeframe, it can be difficult to judge 'when' though
@rsomani95 thanks for the update, sorry for the lag, I've been trying to focus on getting a few other things polished off so haven't had a chance to try this...
@rsomani95 I likely caused some merge conflicts w/ refactoring related to adding efficientnetv2 official impl. I can do the fixup/cleanup when it's ready to merge after your next runs.