Ross Wightman
@rose-jinyang what @TorbenSDJohansen suggested will work in a pinch, the model is already pretty much timm style and should work well, but it always takes a bit of time to...
@rose-jinyang took a while, but added with some improvements, on main branch now and in a pypi pre-release soon (weights were pushed to hf-hub as well: https://huggingface.co/timm)
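For anyone landing here: weights on the hub load straight through `create_model` with the `hf_hub:` prefix. A minimal sketch; the repo id below is a placeholder, not necessarily the model from this thread:

```python
import timm

# Any model under https://huggingface.co/timm loads the same way;
# swap in the repo id of the model you want.
model = timm.create_model('hf_hub:timm/vit_base_patch16_224.augreg_in1k', pretrained=True)
model = model.eval()
```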
@jeromerony thanks for the PR, noticed your comments on the Twitter thread re the foreach, any desire to throw that in as an option too? probably should be a bool...
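For reference, the torch.optim convention is a tri-state flag: `foreach=None` lets the implementation pick, `True`/`False` forces it. A minimal sketch of how such a bool could be plumbed into an optimizer step; `sgd_step` is a hypothetical helper, not the PR's actual code:

```python
import torch
from typing import List, Optional

def sgd_step(
    params: List[torch.Tensor],
    grads: List[torch.Tensor],
    lr: float,
    foreach: Optional[bool] = None,  # None = pick a default, True/False forces it
) -> None:
    if foreach is None:
        # horizontal fusion mostly pays off on CUDA; pick it when possible
        foreach = all(p.is_cuda for p in params)
    if foreach:
        # one multi-tensor kernel per op instead of one launch per param
        torch._foreach_add_(params, grads, alpha=-lr)
    else:
        for p, g in zip(params, grads):
            p.add_(g, alpha=-lr)
```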
Thanks for the additions, I've been trying to push a number of things over the finish line recently so haven't had a chance to dig into this, but will...
@CharlesLeeeee that trunc normal at std=.02 isn't actually THAT different from normal, at least I doubt it's different enough to have a significant impact on training, the other timm init...
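For concreteness: torch's `trunc_normal_` default bounds `a=-2, b=2` are absolute, so at `std=.02` the cutoffs sit ~100 sigma out and essentially nothing gets clipped. A quick empirical check:

```python
import torch
from torch import nn

w1 = torch.empty(768, 768)
w2 = torch.empty(768, 768)
nn.init.trunc_normal_(w1, std=0.02)  # default a=-2, b=2 is ~100 sigma at this std
nn.init.normal_(w2, std=0.02)

# Both should report std ~0.02 and near-identical extremes.
print(w1.std().item(), w2.std().item())
print(w1.abs().max().item(), w2.abs().max().item())
```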
@CharlesLeeeee k, but say for hparams, the timm create_model passes some regularization params through to timm models that wouldn't work for transformers, so if your args have drop_path (stochastic depth),...
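E.g. a couple of the kwargs `create_model` forwards to the underlying timm model; the values here are just illustrative:

```python
import timm

# create_model forwards these kwargs to the model entrypoint, so the
# regularization hparams reach the timm arch directly; a transformers
# model wouldn't accept them in its constructor.
model = timm.create_model(
    'vit_base_patch16_224',
    pretrained=False,
    drop_rate=0.1,       # classifier / head dropout
    drop_path_rate=0.1,  # stochastic depth rate
)
```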
@CharlesLeeeee both are NaN? vit from scratch usually requires grad clipping + adamw. FYI, if you can't resolve the differences for training you could always train with timm and remap...
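To make that recipe concrete, a minimal sketch of the combo; the toy model, lr, and max_norm are placeholder values, roughly what timm's train script applies when `--clip-grad` is set:

```python
import torch
from torch import nn

# Stand-in model; the point is the optimizer + clipping combo, not the arch.
model = nn.Linear(768, 1000)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=0.05)

x = torch.randn(8, 768)
target = torch.randint(0, 1000, (8,))
loss = nn.functional.cross_entropy(model(x), target)

optimizer.zero_grad()
loss.backward()
# Clip the global grad norm before stepping; max_norm=1.0 is a common start.
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
optimizer.step()
```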
@iamhankai FYI, you can use `register_notrace_function` and `register_notrace_module` to register leaf functions or modules in your model that won't trace in FX due to boolean and other flow control concerns...
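A minimal sketch of the registration, assuming the helpers' current home in `timm.models._features_fx` (the import path has moved between releases); `PadIfOdd` and `maybe_flatten` are made-up examples of the kind of flow control FX chokes on:

```python
import torch
from torch import nn
from timm.models._features_fx import register_notrace_function, register_notrace_module

@register_notrace_module  # kept as a leaf: FX won't trace into forward()
class PadIfOdd(nn.Module):
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # a data-dependent bool like this breaks symbolic tracing
        if x.shape[-1] % 2:
            x = nn.functional.pad(x, (0, 1))
        return x

@register_notrace_function  # auto-wrapped so the call isn't traced through
def maybe_flatten(x: torch.Tensor) -> torch.Tensor:
    return x.flatten(1) if x.ndim > 2 else x
```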
Hmm, seems the tracing issue is harder to solve, just preventing the trace won't bypass the bool issue without some restructuring. I'd also need to tweak some other interface issues wrt...
@yxchng I actually do have some work in progress right now in `timm` to address https://github.com/huggingface/pytorch-image-models/issues/2190 (adding a `set_input_size()` fn to vit) that would allow this to work, I don't...
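A sketch of how the in-progress fn might look in use; since it's WIP per the linked issue, the exact signature and argument names could differ:

```python
import timm

model = timm.create_model('vit_base_patch16_224', pretrained=True)
# resize the patch embed / resample pos embed for a new resolution
# (in-progress API; argument names may change)
model.set_input_size(img_size=(384, 384))
```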