Ross Wightman

Results 497 comments of Ross Wightman

@clownrat6 not sure what's going on there, should be closer to 20. I uploaded that cc3m instance and I've trained to near 20 with it so if it downloaded without...

@praveen5733 it never ended up getting implemented for the default text model, only the HF wrapper... see also #648. There was a PR on the go, but I never had the...

@JeniaJitsev sounds good, there are ideas from different papers there and they wouldn't necessarily all make sense in combination ... qk_norm and scaled_cosine_attn are explicitly disabled together, the combo does not make sense...

I've found layer scale to benefit quite a few ViT training regimes... that's been supported for a while, but not sure if any from-scratch runs use it... ls init values usually...
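The layer scale idea mentioned above can be sketched quickly: the residual-branch output is multiplied by a learnable per-channel factor initialised to a small value, so each block contributes almost nothing at the start of training. This is a minimal NumPy illustration (the function names and the toy identity branch are hypothetical, not open_clip code; in a real model the gamma would be a trainable parameter):

```python
import numpy as np

def layer_scale(x, gamma):
    # scale each channel of the branch output by a per-channel factor;
    # in a real ViT, gamma is learned and typically initialised ~1e-5..1e-6
    return x * gamma

def residual_block_with_ls(x, branch_fn, gamma):
    # hypothetical residual block: x + gamma * branch(x)
    return x + layer_scale(branch_fn(x), gamma)

# toy demo: identity branch, small gamma keeps the branch contribution tiny
x = np.ones((2, 4))          # (tokens, channels)
gamma = np.full(4, 1e-5)     # per-channel init value
out = residual_block_with_ls(x, lambda t: t, gamma)
```

With this init, `out` barely differs from the input, which is exactly why the init value matters for training stability.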

@JeniaJitsev also, do confirm the model architecture changed according to the config flags set :) I ran through several of them but might have missed one and you don't want...

Printing the model from the main script after creation gives a quick overview; you can ensure it's a CustomResidualAttentionBlock and that the norm layers you intended to enable...
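That kind of check amounts to walking the blocks and confirming the classes actually instantiated match the config flags. A minimal stand-in sketch (the classes and flag name here are hypothetical; with a real open_clip model you'd `print(model)` or iterate `model.named_modules()` to the same effect):

```python
# Hypothetical stand-in classes to illustrate verifying config flags
# took effect; not the real open_clip modules.
class LayerNorm: pass
class Identity: pass

class CustomResidualAttentionBlock:
    def __init__(self, use_qk_norm):
        # qk norm layers are real LayerNorms only when the flag is on,
        # otherwise no-op Identity modules
        self.ln_q = LayerNorm() if use_qk_norm else Identity()
        self.ln_k = LayerNorm() if use_qk_norm else Identity()

def check_blocks(blocks):
    # return (block class, q-norm class) per block so a flag mismatch is obvious
    return [(type(b).__name__, type(b.ln_q).__name__) for b in blocks]

blocks = [CustomResidualAttentionBlock(use_qk_norm=True) for _ in range(2)]
report = check_blocks(blocks)
```

If a flag silently didn't take, the report would show `Identity` where you expected `LayerNorm`, which is the mistake this check catches before burning a long training run.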

> @rwightman So, it seems it makes sense to test 2 setups:
>
> 1. qk norm active (otherwise everything else standard training)
> 2. scale head + scale_attn (as...
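The qk norm variant in setup 1 can be sketched as follows: q and k are normalised before the dot product so attention logit magnitudes stay bounded early in training. A NumPy sketch under assumed single-head `(tokens, dim)` shapes (an illustration only, not the actual open_clip implementation, which uses learnable norm layers):

```python
import numpy as np

def layernorm(t, eps=1e-6):
    # normalise over the last (head) dimension; no learned affine for brevity
    mu = t.mean(axis=-1, keepdims=True)
    return (t - mu) / np.sqrt(t.var(axis=-1, keepdims=True) + eps)

def qk_norm_attention(q, k, v):
    # qk norm: layer-normalise q and k before the scaled dot product
    q, k = layernorm(q), layernorm(k)
    logits = (q @ k.T) / np.sqrt(q.shape[-1])
    logits -= logits.max(axis=-1, keepdims=True)  # softmax stability
    w = np.exp(logits)
    w /= w.sum(axis=-1, keepdims=True)
    return w @ v

# toy check: constant q/k rows normalise to zero -> uniform attention,
# so each output row is just the mean of v
q = k = np.ones((2, 4))
v = np.array([[1., 2., 3., 4.], [5., 6., 7., 8.]])
out = qk_norm_attention(q, k, v)
```

Setup 2 (scale head + scaled cosine attention) replaces the fixed 1/sqrt(d) factor with a learned logit scale on cosine-normalised q/k, which is why enabling it alongside qk norm is disallowed — the two normalisation schemes overlap.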