Ross Wightman

Results 497 comments of Ross Wightman

@clownrat6 not sure what's going on there, should be closer to 20. I uploaded that cc3m instance and I've trained to near 20 with it so if it downloaded without...

@praveen5733 it never ended up getting implemented for the default text model, only the HF wrapper... see also #648. There was a PR on the go, but I never had the...

@JeniaJitsev sounds good, there are ideas from different papers there and they wouldn't necessarily all make sense in combination ... qk_norm and scaled_cosine_attn are explicitly disabled together, the combo does not make sense...

I've found layer scale to benefit quite a few ViT training regimes... that's been supported for a while, but not sure if any from-scratch runs use it... ls init values usually...
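The layer scale idea mentioned above can be sketched quickly: the residual-branch output is multiplied by a learnable per-channel factor initialised to a small value, so each block contributes almost nothing at the start of training. This is a minimal NumPy illustration (the function names and the toy identity branch are hypothetical, not open_clip code; in a real model the gamma would be a trainable parameter):

```python
import numpy as np

def layer_scale(x, gamma):
    # scale each channel of the branch output by a per-channel factor;
    # in a real ViT, gamma is learned and typically initialised ~1e-5..1e-6
    return x * gamma

def residual_block_with_ls(x, branch_fn, gamma):
    # hypothetical residual block: x + gamma * branch(x)
    return x + layer_scale(branch_fn(x), gamma)

# toy demo: identity branch, small gamma keeps the branch contribution tiny
x = np.ones((2, 4))          # (tokens, channels)
gamma = np.full(4, 1e-5)     # per-channel init value
out = residual_block_with_ls(x, lambda t: t, gamma)
```

With this init, `out` barely differs from the input, which is exactly why the init value matters for training stability.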

@JeniaJitsev also, do confirm the model architecture changed according to the config flags set :) I ran through several of them but might have missed one and you don't want...

Printing the model from the main script after creation gives a quick overview; you can ensure it's a CustomResidualAttentionBlock and that the norm layers you intended to enable...
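That kind of check amounts to walking the blocks and confirming the classes actually instantiated match the config flags. A minimal stand-in sketch (the classes and flag name here are hypothetical; with a real open_clip model you'd `print(model)` or iterate `model.named_modules()` to the same effect):

```python
# Hypothetical stand-in classes to illustrate verifying config flags
# took effect; not the real open_clip modules.
class LayerNorm: pass
class Identity: pass

class CustomResidualAttentionBlock:
    def __init__(self, use_qk_norm):
        # qk norm layers are real LayerNorms only when the flag is on,
        # otherwise no-op Identity modules
        self.ln_q = LayerNorm() if use_qk_norm else Identity()
        self.ln_k = LayerNorm() if use_qk_norm else Identity()

def check_blocks(blocks):
    # return (block class, q-norm class) per block so a flag mismatch is obvious
    return [(type(b).__name__, type(b.ln_q).__name__) for b in blocks]

blocks = [CustomResidualAttentionBlock(use_qk_norm=True) for _ in range(2)]
report = check_blocks(blocks)
```

If a flag silently didn't take, the report would show `Identity` where you expected `LayerNorm`, which is the mistake this check catches before burning a long training run.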

> @rwightman So, it seems it makes sense to test 2 setups:
>
> 1. qk norm active (otherwise everything else standard training)
> 2. scale head + scale_attn (as...
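The qk norm variant in setup 1 can be sketched as follows: q and k are normalised before the dot product so attention logit magnitudes stay bounded early in training. A NumPy sketch under assumed single-head `(tokens, dim)` shapes (an illustration only, not the actual open_clip implementation, which uses learnable norm layers):

```python
import numpy as np

def layernorm(t, eps=1e-6):
    # normalise over the last (head) dimension; no learned affine for brevity
    mu = t.mean(axis=-1, keepdims=True)
    return (t - mu) / np.sqrt(t.var(axis=-1, keepdims=True) + eps)

def qk_norm_attention(q, k, v):
    # qk norm: layer-normalise q and k before the scaled dot product
    q, k = layernorm(q), layernorm(k)
    logits = (q @ k.T) / np.sqrt(q.shape[-1])
    logits -= logits.max(axis=-1, keepdims=True)  # softmax stability
    w = np.exp(logits)
    w /= w.sum(axis=-1, keepdims=True)
    return w @ v

# toy check: constant q/k rows normalise to zero -> uniform attention,
# so each output row is just the mean of v
q = k = np.ones((2, 4))
v = np.array([[1., 2., 3., 4.], [5., 6., 7., 8.]])
out = qk_norm_attention(q, k, v)
```

Setup 2 (scale head + scaled cosine attention) replaces the fixed 1/sqrt(d) factor with a learned logit scale on cosine-normalised q/k, which is why enabling it alongside qk norm is disallowed — the two normalisation schemes overlap.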