Ross Wightman
@rsomani95 I was working w/ the grad caching a bit, but it is a bit messy and I was never satisfied it was working 100% as it should. It also...
@mattdeitke I'm not following this exactly; the L/14 LAION-2B (`laion2b_s32b_b82k`) is the only model that should be returning transforms with 0.5, 0.5, 0.5. Are you seeing that for other models?
@mattdeitke I can assure you that other models here were not trained with 0.5; that option was only added at the time of my last training of L/14 (it wasn't...
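
A minimal sketch of how to check which normalization a given pretrained tag resolves to, assuming the standard `open_clip.create_model_and_transforms` entry point and a torchvision `Normalize` inside the returned preprocessing pipeline:

```python
# Sketch: inspect the image normalization each pretrained tag resolves to.
# Assumes the returned preprocessing is a torchvision Compose containing
# a Normalize transform (the usual open_clip behaviour).
import open_clip
from torchvision.transforms import Normalize

def get_norm(model_name, pretrained):
    _, _, preprocess = open_clip.create_model_and_transforms(model_name, pretrained=pretrained)
    for t in preprocess.transforms:
        if isinstance(t, Normalize):
            return t.mean, t.std
    return None

# laion2b_s32b_b82k should come back with mean/std of (0.5, 0.5, 0.5);
# the other pretrained weights should use the OpenAI CLIP defaults.
print(get_norm('ViT-L-14', 'laion2b_s32b_b82k'))
print(get_norm('ViT-B-32', 'openai'))
```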
@lopho in released timm that model only exists as an in22k fine-tuned variant, but that's changing in a coming PR. I left it in here as I was testing locally and...
I have not spent time on this; it looks like https://huggingface.co/docs/transformers/model_doc/vision-text-dual-encoder might be the approach. Would need to remap the checkpoints from OpenCLIP (I have something hacked together here for...
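
For context, a rough sketch of what the vision-text-dual-encoder route looks like on the `transformers` side, following the linked docs; the OpenCLIP checkpoint remapping is the part that still needs to be worked out and is not shown here:

```python
# Sketch only: assembling a transformers VisionTextDualEncoderModel from
# existing pretrained vision/text backbones, per the HF docs linked above.
# Remapping an OpenCLIP state_dict onto this structure is the missing piece.
from transformers import (
    VisionTextDualEncoderModel,
    VisionTextDualEncoderProcessor,
    AutoImageProcessor,
    AutoTokenizer,
)

model = VisionTextDualEncoderModel.from_vision_text_pretrained(
    "google/vit-base-patch16-224", "bert-base-uncased"
)
processor = VisionTextDualEncoderProcessor(
    AutoImageProcessor.from_pretrained("google/vit-base-patch16-224"),
    AutoTokenizer.from_pretrained("bert-base-uncased"),
)
```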
@mehdidc The exists check and error are there to prevent a mistake where one stomps over a previous run with a new one (i.e. it'd overwrite old checkpoints, etc. if...
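
A minimal sketch of that kind of guard, with illustrative names (`logs_root`, `--name`) rather than the train script's actual arguments:

```python
# Sketch of a "don't stomp on a previous run" guard. Names are illustrative;
# the actual train script's args/paths may differ.
import os

def resolve_experiment_dir(logs_root, name, resume=False):
    exp_dir = os.path.join(logs_root, name)
    if os.path.exists(exp_dir) and not resume:
        # Refuse to reuse the directory so old checkpoints/logs aren't overwritten.
        raise FileExistsError(
            f"Experiment {exp_dir} already exists; pick a new --name or resume explicitly."
        )
    os.makedirs(exp_dir, exist_ok=True)
    return exp_dir
```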
@rom1504 yup, done and working quite nicely with lots of testing on both stability and juwels over the past week and a bit. I tested extensively with tensorboard enabled. I...
@Quan-Sun @gabrielilharco I think the changes are reasonable, I use layer decay extensively in timm fine-tuning these days, and I feel it'd be useful here, especially when initializing one or...
FWIW, timm's LD impl is here https://github.com/rwightman/pytorch-image-models/blob/e98c93264cde1657b188f974dc928b9d73303b18/timm/optim/optim_factory.py#L92-L153 ... all models have a fn that returns the group metadata, and the grouper fn can be used, so that would be basis...
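
A boiled-down sketch of the layer-decay grouping idea, in the spirit of the linked timm code rather than a copy of it; `get_layer_id` is an assumed helper that maps a parameter name to a depth index:

```python
# Sketch of layer-wise LR decay: parameters in earlier layers get a smaller
# learning rate scale, decaying geometrically from the head backwards.
# `get_layer_id` is an assumed helper mapping a parameter name to a layer
# index in [0, num_layers]; the real timm code derives this per architecture.
import torch

def param_groups_layer_decay(model, base_lr, weight_decay, layer_decay, get_layer_id, num_layers):
    groups = {}
    for name, param in model.named_parameters():
        if not param.requires_grad:
            continue
        layer_id = get_layer_id(name)
        scale = layer_decay ** (num_layers - layer_id)  # deeper layers -> larger lr
        wd = 0.0 if param.ndim <= 1 or name.endswith('.bias') else weight_decay
        key = (layer_id, wd)
        if key not in groups:
            groups[key] = {'params': [], 'lr': base_lr * scale, 'weight_decay': wd}
        groups[key]['params'].append(param)
    return list(groups.values())

# usage (illustrative values):
# optimizer = torch.optim.AdamW(
#     param_groups_layer_decay(model, 1e-4, 0.05, 0.75, get_layer_id, num_layers=12))
```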
We should probably let the dust settle on the major PRs being added right now and see what their demands will be in terms of train code and modelling structure before...