Xiangning Chen
> I would not have thought of modifying temperature, how did you think of this?

I tracked the temperature value throughout training and found that Lion learns this value...
@lucidrains I used an initial temperature of 10 in the paper. But in LiT and BASIC, the vision tower is loaded from a pre-trained ckpt and is frozen during training, while...
@mitchellnw I further validated on the large and giant-size CLIP models, each trained for 20K steps (10K-step warmup then cosine decay) with `initial temperature = 30.0`. When training g/14, I...
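For reference, a minimal sketch of that schedule in optax (assuming `optax.warmup_cosine_decay_schedule` and an optax version that ships `optax.lion`; the peak learning rate and weight decay below are placeholders, not the values used in these runs):

```python
import optax

total_steps = 20_000   # total training steps per run
warmup_steps = 10_000  # linear warmup, then cosine decay

# `decay_steps` counts the whole schedule, warmup included.
lr_schedule = optax.warmup_cosine_decay_schedule(
    init_value=0.0,
    peak_value=1e-4,      # placeholder peak learning rate
    warmup_steps=warmup_steps,
    decay_steps=total_steps,
)

# Placeholder hyperparameters; the CLIP temperature itself is a model
# parameter initialized to 30.0 inside the model, not an optimizer setting.
optimizer = optax.lion(learning_rate=lr_schedule, weight_decay=1e-2)
```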
Hi, can you try 'DrNAS_imagenet' and see the result? There is some randomness, but it should not be that much.
Hi, thanks for your interest. Which PyTorch version are you using?
I don't think the PyTorch version should cause this issue, as we can get the CIFAR-100 result with zero variance. Could you also try train_search_progressive.py?
Hi, thanks for your interest. I used 8 1080 Ti GPUs for training on ImageNet; you could try reducing the batch size and tuning the learning rate accordingly. Best
For momentum tracking with bfloat16, you can simply cast the momentum to bfloat16; see the example here: https://github.com/deepmind/optax/blob/master/optax/_src/transform.py#L461.
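A minimal sketch of the idea (not the exact code at that link; helper names here are illustrative): keep the momentum tree in bfloat16 and cast it to the gradient dtype only when computing the update.

```python
import jax
import jax.numpy as jnp

def init_momentum(params):
    # Store the momentum buffer directly in bfloat16 to halve its memory.
    return jax.tree_util.tree_map(
        lambda p: jnp.zeros_like(p, dtype=jnp.bfloat16), params)

def update_momentum(grads, mu, beta=0.9):
    # Do the EMA arithmetic in the gradient dtype for accuracy,
    # then cast the result back to bfloat16 for storage.
    new_mu = jax.tree_util.tree_map(
        lambda g, m: beta * m.astype(g.dtype) + (1.0 - beta) * g, grads, mu)
    return jax.tree_util.tree_map(lambda m: m.astype(jnp.bfloat16), new_mu)
```

This mirrors the `mu_dtype` argument that optax transforms such as `scale_by_adam` expose for the first-moment buffer.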
Thanks for the data point. Could you tell us your baseline settings, e.g., which optimizer, learning rate, and weight decay? For the instability problem, you can either...
> I've been experimenting with the LION optimizer in your other (great) Imagen repository. I can share my anecdotal experience and combinations:
>
> * Models of different sizes 0.2B,...