Xiangning Chen
> I would not have thought of modifying temperature, how did you think of this?

I tracked the temperature value throughout training and found that Lion learns this value...
@lucidrains I used an initial temperature of 10 in the paper. But in LiT and BASIC, the vision tower is loaded from a pre-trained ckpt and is frozen during training, while...
@mitchellnw I further validated on the large and giant-size CLIP models, each trained for 20K steps (10K-step warmup then cosine decay) with `initial temperature = 30.0`. When training g/14, I...
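For reference, a minimal sketch of that schedule in optax (assuming `optax.warmup_cosine_decay_schedule` and an optax version that ships `optax.lion`; the peak learning rate and weight decay below are placeholders, not the values used in these runs):

```python
import optax

total_steps = 20_000   # total training steps per run
warmup_steps = 10_000  # linear warmup, then cosine decay

# `decay_steps` counts the whole schedule, warmup included.
lr_schedule = optax.warmup_cosine_decay_schedule(
    init_value=0.0,
    peak_value=1e-4,      # placeholder peak learning rate
    warmup_steps=warmup_steps,
    decay_steps=total_steps,
)

# Placeholder hyperparameters; the CLIP temperature itself is a model
# parameter initialized to 30.0 inside the model, not an optimizer setting.
optimizer = optax.lion(learning_rate=lr_schedule, weight_decay=1e-2)
```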
Hi, can you try 'DrNAS_imagenet' and see the result? There is some randomness, but it should not be that much.
Hi, thanks for your interest. Which PyTorch version are you using?
I don't think the PyTorch version should cause this issue, as we can get the CIFAR-100 result with zero variance. Could you also try train_search_progressive.py?
Hi, thanks for your interest. I used 8 1080 Ti GPUs for training on ImageNet; you could try reducing the batch size and tuning the learning rate accordingly. Best
For momentum tracking with bfloat16, you can simply cast the momentum to bfloat16; see the example here: https://github.com/deepmind/optax/blob/master/optax/_src/transform.py#L461.
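A minimal sketch of the idea (not the exact code at that link; helper names here are illustrative): keep the momentum tree in bfloat16 and cast it to the gradient dtype only when computing the update.

```python
import jax
import jax.numpy as jnp

def init_momentum(params):
    # Store the momentum buffer directly in bfloat16 to halve its memory.
    return jax.tree_util.tree_map(
        lambda p: jnp.zeros_like(p, dtype=jnp.bfloat16), params)

def update_momentum(grads, mu, beta=0.9):
    # Do the EMA arithmetic in the gradient dtype for accuracy,
    # then cast the result back to bfloat16 for storage.
    new_mu = jax.tree_util.tree_map(
        lambda g, m: beta * m.astype(g.dtype) + (1.0 - beta) * g, grads, mu)
    return jax.tree_util.tree_map(lambda m: m.astype(jnp.bfloat16), new_mu)
```

This mirrors the `mu_dtype` argument that optax transforms such as `scale_by_adam` expose for the first-moment buffer.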
Thanks for the data point. Could you tell us your baseline settings, e.g., which optimizer, learning rate, and weight decay? For the instability problem, you can either...
> I've been experimenting with the LION optimizer in your other (great) Imagen repository. I can share my anecdotal experience and combinations:
>
> * Models of different sizes 0.2B,...