Quan Sun
p.s. bsz can reach 57k when using grad checkpointing & DeepSpeed fp16 & ZeRO stage 1 & local loss
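For context, a minimal sketch of the DeepSpeed side of that combination, assuming a plain config dict with fp16 and ZeRO stage 1 (the batch sizes are placeholders, not the 57k recipe; grad checkpointing and local loss are handled on the open_clip side):

```python
# Illustrative DeepSpeed settings for fp16 + ZeRO stage 1.
# All values are placeholders, not the settings behind the 57k run.
ds_config = {
    "train_micro_batch_size_per_gpu": 256,   # per-GPU micro batch (placeholder)
    "gradient_accumulation_steps": 1,
    "fp16": {"enabled": True},               # mixed-precision training
    "zero_optimization": {"stage": 1},       # shard optimizer states only
}
```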
@gabrielilharco @rwightman Thanks for your comments. I will work on these changes ASAP.
Hi @gabrielilharco. You are right: get_num_layer_for_transformer(...) is not flexible, and it should warn users when a model is not supported. Do you think we could have a whitelist here? For...
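A rough sketch of the whitelist idea (the prefixes and fallback below are assumptions, not open_clip's actual naming scheme):

```python
import warnings

# Hypothetical whitelist of supported transformer block names.
SUPPORTED_BLOCK_KEYS = ("blocks", "resblocks")

def get_num_layer_for_transformer(param_name: str, num_max_layer: int) -> int:
    parts = param_name.split(".")
    # embedding-level parameters go to the first (most-decayed) layer
    if parts[0] in ("cls_token", "pos_embed", "patch_embed"):
        return 0
    for key in SUPPORTED_BLOCK_KEYS:
        if key in parts:
            # the segment after the block key is the layer index,
            # e.g. "blocks.3.attn.qkv.weight" -> layer 3
            return int(parts[parts.index(key) + 1]) + 1
    # unsupported model: warn instead of silently mis-assigning
    warnings.warn(f"'{param_name}' not in whitelist; assigning top layer")
    return num_max_layer - 1
```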
Hi @gabrielilharco. I have checked "Allow edits from maintainers." on my side. Please let me know if I missed anything.
Cool! Is this an implementation of GradAccum in [BASIC](https://arxiv.org/pdf/2111.10050.pdf)?
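For reference, plain gradient accumulation looks like the sketch below (the helper and names are illustrative, not the PR's code). Note it is not equivalent to a full-batch contrastive loss, since each micro-batch only sees its own negatives; BASIC's GradAccum recovers the full-batch loss by caching and recomputing features.

```python
import torch

def train_step_with_accum(model, loss_fn, images, texts, optimizer, num_chunks=4):
    # Split one large batch into micro-batches, sum scaled gradients,
    # then take a single optimizer step.
    optimizer.zero_grad()
    for img_chunk, txt_chunk in zip(images.chunk(num_chunks), texts.chunk(num_chunks)):
        image_features, text_features, logit_scale = model(img_chunk, txt_chunk)
        # scale so the summed gradients match the mean over the full batch
        loss = loss_fn(image_features, text_features, logit_scale) / num_chunks
        loss.backward()
    optimizer.step()
```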
Just a follow-up. Is anyone taking a look?
> @Quan-Sun oh this is my error! thanks for the fix! never mind!
Hi @rom1504 DeepSpeed may be another option for bigger models. It's also easy and effective to use (PR for this: https://github.com/mlfoundations/open_clip/pull/264). It can be applied with older versions of PyTorch, such...
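For anyone curious, a minimal sketch of the wrapping step, assuming deepspeed.initialize with a config dict (the model and all values here are stand-ins; see the PR for the actual integration):

```python
import torch
import deepspeed

# Stand-in model; in practice this would be an open_clip model.
model = torch.nn.Linear(512, 512)

ds_config = {
    "train_micro_batch_size_per_gpu": 16,
    "fp16": {"enabled": True},
    "zero_optimization": {"stage": 1},
    "optimizer": {"type": "Adam", "params": {"lr": 1e-4}},
}

# DeepSpeed wraps the model and builds the optimizer from the config.
model_engine, optimizer, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    config=ds_config,
)

# In the training loop, loss.backward() / optimizer.step() become:
#   model_engine.backward(loss)
#   model_engine.step()
```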
We are preparing a bilingual Chinese-English version and will update it as soon as training is complete.
@will-wiki We have prepared part of the data and pretraining is underway. At the current pace, it should take about one month if all goes well.