Quan Sun

Results 23 comments of Quan Sun

P.S. Batch size can reach 57k when using gradient checkpointing, DeepSpeed fp16, ZeRO stage 1, and local loss.
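For reference, the DeepSpeed side of such a setup could be sketched in a config along these lines (a minimal illustration, not the exact file used; gradient checkpointing and local loss are typically enabled via open_clip's own training flags rather than this config, and the micro-batch size here is a placeholder):

```json
{
  "train_micro_batch_size_per_gpu": 256,
  "fp16": { "enabled": true },
  "zero_optimization": { "stage": 1 }
}
```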

@gabrielilharco @rwightman Thanks for your comments. I will work on these changes ASAP.

Hi @gabrielilharco. You are right. get_num_layer_for_transformer(...) is not flexible. It should warn users if the models are not supported. Do you think we can have a white list here? For...

Hi @gabrielilharco. I have checked "Allow edits from maintainers." on my side. Please let me know if anything was missed.

Cool! Is this an implementation of GradAccum in [BASIC](https://arxiv.org/pdf/2111.10050.pdf)?
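As background on why GradAccum in BASIC is non-trivial: for a loss that decomposes over examples, summing per-micro-batch gradients reproduces the full-batch gradient exactly, as the toy NumPy sketch below shows (all names here are illustrative). A contrastive loss does not decompose this way, since every example's loss depends on the whole batch's embeddings, which is what BASIC's variant has to work around.

```python
import numpy as np

# Toy linear model y = X @ w with a sum-reduced MSE loss, illustrating
# plain gradient accumulation: the sum of per-micro-batch gradients
# equals the full-batch gradient when the loss decomposes per example.
rng = np.random.default_rng(0)
X = rng.normal(size=(8, 3))
y = rng.normal(size=(8,))
w = rng.normal(size=(3,))

def grad_mse_sum(Xb, yb, w):
    # d/dw sum((Xb @ w - yb)^2) = 2 * Xb.T @ (Xb @ w - yb)
    return 2.0 * Xb.T @ (Xb @ w - yb)

# Full-batch gradient in one shot.
g_full = grad_mse_sum(X, y, w)

# Same gradient accumulated over micro-batches of size 2.
g_accum = np.zeros_like(w)
for i in range(0, len(X), 2):
    g_accum += grad_mse_sum(X[i:i+2], y[i:i+2], w)

assert np.allclose(g_full, g_accum)
```

A contrastive loss breaks the per-example decomposition assumed above, so a scheme like BASIC's must recompute or gather embeddings across chunks before forming the loss.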

Just a follow-up. Is anyone taking a look?

> @Quan-Sun oh this is my error! thanks for the fix! never mind!

Hi @rom1504 DeepSpeed may be another option for bigger models. It's also easy and effective to use (PR for this: https://github.com/mlfoundations/open_clip/pull/264). It can be applied with older versions of PyTorch, such...

We are preparing a Chinese-English bilingual version, and we will update it as soon as training is complete.

@will-wiki Part of the data is ready and pre-training is underway. At the current pace, it should take about one month if all goes well.