formath

www.mathmach.com

Shanghai, China

machine learning and its applications

Results: 16 comments by formath

AttributeError: 'DeepSpeedHybridEngine' object has no attribute 'mp_group' in step 3.

> I'll add more details on my experiment: when I train the 13B model, it exits with the aforementioned exception. Command line (I've finished step 1 and step 2): > > ``` > $ python3...

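[Question] Pre-training time and pre-training data

A rough estimate: with a 7B model, 1.2 trillion tokens, 1,000 A800 GPUs, and 0.58 utilization, training one epoch takes about 4 days.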
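The estimate above can be reproduced with a short calculation. This is a minimal sketch, not the commenter's own method: it assumes the common approximation of about 6 FLOPs per parameter per token (8 with full activation recomputation) and a peak of 312 TFLOPS per A800 for dense BF16. Neither figure is stated in the thread, and the function name `training_days` is made up for illustration.

```python
SECONDS_PER_DAY = 86400


def training_days(params, tokens, gpus, utilization,
                  peak_flops_per_gpu=312e12,   # assumed A800 dense BF16 peak
                  flops_per_param_token=6):    # assumed: forward + backward pass
    """Estimate wall-clock days for one pass over `tokens` tokens."""
    total_flops = flops_per_param_token * params * tokens
    # Sustained cluster throughput = GPU count * peak * achieved utilization.
    cluster_flops_per_sec = gpus * peak_flops_per_gpu * utilization
    return total_flops / cluster_flops_per_sec / SECONDS_PER_DAY


# Figures from the comment: 7B parameters, 1.2T tokens, 1000 GPUs, 0.58 utilization.
days = training_days(params=7e9, tokens=1.2e12, gpus=1000, utilization=0.58)
# Same setup, assuming full activation recomputation (about 8 FLOPs per parameter per token).
days_recompute = training_days(7e9, 1.2e12, 1000, 0.58, flops_per_param_token=8)

print(f"{days:.1f} days (6ND), {days_recompute:.1f} days (8ND)")
```

Under these assumptions the result is roughly 3.2 days without recomputation and 4.3 days with it, which is consistent with "about 4 days".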

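[Question] Pre-training time and pre-training data

> > A rough estimate: with a 7B model, 1.2 trillion tokens, 1,000 A800 GPUs, and 0.58 utilization, training one epoch takes about 4 days. > > Judging from the config, it looks like pure data parallelism, with tensor parallelism not enabled? My guess is that tensor and pipeline parallelism were enabled; otherwise it would be hard to reach 0.58 utilization.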

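[Question] Is the original training dataset open source?

This is a company aiming at commercialization, after all, so the core assets are unlikely to be open-sourced.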

[Question] Does the model use 28 GB of GPU memory?

Isn't that exactly how large a 7B model should be?
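The arithmetic behind that reply can be sketched as follows. The thread does not state the precision, so this assumes the 28 GB figure refers to weights stored in fp32 (4 bytes per parameter) and ignores activations, optimizer state, and framework overhead. The helper name `weight_memory_gb` is made up for illustration.

```python
def weight_memory_gb(params, bytes_per_param):
    """Memory for the model weights alone, in decimal gigabytes (1 GB = 1e9 bytes)."""
    return params * bytes_per_param / 1e9


fp32_gb = weight_memory_gb(7e9, 4)  # assumed fp32 weights: 4 bytes per parameter
fp16_gb = weight_memory_gb(7e9, 2)  # for comparison: fp16/bf16 weights, 2 bytes each

print(f"fp32: {fp32_gb:.0f} GB, fp16: {fp16_gb:.0f} GB")
```

Under that assumption, 7B parameters at 4 bytes each come to 28 GB, and loading the same weights in half precision would need about 14 GB.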

tensor not found when using tf.estimator.WarmStartSettings

@candyzone `userid_embedding` is a partitioned `ev` embedding. The other variables load without problems, so I guess partitioned variables need special handling. ``` Traceback (most recent call last): File "prerank_debias.py",...
