formath

Results 16 comments of formath

> I'll add more details on my experiment: when I'm training 13b, it'll exit raising the aforementioned exception. Commandline (I've finished step1 and step2):
>
> ```
> $ python3...
> ```

A rough estimate: for a 7B model trained on 1.2 trillion tokens with 1,000 A800 GPUs at 0.58 utilization, one epoch takes about 4 days.
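The arithmetic behind this kind of estimate can be sketched as follows. This is only a back-of-the-envelope check, not the calculation the comment actually used. It assumes the common approximation of 6 FLOPs per parameter per token for a forward plus backward pass, and a peak of 312 TFLOPS per A800 (dense BF16); both numbers are assumptions that do not appear in the thread, and `training_days` is a hypothetical helper name.

```python
SECONDS_PER_DAY = 86_400


def training_days(params, tokens, num_gpus, peak_flops_per_gpu, utilization):
    """Estimate wall-clock days for one pass over the data.

    Assumes roughly 6 FLOPs per parameter per token (forward + backward),
    which ignores attention cost and activation recomputation.
    """
    total_flops = 6 * params * tokens
    sustained_flops_per_sec = num_gpus * peak_flops_per_gpu * utilization
    return total_flops / sustained_flops_per_sec / SECONDS_PER_DAY


days = training_days(
    params=7e9,                 # 7B model
    tokens=1.2e12,              # 1.2 trillion tokens
    num_gpus=1000,              # 1,000 A800 GPUs
    peak_flops_per_gpu=312e12,  # assumed dense BF16 peak per GPU
    utilization=0.58,           # utilization quoted in the comment
)
print(f"{days:.1f} days")
```

Under these assumptions the result is a little over 3 days, the same order as the "about 4 days" figure. Replacing the factor 6 with 8, a rough allowance for full activation recomputation, pushes the estimate above 4 days.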

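> > A rough estimate: for a 7B model trained on 1.2 trillion tokens with 1,000 A800 GPUs at 0.58 utilization, one epoch takes about 4 days.
>
> Judging from the config, it looks like pure data parallelism. Was tensor parallelism not enabled?

My guess is that tensor and pipeline parallelism were both enabled; otherwise it would be hard to reach 0.58 utilization.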

After all, this is a company that plans to commercialize, so it is unlikely to open-source its core technology.

Isn't that just how large a 7B model is supposed to be?

@candyzone `userid_embedding` is a partitioned `ev` embedding. The other variables have no problem, so I guess partitioned variables need special handling.

```
Traceback (most recent call last):
  File "prerank_debias.py",...
```