Jack BAI

Results 37 comments of Jack BAI

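I'm on a single machine with four V100s, using DistributedDataParallel + Apex. It takes 10 seconds per epoch, and the 5m corpus (0.1B) is basically fully fitted within an hour.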

This project is mainly built with PyTorch, right? Could you point out where tf is used?

Thanks a lot for your contribution. Could you provide sample snippets for using the hidden states? Specifically, what does the returned `hidden_states` vector contain?

Just figured it out - so the hidden states output vector is a **concatenation** of all the hidden states at the last layer. From the functional aspect I would strongly...
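To illustrate what working with such a concatenated output might look like, here is a minimal sketch in plain Python. It assumes (the comment does not confirm this) that the last-layer vectors are stacked along the token axis and that the per-sequence token counts are known; `split_concatenated` and `lengths` are hypothetical names, not part of the library's API.

```python
from typing import List, Sequence


def split_concatenated(
    hidden: Sequence[Sequence[float]], lengths: Sequence[int]
) -> List[List[Sequence[float]]]:
    """Split a token-axis concatenation back into one chunk per sequence.

    Assumption: `hidden` holds one vector per token, and `lengths` gives
    the number of tokens that belong to each sequence, in order.
    """
    if sum(lengths) != len(hidden):
        raise ValueError("lengths must sum to the number of hidden vectors")
    chunks = []
    start = 0
    for n in lengths:
        chunks.append(list(hidden[start:start + n]))
        start += n
    return chunks


# Toy data: 5 token vectors of hidden size 2, from sequences of length 3 and 2.
hidden = [[0.0, 0.1], [1.0, 1.1], [2.0, 2.1], [3.0, 3.1], [4.0, 4.1]]
per_seq = split_concatenated(hidden, [3, 2])

# One common use: take the final token's vector of each sequence.
last_token_states = [seq[-1] for seq in per_seq]
```

If the real output is a tensor, the same split can be done with slicing along the first dimension; the list version above only shows the bookkeeping.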

Thanks for the fix. I also found that, with your patch, setting `return_hidden_states=True` makes GPU usage keep going up when calling `llm.generate`. I guess it can be solved by...