Random
We have not adapted NVIDIA's Megatron-LM training framework; currently we support fairseq and transformers. If your model was trained with Megatron-LM, you need to convert the model first.
By "adapting distributed training", do you mean using EET during training? That won't work: EET is an inference engine and does not support backpropagation.
Maybe you used torch.load() without 'map_location=lambda storage, loc: storage'. The original checkpoint saved the tensors on different GPUs, so torch.load() will also create another process to map the...
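For reference, a minimal sketch of loading such a checkpoint with map_location so that tensors saved on other GPUs are remapped instead of being restored to their original devices (the checkpoint path is a placeholder):

```python
import torch

# Remap every stored tensor to CPU regardless of the GPU it was saved from,
# so torch.load() does not try to allocate memory on the original devices.
state_dict = torch.load(
    "checkpoint.pt",  # placeholder path
    map_location=lambda storage, loc: storage,
)

# Equivalent, more common form:
# state_dict = torch.load("checkpoint.pt", map_location="cpu")
```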
I recommend using the Easy and Efficient Transformer (EET) for inference.
@mahnerak I solved this by adding num_workers=0. It seems like a bug in PyTorch! A sketch of the workaround is below.
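A minimal sketch of that workaround with a standard torch.utils.data.DataLoader (the dataset here is a placeholder): setting num_workers=0 keeps data loading in the main process and avoids spawning worker processes.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Placeholder dataset; substitute your own.
dataset = TensorDataset(torch.randn(100, 8), torch.randint(0, 2, (100,)))

# num_workers=0 loads batches in the main process, avoiding the
# multiprocessing issue described above.
loader = DataLoader(dataset, batch_size=16, shuffle=True, num_workers=0)

for inputs, labels in loader:
    pass  # training / inference step goes here
```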
@mahnerak, did you solve it?