He Jia
Please just use `LlamaModel = GPTModel`. For now, Megatron already fully supports Llama. I may release my new framework code in the future.
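A minimal sketch of what that alias looks like in a Megatron-LM-style pretrain script (the import path and `GPTModel` constructor arguments below match older Megatron-LM releases and may differ in newer ones; treat them as assumptions):

```python
# Sketch: reuse Megatron's GPTModel for Llama by aliasing it. Llama-specific
# pieces (RoPE, SwiGLU, RMSNorm, no bias) are selected via the usual command-line
# arguments, so no separate LlamaModel class is needed.
from megatron.model import GPTModel

LlamaModel = GPTModel  # the alias is all that is required

def model_provider(pre_process=True, post_process=True):
    # Same shape of model provider that pretrain_gpt.py uses.
    model = LlamaModel(
        num_tokentypes=0,
        parallel_output=True,
        pre_process=pre_process,
        post_process=post_process,
    )
    return model
```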
1. I'm using 1.06. You can compile flash-attention yourself, or if you don't want to bother, just use NVIDIA's official NGC PyTorch image.
2. That's not normal; after continued pretraining mine is also around 2.x. Check whether the model parameters and training arguments are aligned. If nothing else works, print each layer's output in the module's call function and check whether it matches the native model (see the sketch below).
3. Thanks, I had forgotten about that.
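A minimal sketch of the layer-by-layer comparison idea in point 2, using plain PyTorch forward hooks (model and batch names are placeholders, not anything from Megatron):

```python
# Capture every submodule's output so a converted checkpoint can be compared
# against the native (reference) model on the same input.
import torch

def capture_layer_outputs(model, inputs):
    """Run `model` on `inputs` and return {module_name: output_tensor}."""
    outputs, handles = {}, []

    def make_hook(name):
        def hook(module, inp, out):
            out_t = out[0] if isinstance(out, tuple) else out
            outputs[name] = out_t.detach().float().cpu()
        return hook

    for name, module in model.named_modules():
        handles.append(module.register_forward_hook(make_hook(name)))
    with torch.no_grad():
        model(**inputs)
    for h in handles:
        h.remove()
    return outputs

# Usage (placeholder names): report the largest per-layer mismatch.
# a = capture_layer_outputs(converted_model, batch)
# b = capture_layer_outputs(reference_model, batch)
# for k in a.keys() & b.keys():
#     print(k, (a[k] - b[k]).abs().max().item())
```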
Sorry, I can't share model details such as the amount of training data, but the data volume is very large. The continually pretrained model performs reasonably well on a variety of evaluation datasets.
Judging from `assert srcIndex < srcSelectDimSize`, it looks like a problem with the input data. You could write an empty (dummy) data iterator yourself to debug it; it is written the same way as in plain PyTorch.
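A minimal sketch of such a dummy iterator (vocab size, sequence length, and the `"text"` field name are placeholders): if the CUDA assert disappears with this loader, the real data pipeline is producing out-of-range token ids.

```python
# Dummy dataset that only yields token ids guaranteed to be < vocab_size,
# so any `srcIndex < srcSelectDimSize` failure can be blamed on the real data.
import torch
from torch.utils.data import Dataset, DataLoader

class DummyTokenDataset(Dataset):
    def __init__(self, vocab_size=32000, seq_len=2048, num_samples=64):
        self.vocab_size = vocab_size
        self.seq_len = seq_len
        self.num_samples = num_samples

    def __len__(self):
        return self.num_samples

    def __getitem__(self, idx):
        tokens = torch.randint(0, self.vocab_size, (self.seq_len,))
        return {"text": tokens}

# Drop-in replacement for the real loader while debugging.
debug_loader = DataLoader(DummyTokenDataset(), batch_size=4)
```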
If you still run out of GPU memory with --sequence-parallel, --recompute-activations, --use-cpu-initialization, and --use-distributed-optimizer all enabled, try adjusting the PP and TP settings.
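A hedged back-of-envelope sketch (not Megatron code) of why raising TP/PP helps: it divides the per-GPU share of the weights. Real usage also includes activations, gradients, and optimizer state, so treat the numbers as a lower bound; the layer-size formula is a rough approximation.

```python
# Rough per-GPU memory for model weights only, as a function of TP and PP.
def approx_params(num_layers, hidden_size, vocab_size):
    # ~12 * h^2 parameters per transformer block (attention + MLP), plus embeddings.
    return num_layers * 12 * hidden_size ** 2 + vocab_size * hidden_size

def weight_gb_per_gpu(num_layers, hidden_size, vocab_size, tp, pp, bytes_per_param=2):
    # Tensor parallelism splits each layer; pipeline parallelism splits the layer stack.
    params = approx_params(num_layers, hidden_size, vocab_size)
    return params * bytes_per_param / (tp * pp) / 1024 ** 3

# Example: a 7B-class model (32 layers, hidden 4096, vocab 32000) in fp16.
for tp, pp in [(1, 1), (2, 1), (2, 2), (4, 2)]:
    print(f"TP={tp} PP={pp}: ~{weight_gb_per_gpu(32, 4096, 32000, tp, pp):.1f} GB weights/GPU")
```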
You could write a new custom_pretrain_llama.py that adds the HF tokenizer to the training step. Add it in the build_train_iterable_loaders function, or somewhere else that suits you.
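A minimal sketch of that idea, assuming `build_train_iterable_loaders` takes a raw-text dataset and returns a loader (the function name comes from the comment above; the tokenizer path, field names, and sequence length are placeholders):

```python
# custom_pretrain_llama.py (sketch): tokenize raw text with a Hugging Face
# tokenizer inside the data-loading step instead of relying on Megatron's
# built-in tokenizers.
from transformers import AutoTokenizer
from torch.utils.data import DataLoader

hf_tokenizer = AutoTokenizer.from_pretrained("path/to/llama-tokenizer")
if hf_tokenizer.pad_token is None:
    hf_tokenizer.pad_token = hf_tokenizer.eos_token  # Llama has no pad token by default

def collate_with_hf_tokenizer(batch_of_texts, seq_len=2048):
    # Convert raw strings into fixed-length token-id tensors for the training step.
    enc = hf_tokenizer(
        batch_of_texts,
        truncation=True,
        max_length=seq_len,
        padding="max_length",
        return_tensors="pt",
    )
    return {"text": enc["input_ids"]}

def build_train_iterable_loaders(raw_text_dataset, micro_batch_size):
    # Hook the tokenizing collate_fn into the loader consumed by the trainer.
    return DataLoader(
        raw_text_dataset,
        batch_size=micro_batch_size,
        collate_fn=collate_with_hf_tokenizer,
        drop_last=True,
    )
```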
I am a developer of TensorFlow [recommenders-addons](https://github.com/tensorflow/recommenders-addons), and I now need to develop an all-to-all embedding layer for multi-GPU distributed training of recommendation models. The old TensorFlow distributed strategy clearly...
> > online inference services and the functional components used by various recommendation algorithms
>
> @MoFHeka Can you elaborate on what you need here?

@jeffcarp If a third-party custom...
Thank you for your reply. Here is TensorFlow Recommenders Addons, which stores and trains dynamic-shape embedding tables with a fully functional hashtable. It's designed for training ID features without...
Any progress or roadmap? @sachinprasadhs @fchollet