He Jia
Please just use `LlamaModel = GPTModel`. For now, Megatron already fully supports Llama. I may release my new framework code in the future.
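A minimal sketch of what that alias looks like in a Megatron-LM-style pretrain script (the import path and `GPTModel` constructor arguments below match older Megatron-LM releases and may differ in newer ones; treat them as assumptions):

```python
# Sketch: reuse Megatron's GPTModel for Llama by aliasing it. Llama-specific
# pieces (RoPE, SwiGLU, RMSNorm, no bias) are selected via the usual command-line
# arguments, so no separate LlamaModel class is needed.
from megatron.model import GPTModel

LlamaModel = GPTModel  # the alias is all that is required

def model_provider(pre_process=True, post_process=True):
    # Same shape of model provider that pretrain_gpt.py uses.
    model = LlamaModel(
        num_tokentypes=0,
        parallel_output=True,
        pre_process=pre_process,
        post_process=post_process,
    )
    return model
```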
1. I'm using 1.06. You can compile flash-attention yourself, or if you don't want to bother, just use NVIDIA's official NGC PyTorch image.
2. That's not normal; after continued pretraining mine is also around 2.x. Check whether the model parameters and training arguments are aligned. If nothing else works, print each layer's output in the module's call function and check whether it matches the native model (see the sketch below).
3. Thanks, I had forgotten about that.
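A minimal sketch of the layer-by-layer comparison idea in point 2, using plain PyTorch forward hooks (model and batch names are placeholders, not anything from Megatron):

```python
# Capture every submodule's output so a converted checkpoint can be compared
# against the native (reference) model on the same input.
import torch

def capture_layer_outputs(model, inputs):
    """Run `model` on `inputs` and return {module_name: output_tensor}."""
    outputs, handles = {}, []

    def make_hook(name):
        def hook(module, inp, out):
            out_t = out[0] if isinstance(out, tuple) else out
            outputs[name] = out_t.detach().float().cpu()
        return hook

    for name, module in model.named_modules():
        handles.append(module.register_forward_hook(make_hook(name)))
    with torch.no_grad():
        model(**inputs)
    for h in handles:
        h.remove()
    return outputs

# Usage (placeholder names): report the largest per-layer mismatch.
# a = capture_layer_outputs(converted_model, batch)
# b = capture_layer_outputs(reference_model, batch)
# for k in a.keys() & b.keys():
#     print(k, (a[k] - b[k]).abs().max().item())
```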
Sorry, I can't share model details such as the amount of training data, but the data volume is very large. The continually pretrained model performs reasonably well on a variety of evaluation datasets.
Judging from `assert srcIndex < srcSelectDimSize`, it looks like a problem with the input data. You could write an empty (dummy) data iterator yourself to debug it; it is written the same way as in plain PyTorch.
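A minimal sketch of such a dummy iterator (vocab size, sequence length, and the `"text"` field name are placeholders): if the CUDA assert disappears with this loader, the real data pipeline is producing out-of-range token ids.

```python
# Dummy dataset that only yields token ids guaranteed to be < vocab_size,
# so any `srcIndex < srcSelectDimSize` failure can be blamed on the real data.
import torch
from torch.utils.data import Dataset, DataLoader

class DummyTokenDataset(Dataset):
    def __init__(self, vocab_size=32000, seq_len=2048, num_samples=64):
        self.vocab_size = vocab_size
        self.seq_len = seq_len
        self.num_samples = num_samples

    def __len__(self):
        return self.num_samples

    def __getitem__(self, idx):
        tokens = torch.randint(0, self.vocab_size, (self.seq_len,))
        return {"text": tokens}

# Drop-in replacement for the real loader while debugging.
debug_loader = DataLoader(DummyTokenDataset(), batch_size=4)
```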
If you still run out of GPU memory with --sequence-parallel, --recompute-activations, --use-cpu-initialization, and --use-distributed-optimizer all enabled, try adjusting the PP and TP settings.
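A hedged back-of-envelope sketch (not Megatron code) of why raising TP/PP helps: it divides the per-GPU share of the weights. Real usage also includes activations, gradients, and optimizer state, so treat the numbers as a lower bound; the layer-size formula is a rough approximation.

```python
# Rough per-GPU memory for model weights only, as a function of TP and PP.
def approx_params(num_layers, hidden_size, vocab_size):
    # ~12 * h^2 parameters per transformer block (attention + MLP), plus embeddings.
    return num_layers * 12 * hidden_size ** 2 + vocab_size * hidden_size

def weight_gb_per_gpu(num_layers, hidden_size, vocab_size, tp, pp, bytes_per_param=2):
    # Tensor parallelism splits each layer; pipeline parallelism splits the layer stack.
    params = approx_params(num_layers, hidden_size, vocab_size)
    return params * bytes_per_param / (tp * pp) / 1024 ** 3

# Example: a 7B-class model (32 layers, hidden 4096, vocab 32000) in fp16.
for tp, pp in [(1, 1), (2, 1), (2, 2), (4, 2)]:
    print(f"TP={tp} PP={pp}: ~{weight_gb_per_gpu(32, 4096, 32000, tp, pp):.1f} GB weights/GPU")
```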
You could write a new custom_pretrain_llama.py that adds the HF tokenizer to the training step. Add it in the build_train_iterable_loaders function, or somewhere else that suits you.
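A minimal sketch of that idea, assuming `build_train_iterable_loaders` takes a raw-text dataset and returns a loader (the function name comes from the comment above; the tokenizer path, field names, and sequence length are placeholders):

```python
# custom_pretrain_llama.py (sketch): tokenize raw text with a Hugging Face
# tokenizer inside the data-loading step instead of relying on Megatron's
# built-in tokenizers.
from transformers import AutoTokenizer
from torch.utils.data import DataLoader

hf_tokenizer = AutoTokenizer.from_pretrained("path/to/llama-tokenizer")
if hf_tokenizer.pad_token is None:
    hf_tokenizer.pad_token = hf_tokenizer.eos_token  # Llama has no pad token by default

def collate_with_hf_tokenizer(batch_of_texts, seq_len=2048):
    # Convert raw strings into fixed-length token-id tensors for the training step.
    enc = hf_tokenizer(
        batch_of_texts,
        truncation=True,
        max_length=seq_len,
        padding="max_length",
        return_tensors="pt",
    )
    return {"text": enc["input_ids"]}

def build_train_iterable_loaders(raw_text_dataset, micro_batch_size):
    # Hook the tokenizing collate_fn into the loader consumed by the trainer.
    return DataLoader(
        raw_text_dataset,
        batch_size=micro_batch_size,
        collate_fn=collate_with_hf_tokenizer,
        drop_last=True,
    )
```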
I am a developer of TensorFlow [recommenders-addons](https://github.com/tensorflow/recommenders-addons), and I now need to develop an all-to-all embedding layer for multi-GPU distributed training of recommendation models. The old TensorFlow distributed strategy clearly...
> > online inference services and the functional components used by various recommendation algorithms
>
> @MoFHeka Can you elaborate on what you need here?

@jeffcarp If a third-party custom...
Thank you for your reply. Here is TensorFlow Recommenders Addons, which stores and trains dynamic-shape embedding tables with a fully functional hashtable. It's designed for training ID features without...
Any progress or roadmap? @sachinprasadhs @fchollet