Weikuan Wang
Hi, I got the same result as you. Did you resolve it? Thanks!
Same issue on a different model.
> ## Convert llama-2 from HuggingFace to Megatron-LM:
>
> ```
> PYTHONPATH=$(pwd) tools/checkpoint/util.py --model-type=GPT --loader=llama2_hf --load-dir= --save-dir= --tokenizer-model=
> ```
>
> ## Save llama-2 checkpoint as HuggingFace to Megatron-LM:...
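A filled-in invocation of that quoted command might look like the sketch below. The directories and tokenizer path are hypothetical placeholders, and the `python` prefix is my assumption (the quote leaves the values empty), so check everything against your Megatron-LM version.

```
# Hypothetical paths -- substitute your own (assumptions, not from the thread).
MEGATRON_DIR=$(pwd)                           # root of the Megatron-LM clone
HF_MODEL_DIR=/models/llama-2-7b-hf            # HuggingFace llama-2 weights
MEGATRON_OUT_DIR=/models/llama-2-7b-megatron  # output for the converted checkpoint

PYTHONPATH=$MEGATRON_DIR python tools/checkpoint/util.py \
    --model-type=GPT \
    --loader=llama2_hf \
    --load-dir=$HF_MODEL_DIR \
    --save-dir=$MEGATRON_OUT_DIR \
    --tokenizer-model=$HF_MODEL_DIR/tokenizer.model
```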
> Hi, are there any updates? I'm mostly interested in converting GPT-2/Bloom checkpoints. They have a script for converting GPT-2 somewhere in HF's transformers repo (transformers/models/megatron-gpt2): https://huggingface.co/docs/transformers/model_doc/megatron_gpt2. Otherwise it should be in...
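For what it's worth, the GPT-2 converter mentioned there ships inside the transformers source tree; below is a minimal sketch of invoking it, assuming a source checkout and using the example checkpoint name from the HF docs (both are assumptions, not from this thread).

```
# Sketch: clone transformers and run the Megatron-GPT2 conversion script.
git clone https://github.com/huggingface/transformers.git
# The script takes a Megatron-LM GPT-2 checkpoint archive as its argument
# (the filename below is the docs' example, used here as a placeholder).
python3 transformers/src/transformers/models/megatron_gpt2/convert_megatron_gpt2_checkpoint.py \
    megatron_gpt2_345m_v0_0.zip
```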
> same issue here. Any solution?

You can add `--include localhost:0,1` to the DeepSpeed command. In my case, my GPUs are 3,4,5,6 on the Slurm node, so I need to use `--include`...
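For context, `--include` is the `deepspeed` launcher flag that restricts which node/GPU slots are used. A minimal sketch, assuming a hypothetical `train.py` and config file:

```
# Run only on GPUs 3,4,5,6 of the local node;
# the script and config names are placeholders.
deepspeed --include localhost:3,4,5,6 train.py --deepspeed_config ds_config.json
```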
Hi, I'm facing the same problem. Do you have any idea about it? Thanks
Hi, I fixed the problem. Setting: RTX 2080, CUDA 9.0, PyTorch 0.4.1 with CUDA 9.0 (using a Docker image from the official PyTorch Docker Hub). Solution: set `torch.backends.cudnn.enabled = False` in the code, then...
Try `pip uninstall transformer-engine`.
> > Hi,
> >
> > I found that the original script cannot handle large models with long context effectively, since it uses multiprocessing to load an entire model on a...