Weikuan Wang
Hi, I got the same result as you. Did you resolve it? Thanks!
Same issue on a different model.
> ## Convert llama-2 from HuggingFace to Megatron-LM:
>
> ```
> PYTHONPATH=$(pwd) tools/checkpoint/util.py --model-type=GPT --loader=llama2_hf --load-dir= --save-dir= --tokenizer-model=
> ```
>
> ## Save llama-2 checkpoint as HuggingFace to Megatron-LM:...
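A filled-in invocation of that quoted command might look like the sketch below. The directories and tokenizer path are hypothetical placeholders, and the `python` prefix is my assumption (the quote leaves the values empty), so check everything against your Megatron-LM version.

```
# Hypothetical paths -- substitute your own (assumptions, not from the thread).
MEGATRON_DIR=$(pwd)                           # root of the Megatron-LM clone
HF_MODEL_DIR=/models/llama-2-7b-hf            # HuggingFace llama-2 weights
MEGATRON_OUT_DIR=/models/llama-2-7b-megatron  # output for the converted checkpoint

PYTHONPATH=$MEGATRON_DIR python tools/checkpoint/util.py \
    --model-type=GPT \
    --loader=llama2_hf \
    --load-dir=$HF_MODEL_DIR \
    --save-dir=$MEGATRON_OUT_DIR \
    --tokenizer-model=$HF_MODEL_DIR/tokenizer.model
```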
> Hi, are there any updates? I'm mostly interested in converting GPT-2/Bloom checkpoints. They have a script for converting GPT-2 somewhere in HF's transformers repo (transformers/models/megatron-gpt2): https://huggingface.co/docs/transformers/model_doc/megatron_gpt2. Otherwise it should be in...
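For what it's worth, the GPT-2 converter mentioned there ships inside the transformers source tree; below is a minimal sketch of invoking it, assuming a source checkout and using the example checkpoint name from the HF docs (both are assumptions, not from this thread).

```
# Sketch: clone transformers and run the Megatron-GPT2 conversion script.
git clone https://github.com/huggingface/transformers.git
# The script takes a Megatron-LM GPT-2 checkpoint archive as its argument
# (the filename below is the docs' example, used here as a placeholder).
python3 transformers/src/transformers/models/megatron_gpt2/convert_megatron_gpt2_checkpoint.py \
    megatron_gpt2_345m_v0_0.zip
```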
> same issue here. Any solution?

You can add `--include localhost:0,1` to the DeepSpeed command. In my case, my GPUs are 3,4,5,6 on the Slurm node, so I need to use `--include`...
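For context, `--include` is the `deepspeed` launcher flag that restricts which node/GPU slots are used. A minimal sketch, assuming a hypothetical `train.py` and config file:

```
# Run only on GPUs 3,4,5,6 of the local node;
# the script and config names are placeholders.
deepspeed --include localhost:3,4,5,6 train.py --deepspeed_config ds_config.json
```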
Hi, I'm facing the same problem. Do you have any idea about it? Thanks
Hi, I fixed the problem. Setting: RTX 2080, CUDA 9.0, PyTorch 0.4.1 with CUDA 9.0 (using a Docker image from the official PyTorch Docker Hub). Solution: set `torch.backends.cudnn.enabled = False` in the code, then...
Try `pip uninstall transformer-engine`.
> > Hi,
> >
> > I found that the original script cannot handle large models with long context effectively, since it uses multiprocessing to load an entire model on a...