NeMo
Regarding ready-to-use .nemo models for PEFT fine-tuning
I am planning to fine-tune the Llama model with the PEFT technique, following this official documentation: https://docs.nvidia.com/nemo-framework/user-guide/latest/playbooks/llama2peft.html However, I am facing some issues when converting the Hugging Face model to .nemo format.
So is there a repository of ready-to-use .nemo models (llama.nemo, mistral.nemo, openchat.nemo, etc.) that are compatible with the PEFT training steps mentioned in this official documentation?
If so, I could skip the Hugging Face-to-.nemo conversion step and move on to the remaining fine-tuning steps. @okuchaiev
I'm having the same problem for mistral 7B PEFT when running this command:
python3 /opt/NeMo/scripts/checkpoint_converters/convert_mistral_7b_hf_to_nemo.py --input_name_or_path=/workspace/mistral-7B-hf --output_path=mistral.nemo
This is the error:
[NeMo I 2024-04-02 17:52:08 convert_mistral_7b_hf_to_nemo:149] loading checkpoint 1: /workspace/mistral-7B-hf
in_dir: /workspace/mistral-7B-hf
Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/opt/NeMo/scripts/checkpoint_converters/convert_mistral_7b_hf_to_nemo.py", line 339, in <module>
    convert(args)
  File "/opt/NeMo/scripts/checkpoint_converters/convert_mistral_7b_hf_to_nemo.py", line 151, in convert
    model_args, ckpt, tokenizer = load_mistral_ckpt(args.input_name_or_path)
  File "/opt/NeMo/scripts/checkpoint_converters/convert_mistral_7b_hf_to_nemo.py", line 140, in load_mistral_ckpt
    model = AutoModelForCausalLM.from_pretrained(in_dir)
  File "/usr/local/lib/python3.10/dist-packages/transformers/models/auto/auto_factory.py", line 563, in from_pretrained
    return model_class.from_pretrained(
  File "/usr/local/lib/python3.10/dist-packages/transformers/modeling_utils.py", line 3671, in from_pretrained
    ) = cls._load_pretrained_model(
  File "/usr/local/lib/python3.10/dist-packages/transformers/modeling_utils.py", line 4078, in _load_pretrained_model
    state_dict = load_state_dict(shard_file, is_quantized=is_quantized)
  File "/usr/local/lib/python3.10/dist-packages/transformers/modeling_utils.py", line 507, in load_state_dict
    with safe_open(checkpoint_file, framework="pt") as f:
FileNotFoundError: No such file or directory: "/workspace/mistral-7B-hf/model-00001-of-00002.safetensors"
I also tried fine-tuning Llama 2 7B, but it gave me the same error: I cannot convert the checkpoints to .nemo format because the checkpoints cannot be loaded.
@frankh077
Were you able to run the docker command
docker run --gpus device=1 --shm-size=2g --net=host --ulimit memlock=-1 --rm -it -v ${PWD}:/workspace -w /workspace -v ${PWD}/results:/results nvcr.io/ea-bignlp/ga-participants/nemofw-training:23.08.03 bash
specified in the official documentation? - https://docs.nvidia.com/nemo-framework/user-guide/latest/playbooks/llama2peft.html#step-2-convert-to-nemo
@pradeepdev-1995
Yes, but through the nemo-framework-training container. This is the command I used:
docker run --gpus device=1 --shm-size=2g --net=host --ulimit memlock=-1 --rm -it -v ${PWD}:/workspace -w /workspace -v ${PWD}/results:/results nvcr.io/nvaie/nemo-framework-training:23.08.03 bash
Is this container setup mandatory? @frankh077 Can we do the fine-tuning directly in a Python console without using the container?
I think it is mandatory, since the environment and the necessary tools are in the container. But if you can build the environment yourself, it should work; you can base it on the NeMo containers available through NGC.
This issue is stale because it has been open for 30 days with no activity. Remove stale label or comment or this will be closed in 7 days.
This issue was closed because it has been inactive for 7 days since being marked as stale.