llama-cookbook
ValueError: Cannot flatten integer dtype tensors
System Info
PyTorch 2.1.0+cu121, 4x A4000 GPUs
Information
- [x] The official example scripts
- [ ] My own modified scripts
🐛 Describe the bug
I am trying to run the examples/finetuning.py script without any changes, but it gives me the following error.
Command:
torchrun --nnodes 1 --nproc_per_node 4 examples/finetuning.py --model_name ../CodeLlama-7b-Instruct/hug --use_peft --peft_method lora --use_fp16 --output_dir ../output --enable_fsdp
Error logs
trainable params: 4,194,304 || all params: 6,742,740,992 || trainable%: 0.06220473254091146
Traceback (most recent call last):
File "examples/finetuning.py", line 8, in <module>
fire.Fire(main)
File "/root/primisai/vast_ai_envoirment/lib/python3.8/site-packages/fire/core.py", line 141, in Fire
component_trace = _Fire(component, args, parsed_flag_args, context, name)
File "/root/primisai/vast_ai_envoirment/lib/python3.8/site-packages/fire/core.py", line 475, in _Fire
component, remaining_args = _CallAndUpdateTrace(
File "/root/primisai/vast_ai_envoirment/lib/python3.8/site-packages/fire/core.py", line 691, in _CallAndUpdateTrace
component = fn(*varargs, **kwargs)
File "/root/primisai/vast_ai_envoirment/lib/python3.8/site-packages/llama_recipes/finetuning.py", line 144, in main
model = FSDP(
File "/root/primisai/vast_ai_envoirment/lib/python3.8/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 463, in __init__
_auto_wrap(
File "/root/primisai/vast_ai_envoirment/lib/python3.8/site-packages/torch/distributed/fsdp/_wrap_utils.py", line 101, in _auto_wrap
_recursive_wrap(**recursive_wrap_kwargs, **root_kwargs) # type: ignore[arg-type]
File "/root/primisai/vast_ai_envoirment/lib/python3.8/site-packages/torch/distributed/fsdp/wrap.py", line 537, in _recursive_wrap
wrapped_child, num_wrapped_params = _recursive_wrap(
File "/root/primisai/vast_ai_envoirment/lib/python3.8/site-packages/torch/distributed/fsdp/wrap.py", line 537, in _recursive_wrap
wrapped_child, num_wrapped_params = _recursive_wrap(
File "/root/primisai/vast_ai_envoirment/lib/python3.8/site-packages/torch/distributed/fsdp/wrap.py", line 537, in _recursive_wrap
wrapped_child, num_wrapped_params = _recursive_wrap(
[Previous line repeated 2 more times]
File "/root/primisai/vast_ai_envoirment/lib/python3.8/site-packages/torch/distributed/fsdp/wrap.py", line 555, in _recursive_wrap
return _wrap(module, wrapper_cls, **kwargs), nonwrapped_numel
File "/root/primisai/vast_ai_envoirment/lib/python3.8/site-packages/torch/distributed/fsdp/wrap.py", line 484, in _wrap
return wrapper_cls(module, **kwargs)
File "/root/primisai/vast_ai_envoirment/lib/python3.8/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 487, in __init__
_init_param_handle_from_module(
File "/root/primisai/vast_ai_envoirment/lib/python3.8/site-packages/torch/distributed/fsdp/_init_utils.py", line 519, in _init_param_handle_from_module
_init_param_handle_from_params(state, managed_params, fully_sharded_module)
File "/root/primisai/vast_ai_envoirment/lib/python3.8/site-packages/torch/distributed/fsdp/_init_utils.py", line 531, in _init_param_handle_from_params
handle = FlatParamHandle(
File "/root/primisai/vast_ai_envoirment/lib/python3.8/site-packages/torch/distributed/fsdp/flat_param.py", line 537, in __init__
self._init_flat_param_and_metadata(
File "/root/primisai/vast_ai_envoirment/lib/python3.8/site-packages/torch/distributed/fsdp/flat_param.py", line 585, in _init_flat_param_and_metadata
) = self._validate_tensors_to_flatten(params)
File "/root/primisai/vast_ai_envoirment/lib/python3.8/site-packages/torch/distributed/fsdp/flat_param.py", line 720, in _validate_tensors_to_flatten
raise ValueError("Cannot flatten integer dtype tensors")
ValueError: Cannot flatten integer dtype tensors
Expected behavior
I expected the model to train without issues, since I have not changed anything.
@HamidShojanazeri Please check
@Humza1996 will take a look. However, we haven't tested Code Llama fine-tuning in the recipes yet, so I'm not sure whether it would work out of the box.
Facing the same error.
This seems to be related to bitsandbytes: turn off load_in_4bit or load_in_8bit and it seems to work correctly.
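For context, the error comes from FSDP's parameter flattening: before sharding, FSDP flattens a module's parameters into one tensor and rejects any parameter that is not floating-point. Weights loaded with load_in_8bit/load_in_4bit are stored as integer tensors, which trips that check. A minimal stand-in sketch (no torch required; `validate_tensors_to_flatten` below is a hypothetical simplification of the real check in torch's flat_param.py):

```python
# Dtypes that a simplified FSDP-style flattener would accept. The real check
# in torch/distributed/fsdp/flat_param.py validates torch dtypes; this sketch
# uses strings purely for illustration.
FLOAT_DTYPES = {"float16", "bfloat16", "float32", "float64"}

def validate_tensors_to_flatten(param_dtypes):
    """Reject any parameter dtype that is not floating-point,
    mirroring the error message seen in the traceback above."""
    for dtype in param_dtypes:
        if dtype not in FLOAT_DTYPES:
            raise ValueError("Cannot flatten integer dtype tensors")

# A model loaded with load_in_8bit stores its weights as int8:
try:
    validate_tensors_to_flatten(["float16", "int8"])
except ValueError as e:
    print(e)  # Cannot flatten integer dtype tensors

# With quantization turned off, all parameters are floats and the check passes:
validate_tensors_to_flatten(["float16", "bfloat16"])
```

This is why disabling the bitsandbytes quantization flags makes plain FSDP work again.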
Facing the same error.
Hi! It seems that FSDP works with QLoRA now. While we are working on adding more documentation about this soon, for now please check the example script here.
@Flemington7, Code Llama has not been tested, but for one thing, I wonder if you run into the same issue with --pure_bf16? BTW, just to note: if you are looking for code assistant/generation applications, Llama 3 by itself is very performant in that space, so you won't need Code Llama for that case. Infilling and code completion still require Code Llama.
> Hi! It seems that FSDP works with QLoRA now. While we are working on adding more documentation about this soon, for now please check the example script here.
I got the same issue even though I used the script run_peft_qlora_fsdp.sh. Which parameter do I need to change to make it work?
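The setting that typically matters here (an assumption based on the transformers/bitsandbytes QLoRA-with-FSDP recipe, not a tested answer for this exact script) is `bnb_4bit_quant_storage`: storing the packed 4-bit weights in a floating-point container lets FSDP flatten them. A hedged config sketch, assuming a recent transformers release that supports this parameter and the model path from the original command:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Store the packed 4-bit weights as bfloat16 so FSDP's flattening check
# sees a floating-point dtype instead of an integer one.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_quant_storage=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "../CodeLlama-7b-Instruct/hug",  # path from the original command
    quantization_config=bnb_config,
    torch_dtype=torch.bfloat16,
)
```

If the quant storage dtype stays at its integer default, FSDP raises the same "Cannot flatten integer dtype tensors" error even on the QLoRA path.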
Closing, as this was solved here. Please feel free to re-open if you have any questions or a different issue.
Thanks!