llama-cookbook

ValueError: Cannot flatten integer dtype tensors

Open humza-sami opened this issue 2 years ago • 8 comments

System Info

PyTorch 2.1.0+cu121, 4x A4000 GPUs

Information

  • [ ] The official example scripts
  • [ ] My own modified scripts

🐛 Describe the bug

I am trying to run the examples/finetuning.py script without any changes, but it gives me the following error.

Command:

torchrun --nnodes 1 --nproc_per_node 4 examples/finetuning.py --model_name ../CodeLlama-7b-Instruct/hug --use_peft --peft_method lora --use_fp16 --output_dir ../output --enable_fsdp

results:

Error logs

trainable params: 4,194,304 || all params: 6,742,740,992 || trainable%: 0.06220473254091146                                                                                                                        
Traceback (most recent call last):                                                                                                                                                                                 
 File "examples/finetuning.py", line 8, in <module>                                                                                                                                                               
   fire.Fire(main)                                                                                                                                                                                                
 File "/root/primisai/vast_ai_envoirment/lib/python3.8/site-packages/fire/core.py", line 141, in Fire                                                                                                             
   component_trace = _Fire(component, args, parsed_flag_args, context, name)                                                                                                                                      
 File "/root/primisai/vast_ai_envoirment/lib/python3.8/site-packages/fire/core.py", line 475, in _Fire                                                                                                            
   component, remaining_args = _CallAndUpdateTrace(                                                                                                                                                               
 File "/root/primisai/vast_ai_envoirment/lib/python3.8/site-packages/fire/core.py", line 691, in _CallAndUpdateTrace                                                                                              
   component = fn(*varargs, **kwargs)                                                                                                                                                                             
 File "/root/primisai/vast_ai_envoirment/lib/python3.8/site-packages/llama_recipes/finetuning.py", line 144, in main                                                                                              
   model = FSDP(                                                                                                                                                                                                  
 File "/root/primisai/vast_ai_envoirment/lib/python3.8/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 463, in __init__                                                                
   _auto_wrap(                                                                                                                                                                                                    
 File "/root/primisai/vast_ai_envoirment/lib/python3.8/site-packages/torch/distributed/fsdp/_wrap_utils.py", line 101, in _auto_wrap                                                                              
   _recursive_wrap(**recursive_wrap_kwargs, **root_kwargs)  # type: ignore[arg-type]                                                                                                                              
 File "/root/primisai/vast_ai_envoirment/lib/python3.8/site-packages/torch/distributed/fsdp/wrap.py", line 537, in _recursive_wrap                                                                                
   wrapped_child, num_wrapped_params = _recursive_wrap(                                                                                                                                                           
 File "/root/primisai/vast_ai_envoirment/lib/python3.8/site-packages/torch/distributed/fsdp/wrap.py", line 537, in _recursive_wrap                                                                                
   wrapped_child, num_wrapped_params = _recursive_wrap(                                                                                                                                                           
 File "/root/primisai/vast_ai_envoirment/lib/python3.8/site-packages/torch/distributed/fsdp/wrap.py", line 537, in _recursive_wrap                                                                                
   wrapped_child, num_wrapped_params = _recursive_wrap(                                                                                                                                                           
 [Previous line repeated 2 more times]                                                                                                                                                                            
 File "/root/primisai/vast_ai_envoirment/lib/python3.8/site-packages/torch/distributed/fsdp/wrap.py", line 555, in _recursive_wrap                                                                                
   return _wrap(module, wrapper_cls, **kwargs), nonwrapped_numel                                                                                                                                                  
 File "/root/primisai/vast_ai_envoirment/lib/python3.8/site-packages/torch/distributed/fsdp/wrap.py", line 484, in _wrap                                                                                          
   return wrapper_cls(module, **kwargs)                                                                                                                                                                           
 File "/root/primisai/vast_ai_envoirment/lib/python3.8/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 487, in __init__                                                                
   _init_param_handle_from_module(                                                                                                                                                                                
 File "/root/primisai/vast_ai_envoirment/lib/python3.8/site-packages/torch/distributed/fsdp/_init_utils.py", line 519, in _init_param_handle_from_module                                                          
   _init_param_handle_from_params(state, managed_params, fully_sharded_module)                                                                                                                                    
 File "/root/primisai/vast_ai_envoirment/lib/python3.8/site-packages/torch/distributed/fsdp/_init_utils.py", line 531, in _init_param_handle_from_params                                                          
   handle = FlatParamHandle(                                                                                                                                                                                      
 File "/root/primisai/vast_ai_envoirment/lib/python3.8/site-packages/torch/distributed/fsdp/flat_param.py", line 537, in __init__                                                                                 
   self._init_flat_param_and_metadata(                                                                                                                                                                            
 File "/root/primisai/vast_ai_envoirment/lib/python3.8/site-packages/torch/distributed/fsdp/flat_param.py", line 585, in _init_flat_param_and_metadata                                                            
   ) = self._validate_tensors_to_flatten(params)
 File "/root/primisai/vast_ai_envoirment/lib/python3.8/site-packages/torch/distributed/fsdp/flat_param.py", line 720, in _validate_tensors_to_flatten
   raise ValueError("Cannot flatten integer dtype tensors")
ValueError: Cannot flatten integer dtype tensors

Expected behavior

I expect model training to run without any issues, since I have not changed anything.

humza-sami avatar Oct 07 '23 23:10 humza-sami

@HamidShojanazeri Please check

humza-sami avatar Oct 07 '23 23:10 humza-sami

@Humza1996 will take a look; however, we haven't tested Code Llama fine-tuning in the recipes yet, so I'm not sure it would work out of the box.

HamidShojanazeri avatar Oct 08 '23 04:10 HamidShojanazeri

Facing same error

lihkinVerma avatar Oct 28 '23 01:10 lihkinVerma

This seems to be related to bitsandbytes; turning off load_in_4bit or load_in_8bit seems to make it work correctly.
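The error happens because FSDP tries to flatten the quantized (integer-dtype) weight tensors that bitsandbytes creates. A minimal sketch of what loading without quantization looks like with transformers (the model path is just the one from the command above; the recipe's own loading code may differ):

```python
# Minimal sketch, not the recipe's exact loading path: load the model without
# bitsandbytes quantization so all parameters stay in a floating-point dtype
# that FSDP can flatten.
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "../CodeLlama-7b-Instruct/hug",   # placeholder path taken from the command above
    torch_dtype=torch.bfloat16,       # keep weights in a float dtype
    # load_in_4bit=True,              # leave the quantization flags off so FSDP
    # load_in_8bit=True,              #   never sees integer-dtype tensors
)
```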

vTuanpham avatar Nov 03 '23 15:11 vTuanpham

Facing same error

Flemington8 avatar May 20 '24 09:05 Flemington8

Hi! It seems that FSDP works with QLoRA now. While we are working to add more documentation about this soon, for now, please check the example script here.
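For anyone hitting this before the documentation lands: the usual way to make QLoRA compatible with FSDP is to store the packed 4-bit weights in a float dtype. A minimal sketch, assuming the transformers BitsAndBytesConfig API (the linked example script may use different settings, and the model path is only a placeholder):

```python
# Hedged sketch of a QLoRA-with-FSDP loading setup; the exact arguments used by
# the example script may differ.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    # Store the packed 4-bit weights in a float dtype so FSDP can flatten them
    # instead of raising "Cannot flatten integer dtype tensors".
    bnb_4bit_quant_storage=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "../CodeLlama-7b-Instruct/hug",   # placeholder path
    quantization_config=bnb_config,
    torch_dtype=torch.bfloat16,
)
```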

wukaixingxp avatar Jun 03 '24 18:06 wukaixingxp

@Flemington7, Code Llama has not been tested, but for one thing, I wonder if you run into the same issue with --pure_bf16? BTW, just to note: if you are looking for code assistant/generation applications, Llama 3 by itself is very performant in that space, so you won't need Code Llama for this case. Infilling and code completion still require Code Llama.
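For reference, swapping --use_fp16 for --pure_bf16 in the original command would look something like this (untested with Code Llama, as noted above):

torchrun --nnodes 1 --nproc_per_node 4 examples/finetuning.py --model_name ../CodeLlama-7b-Instruct/hug --use_peft --peft_method lora --pure_bf16 --output_dir ../output --enable_fsdp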

HamidShojanazeri avatar Jun 03 '24 18:06 HamidShojanazeri

> Hi! It seems that FSDP works with QLoRA now. While we are working to add more documentation about this soon, for now, please check the example script here.

I got the same issue even when using the script run_peft_qlora_fsdp.sh. Which parameter do I need to change to make it work?

ghost avatar Jun 04 '24 08:06 ghost

Closing as this was solved here. Please feel free to re-open if you have any questions or a different issue.

Thanks!

init27 avatar Aug 19 '24 17:08 init27