qlora
accelerate does not support multi-GPU int-8bit loading
ValueError: You can't train a model that has been loaded in 8-bit precision on a different device than the one you're training on.
ValueError: You can't train a model that has been loaded in 8-bit precision on multiple devices
same error here
same issue here.
same error here
same error
Solution to the problem: https://github.com/huggingface/accelerate/issues/1515#issuecomment-1577151399
With https://github.com/huggingface/accelerate/pull/1523 being merged, if you uninstall accelerate and reinstall it from source:
pip install git+https://github.com/huggingface/accelerate.git
it should be fixed
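For what it's worth, you can verify the source install took effect before retrying (the exact dev-version string below is an assumption; yours will differ):

import accelerate
# A source install usually reports a .dev version rather than a plain release
print(accelerate.__version__)  # e.g. '0.21.0.dev0'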
I installed the latest version of accelerate:
pip install git+https://github.com/huggingface/accelerate.git
and loaded the model:
device_map = 'auto'
device_map = {0: '30000MB', 1: '30000MB', 2: '30000MB', 3: '30000MB'}
model = BloomForCausalLM.from_pretrained(
args.model_name_or_path,
device_map=device_map,
max_memory=max_memory,
load_in_4bit=True,
torch_dtype=torch.float16,
quantization_config=BitsAndBytesConfig(
load_in_4bit=True,
bnb_4bit_compute_dtype=torch.float16,
bnb_4bit_use_double_quant=True,
bnb_4bit_quant_type="nf4",
llm_int8_threshold=6.0,
llm_int8_has_fp16_weight=False,
),
)
I run the code:
CUDA_VISIBLE_DEVICES=0,1,2,3 python train.py
but it still throws the error:
File "/opt/conda/lib/python3.8/site-packages/transformers/trainer.py", line 1645, in train
return inner_training_loop(
File "/opt/conda/lib/python3.8/site-packages/transformers/trainer.py", line 1756, in _inner_training_loop
model, self.optimizer = self.accelerator.prepare(self.model, self.optimizer)
File "/opt/conda/lib/python3.8/site-packages/accelerate/accelerator.py", line 1182, in prepare
result = tuple(
File "/opt/conda/lib/python3.8/site-packages/accelerate/accelerator.py", line 1183, in <genexpr>
self._prepare_one(obj, first_pass=True, device_placement=d) for obj, d in zip(args, device_placement)
File "/opt/conda/lib/python3.8/site-packages/accelerate/accelerator.py", line 1022, in _prepare_one
return self.prepare_model(obj, device_placement=device_placement)
File "/opt/conda/lib/python3.8/site-packages/accelerate/accelerator.py", line 1258, in prepare_model
raise ValueError(
ValueError: You can't train a model that has been loaded in 8-bit precision on a different device than the one you're training on. Make sure you loaded the model on the correct device using for example `device_map={'':torch.cuda.current_device()}` or `device_map={'':torch.xpu.current_device()}`
Looking forward to your answer, thanks @younesbelkada
I use 4 × V100 GPUs.
Hi @yangjianxin1, it seems there is a mistake in your script: the max_memory dict was assigned to device_map, and max_memory itself was never defined. Use instead:
device_map = 'auto'
max_memory = {0: '30GB', 1: '30GB', 2: '30GB', 3: '30GB'}
model = BloomForCausalLM.from_pretrained(
args.model_name_or_path,
device_map=device_map,
max_memory=max_memory,
load_in_4bit=True,
torch_dtype=torch.float16,
quantization_config=BitsAndBytesConfig(
load_in_4bit=True,
bnb_4bit_compute_dtype=torch.float16,
bnb_4bit_use_double_quant=True,
bnb_4bit_quant_type="nf4",
llm_int8_threshold=6.0,
llm_int8_has_fp16_weight=False,
),
)
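For completeness, the ValueError itself points at a second option for multi-GPU training: instead of sharding one copy of the model across GPUs with device_map='auto', pin each DDP process's full replica to its own device. A minimal sketch, assuming a launcher (torchrun or accelerate launch) that sets LOCAL_RANK, and with a placeholder model name:

import os
import torch
from transformers import BloomForCausalLM, BitsAndBytesConfig

local_rank = int(os.environ.get('LOCAL_RANK', 0))  # set by the DDP launcher

model = BloomForCausalLM.from_pretrained(
    'bigscience/bloom-7b1',  # placeholder; use args.model_name_or_path
    # Pin every module of this process's replica to its own GPU,
    # as the error message suggests, instead of sharding across devices
    device_map={'': local_rank},
    torch_dtype=torch.float16,
    quantization_config=BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_compute_dtype=torch.float16,
        bnb_4bit_use_double_quant=True,
        bnb_4bit_quant_type='nf4',
    ),
)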
Thanks for your reply. I tried that setting, but it still throws the same error.
@yangjianxin1 Thanks! What is the model you are trying to fit?
Also, can you print the result of model.hf_device_map after loading the model?
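For reference, after from_pretrained with a device_map, accelerate records the placement in that attribute; the output below is illustrative only, not an actual run:

print(model.hf_device_map)
# e.g. {'transformer.word_embeddings': 0, 'transformer.h.0': 0, ..., 'transformer.h.29': 3}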
Thanks for your help, I have solved the problem by setting ddp_find_unused_parameters=False, just like in this code: https://github.com/yangjianxin1/Firefly/blob/master/train_qlora.py#L104
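For anyone landing here later: with the HF Trainer, that flag goes through TrainingArguments, roughly like this (output_dir and batch size are placeholders):

from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir='output',            # placeholder
    per_device_train_batch_size=1,  # placeholder
    # QLoRA freezes the base model's weights, so DDP's search for
    # unused parameters must be disabled to avoid errors at prepare time
    ddp_find_unused_parameters=False,
)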
Updated accelerate from 0.21 to 0.23 and it got fixed!