kaggle :: GPU P100 :: TypeError: LoraLayer_update_layer() got an unexpected keyword argument 'use_dora'
Hi,
I am trying to run the Alpaca + Gemma 7b full example.ipynb notebook in a Kaggle environment and am getting the above error while running the code below.
Installed library versions are: langchain 0.1.9, langchain-community 0.0.24, langchain-core 0.1.27, sentence-transformers 2.4.0. Please have a look at this issue.
Just encountered the same error on Colab. Seems to be a new issue.
Just downgrade HF PEFT to 0.8.2 until the Unsloth team fixes the new DoRA support from HF PEFT:
!pip install --force-reinstall --no-cache-dir peft==0.8.2
Oh my I will get this fixed ASAP
Yeah, it's because Hugging Face just merged their DoRA branch to main in the last few days. That new argument is probably slipping through.
It would be great if we could integrate PEFT internally in Unsloth to prevent these kinds of breaking changes coming from external packages.
Thanks @RonanKMcGovern for sending me here.
Let's set up CI using PEFT and unsloth main to prevent this in the future. Do you want to set it up on your side or should we look into adding it to PEFT?
Regarding this specific error, if possible, add **kwargs to the method so that future additions won't lead to the same kind of error.
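For illustration, a minimal sketch of that suggestion (the function and parameter names here are hypothetical, not the real PEFT or Unsloth signatures): accepting `**kwargs` lets a patched method tolerate arguments introduced by newer PEFT releases, such as `use_dora`.

```python
# Hypothetical sketch of the **kwargs suggestion above, not actual PEFT/Unsloth code.
# Arguments added in later releases (e.g. use_dora) are absorbed instead of
# raising "got an unexpected keyword argument".

def patched_update_layer(adapter_name, r, lora_alpha, lora_dropout,
                         init_lora_weights, **kwargs):
    use_dora = kwargs.get("use_dora", False)  # tolerate the new DoRA flag
    return {"adapter": adapter_name, "r": r, "use_dora": use_dora}

# A newer caller passing use_dora no longer triggers a TypeError:
patched_update_layer("default", r=16, lora_alpha=16, lora_dropout=0.0,
                     init_lora_weights=True, use_dora=False)
```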
@BenjaminBossan Should be fine in the future hopefully - I rewrote the code to use inspect.getsource to patch it internally :) I used to have 1 custom function, but now it's dynamic patching.
Doing some tests on my end and will push it asap!! Sorry everyone for the issue and also thanks for notifying me!
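For readers curious what source-level patching looks like, here is a rough, self-contained sketch of the general technique (not Unsloth's actual implementation): pull a function's source with `inspect.getsource`, rewrite it as text, `exec` the result, and rebind it.

```python
# Rough illustration of dynamic source patching (not Unsloth's actual code).
import inspect
import re

def greet(name):
    return f"hello {name}"

# Grab the current source and edit it as text, e.g. make it accept extra kwargs.
source = inspect.getsource(greet)
source = re.sub(r"def greet\(name\)", "def greet(name, **kwargs)", source)

# Compile the edited source and swap the new function in
# (a real patch would assign it back onto the target module or class).
namespace = {}
exec(source, namespace)
greet = namespace["greet"]

print(greet("world", use_dora=False))  # the extra keyword is now ignored
```

The upside over shipping a hard-coded copy of the upstream function is that the patch follows whatever signature the installed PEFT version actually has.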
@DeanChugall @dsbyprateekg @Jonaskouwenhoven Again sorry - just fixed it!! On Kaggle / Colab, a reinstall of Unsloth will have to take place - no need to disconnect - just press restart and run all.
For local machines: pip install --upgrade --force-reinstall --no-cache-dir git+https://github.com/unslothai/unsloth.git
Again sorry and also thanks for notifying me!!
@danielhanchen Thanks a lot for the quick response and the fix.
It's working, but now I'm facing another error:
`ValueError: Invalid pattern: '**' can only be an entire path component`
Can you please check and help me resolve this as well?
@dsbyprateekg That's a weird bug - do u have a more complete error trace - ie are u just using our notebook?
It's my bad, I forgot to attach the logs. Please find attached the complete logs of the error- logs_kaggle.txt
@dsbyprateekg Is ur Kaggle instance connected to the internet?
Yes.
Hmm weird bug indeed
@dsbyprateekg Oh, try `pip install --upgrade datasets` - I might have to change the datasets version.
@DeanChugall Thanks again! It solved my issue and I am able to proceed.
@dsbyprateekg Oh the datasets issue is fine as well? Also I'll reopen this temporarily for people who might have the same issue!! I'll close this in a few days :)
@danielhanchen Yes, the datasets issue was also resolved. But now I'm facing another error:
`TypeError: '>' not supported between instances of 'NoneType' and 'int'`
While running the training command:

```python
from trl import SFTTrainer
from transformers import TrainingArguments

trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = dataset,
    dataset_text_field = "text",
    max_seq_length = max_seq_length,
    dataset_num_proc = 2,
    packing = False, # Can make training 5x faster for short sequences.
    args = TrainingArguments(
        per_device_train_batch_size = 2,
        gradient_accumulation_steps = 4,
        warmup_steps = 5,
        num_train_epochs = 1,
        max_steps = None,
        learning_rate = 2e-4,
        fp16 = not torch.cuda.is_bf16_supported(),
        bf16 = torch.cuda.is_bf16_supported(),
        logging_steps = 1,
        optim = "adamw_8bit",
        weight_decay = 0.01,
        lr_scheduler_type = "linear",
        seed = 3407,
        output_dir = "outputs",
    ),
)
```
Logs are attached. logs_kaggle.txt
So the issue is resolved once I commented out the line `max_steps = None`.
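That matches the fix: `TrainingArguments.max_steps` expects an int (its default is -1, meaning the step count is derived from `num_train_epochs`), so passing None breaks an internal comparison along the lines of `max_steps > 0`. A minimal sketch of the working arguments, trimmed to the relevant fields:

```python
from transformers import TrainingArguments

# Either omit max_steps entirely (default -1, so num_train_epochs is used)
# or pass an int; max_steps = None is what causes the NoneType-vs-int error.
args = TrainingArguments(
    per_device_train_batch_size = 2,
    gradient_accumulation_steps = 4,
    num_train_epochs = 1,
    learning_rate = 2e-4,
    output_dir = "outputs",
)
```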
The next error is with the command `trainer_stats = trainer.train()` and it's related to the wandb login,
although I have not used wandb anywhere in the code. It seems it is being picked up internally.
```
UsageError                                Traceback (most recent call last)
Cell In[11], line 1
----> 1 trainer_stats = trainer.train()

File /opt/conda/lib/python3.10/site-packages/trl/trainer/sft_trainer.py:331, in SFTTrainer.train(self, *args, **kwargs)
    328 if self.neftune_noise_alpha is not None and not self._trainer_supports_neftune:
    329     self.model = self._trl_activate_neftune(self.model)
--> 331 output = super().train(*args, **kwargs)
    333 # After training we make sure to retrieve back the original forward pass method
    334 # for the embedding layer by removing the forward post hook.
    335 if self.neftune_noise_alpha is not None and not self._trainer_supports_neftune:

File /opt/conda/lib/python3.10/site-packages/transformers/trainer.py:1624, in Trainer.train(self, resume_from_checkpoint, trial, ignore_keys_for_eval, **kwargs)
   1622     hf_hub_utils.enable_progress_bars()
   1623 else:
-> 1624     return inner_training_loop(
   1625         args=args,
   1626         resume_from_checkpoint=resume_from_checkpoint,
   1627         trial=trial,
   1628         ignore_keys_for_eval=ignore_keys_for_eval,
   1629     )

File

File /opt/conda/lib/python3.10/site-packages/transformers/trainer_callback.py:370, in CallbackHandler.on_train_begin(self, args, state, control)
    368 def on_train_begin(self, args: TrainingArguments, state: TrainerState, control: TrainerControl):
    369     control.should_training_stop = False
--> 370     return self.call_event("on_train_begin", args, state, control)

File /opt/conda/lib/python3.10/site-packages/transformers/trainer_callback.py:414, in CallbackHandler.call_event(self, event, args, state, control, **kwargs)
    412 def call_event(self, event, args, state, control, **kwargs):
    413     for callback in self.callbacks:
--> 414         result = getattr(callback, event)(
    415             args,
    416             state,
    417             control,
    418             model=self.model,
    419             tokenizer=self.tokenizer,
    420             optimizer=self.optimizer,
    421             lr_scheduler=self.lr_scheduler,
    422             train_dataloader=self.train_dataloader,
    423             eval_dataloader=self.eval_dataloader,
    424             **kwargs,
    425         )
    426     # A Callback can skip the return of control if it doesn't change it.
    427     if result is not None:

File /opt/conda/lib/python3.10/site-packages/transformers/integrations/integration_utils.py:767, in WandbCallback.on_train_begin(self, args, state, control, model, **kwargs)
    765     args.run_name = None
    766 if not self._initialized:
--> 767     self.setup(args, state, model, **kwargs)

File /opt/conda/lib/python3.10/site-packages/transformers/integrations/integration_utils.py:740, in WandbCallback.setup(self, args, state, model, **kwargs)
    737     init_args["name"] = args.run_name
    739 if self._wandb.run is None:
--> 740     self._wandb.init(
    741         project=os.getenv("WANDB_PROJECT", "huggingface"),
    742         **init_args,
    743     )
    744 # add config parameters (run may have been created manually)
    745 self._wandb.config.update(combined_dict, allow_val_change=True)

File /opt/conda/lib/python3.10/site-packages/wandb/sdk/wandb_init.py:1195, in init(job_type, dir, config, project, entity, reinit, tags, group, name, notes, magic, config_exclude_keys, config_include_keys, anonymous, mode, allow_val_change, resume, force, tensorboard, sync_tensorboard, monitor_gym, save_code, id, settings)
   1193 if logger is not None:
   1194     logger.exception(str(e))
-> 1195 raise e
   1196 except KeyboardInterrupt as e:
   1197     assert logger

File /opt/conda/lib/python3.10/site-packages/wandb/sdk/wandb_init.py:1172, in init(job_type, dir, config, project, entity, reinit, tags, group, name, notes, magic, config_exclude_keys, config_include_keys, anonymous, mode, allow_val_change, resume, force, tensorboard, sync_tensorboard, monitor_gym, save_code, id, settings)
   1170 try:
   1171     wi = _WandbInit()
-> 1172     wi.setup(kwargs)
   1173     assert wi.settings
   1174     except_exit = wi.settings._except_exit

File /opt/conda/lib/python3.10/site-packages/wandb/sdk/wandb_init.py:306, in _WandbInit.setup(self, kwargs)
    303 settings.update(init_settings, source=Source.INIT)
    305 if not settings._offline and not settings._noop:
--> 306     wandb_login._login(
    307         anonymous=kwargs.pop("anonymous", None),
    308         force=kwargs.pop("force", None),
    309         _disable_warning=True,
    310         _silent=settings.quiet or settings.silent,
    311         _entity=kwargs.get("entity") or settings.entity,
    312     )
    314 # apply updated global state after login was handled
    315 wl = wandb.setup()

File /opt/conda/lib/python3.10/site-packages/wandb/sdk/wandb_login.py:317, in _login(anonymous, key, relogin, host, force, timeout, _backend, _silent, _disable_warning, _entity)
    314     return logged_in
    316 if not key:
--> 317     wlogin.prompt_api_key()
    319 # make sure login credentials get to the backend
    320 wlogin.propogate_login()

File /opt/conda/lib/python3.10/site-packages/wandb/sdk/wandb_login.py:247, in _WandbLogin.prompt_api_key(self)
    241 if status == ApiKeyStatus.NOTTY:
    242     directive = (
    243         "wandb login [your_api_key]"
    244         if self._settings._cli_only_mode
    245         else "wandb.login(key=[your_api_key])"
    246     )
--> 247 raise UsageError("api_key not configured (no-tty). call " + directive)
    249 self.update_session(key, status=status)
    250 self._key = key

UsageError: api_key not configured (no-tty). call wandb.login(key=[your_api_key])
```
@dsbyprateekg On wandb:

```python
import os
os.environ["WANDB_DISABLED"] = "true"
```

then for TrainingArgs:

```python
seed = 3407,
output_dir = "outputs",
report_to = "none",
```
@danielhanchen I have added my wandb login, but now I am facing a `nbclient.exceptions.DeadKernelError: Kernel died` error while running the training with the command `trainer_stats = trainer.train()`.
Please check logs and see if you find something wrong here. logs_kaggle.txt
@dsbyprateekg Oh, on the topic of Kaggle - would the Mistral notebook we have help? https://www.kaggle.com/code/danielhanchen/kaggle-mistral-7b-unsloth-notebook I tested that one vigorously, so hopefully it doesn't have any issues.