
kaggle :: GPU P100 :: TypeError: LoraLayer_update_layer() got an unexpected keyword argument 'use_dora'

Open dsbyprateekg opened this issue 1 year ago • 23 comments

Hi,

I am trying to run Alpaca + Gemma 7b full example.ipynb in a Kaggle environment and am getting the following error (error and code screenshots attached in the original issue):

TypeError: LoraLayer_update_layer() got an unexpected keyword argument 'use_dora'

Installed library versions are: langchain-0.1.9, langchain-community-0.0.24, langchain-core-0.1.27, sentence-transformers-2.4.0. Please have a look at this issue.

dsbyprateekg avatar Feb 28 '24 11:02 dsbyprateekg

Just encountered the same error on Colab. Seems to be a new issue.

Jonaskouwenhoven avatar Feb 28 '24 12:02 Jonaskouwenhoven

Just downgrade HF PEFT to 0.8.2 until the Unsloth team fixes the new DoRA support from HF PEFT:

!pip install --force-reinstall --no-cache-dir peft==0.8.2
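
To confirm the downgrade took effect:

import peft
print(peft.__version__)  # should print 0.8.2 after the reinstall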

DeanChugall avatar Feb 28 '24 12:02 DeanChugall

Oh my, I will get this fixed ASAP.

danielhanchen avatar Feb 28 '24 12:02 danielhanchen

Yeah, it's because HuggingFace just merged their DoRA branch to main in the last few days. That new argument is probably slipping through.

RonanKMcGovern avatar Feb 28 '24 12:02 RonanKMcGovern

It would be great if we could pin PEFT internally in Unsloth to prevent breaking changes like this arriving from external packages.
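
For example, a known-good range could be pinned in the packaging metadata (an illustrative sketch with hypothetical package metadata, not Unsloth's actual setup):

# setup.py (hypothetical excerpt)
from setuptools import setup

setup(
    name="my-finetuning-package",  # hypothetical package name
    install_requires=[
        # Known-good range at the time of this thread; blocks the
        # 0.9.x releases that added the use_dora argument.
        "peft>=0.8.2,<0.9.0",
    ],
)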

DeanChugall avatar Feb 28 '24 12:02 DeanChugall

Thanks @RonanKMcGovern for sending me here.

Let's set up CI using PEFT and unsloth main to prevent this in the future. Do you want to set it up on your side or should we look into adding it to PEFT?

Regarding this specific error, if possible, add **kwargs to the method so that future additions won't lead to the same kind of error.
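
For illustration, a minimal sketch of that defensive pattern (simplified signature and hypothetical body, not PEFT's actual internals):

# Accepting **kwargs means a new upstream keyword such as `use_dora`
# no longer raises TypeError; unsupported features are rejected explicitly.
def update_layer(self, adapter_name, r, lora_alpha, lora_dropout, **kwargs):
    if kwargs.get("use_dora", False):
        raise NotImplementedError("DoRA is not supported by this patch yet")
    ...  # existing patched logic unchanged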

BenjaminBossan avatar Feb 28 '24 12:02 BenjaminBossan

@BenjaminBossan Should be fine in the future, hopefully - I rewrote the code to use inspect.getsource to patch it internally :) I used to have one custom function, but now it's dynamic patching.
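
Roughly the idea, as a simplified sketch (not the actual Unsloth patching code):

import inspect, textwrap
import peft.tuners.lora.layer as lora_layer

# Read the source of whatever update_layer the installed PEFT ships,
# so the patch tracks upstream signature changes automatically.
source = textwrap.dedent(inspect.getsource(lora_layer.LoraLayer.update_layer))

# ... rewrite the parts that need changing (illustrative edit only) ...
source = source.replace("def update_layer(", "def patched_update_layer(")

# Recompile in the module's namespace and swap the method in.
namespace = dict(vars(lora_layer))
exec(compile(source, "<patched>", "exec"), namespace)
lora_layer.LoraLayer.update_layer = namespace["patched_update_layer"]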

danielhanchen avatar Feb 28 '24 13:02 danielhanchen

Doing some tests on my end and will push it asap!! Sorry everyone for the issue and also thanks for notifying me!

danielhanchen avatar Feb 28 '24 13:02 danielhanchen

@DeanChugall @dsbyprateekg @Jonaskouwenhoven Again sorry - just fixed it!! On Kaggle / Colab you will need to reinstall Unsloth - no need to disconnect, just press Restart and run all.

For local machines:

pip install --upgrade --force-reinstall --no-cache-dir git+https://github.com/unslothai/unsloth.git

Again sorry and also thanks for notifying me!!

danielhanchen avatar Feb 28 '24 13:02 danielhanchen

@danielhanchen Thanks a lot for the quick response and the fix. It's working, but now I'm facing another error:

ValueError: Invalid pattern: '**' can only be an entire path component

Can you please check and help me resolve this as well?

dsbyprateekg avatar Feb 28 '24 13:02 dsbyprateekg

@dsbyprateekg That's a weird bug - do you have a more complete error trace, i.e. are you just using our notebook?

danielhanchen avatar Feb 28 '24 13:02 danielhanchen

@dsbyprateekg That's a weird bug - do you have a more complete error trace, i.e. are you just using our notebook?

My bad, I forgot to attach the logs. Please find attached the complete error logs: logs_kaggle.txt

dsbyprateekg avatar Feb 28 '24 13:02 dsbyprateekg

@dsbyprateekg Is your Kaggle instance connected to the internet?

danielhanchen avatar Feb 28 '24 13:02 danielhanchen

@dsbyprateekg Is your Kaggle instance connected to the internet?

Yes.

dsbyprateekg avatar Feb 28 '24 13:02 dsbyprateekg

Hmm weird bug indeed

danielhanchen avatar Feb 28 '24 13:02 danielhanchen

@dsbyprateekg Oh, try:

pip install --upgrade datasets

I might have to change the datasets version.

danielhanchen avatar Feb 28 '24 13:02 danielhanchen

@DeanChugall Thanks again! It solved my issue and I am able to proceed.

dsbyprateekg avatar Feb 28 '24 13:02 dsbyprateekg

@dsbyprateekg Oh the datasets issue is fine as well? Also I'll reopen this temporarily for people who might have the same issue!! I'll close this in a few days :)

danielhanchen avatar Feb 28 '24 13:02 danielhanchen

@danielhanchen Yes, the datasets issue was also resolved. But now I'm facing another error:

TypeError: '>' not supported between instances of 'NoneType' and 'int'

While running the training command:

from trl import SFTTrainer
from transformers import TrainingArguments

trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = dataset,
    dataset_text_field = "text",
    max_seq_length = max_seq_length,
    dataset_num_proc = 2,
    packing = False, # Can make training 5x faster for short sequences.
    args = TrainingArguments(
        per_device_train_batch_size = 2,
        gradient_accumulation_steps = 4,
        warmup_steps = 5,
        num_train_epochs = 1,
        max_steps = None,
        learning_rate = 2e-4,
        fp16 = not torch.cuda.is_bf16_supported(),
        bf16 = torch.cuda.is_bf16_supported(),
        logging_steps = 1,
        optim = "adamw_8bit",
        weight_decay = 0.01,
        lr_scheduler_type = "linear",
        seed = 3407,
        output_dir = "outputs",
    ),
)

Logs are attached. logs_kaggle.txt

dsbyprateekg avatar Feb 28 '24 14:02 dsbyprateekg

So the issue was resolved once I commented out the line max_steps = None.
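
Presumably transformers compares max_steps to an integer internally (e.g. args.max_steps > 0), and TrainingArguments defaults it to -1 ("not set"), so omitting the argument works while passing None does not:

from transformers import TrainingArguments

args = TrainingArguments(
    output_dir = "outputs",
    num_train_epochs = 1,
    # Omit max_steps entirely (defaults to -1, i.e. "train by epochs");
    # max_steps = None breaks internal checks like `args.max_steps > 0`.
)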

The next error is with the command trainer_stats = trainer.train(), and it's related to wandb login, although I have not used wandb anywhere in the code. It seems it is picked up internally.

UsageError                                Traceback (most recent call last)
Cell In[11], line 1
----> 1 trainer_stats = trainer.train()

File /opt/conda/lib/python3.10/site-packages/trl/trainer/sft_trainer.py:331, in SFTTrainer.train(self, *args, **kwargs)
    328 if self.neftune_noise_alpha is not None and not self._trainer_supports_neftune:
    329     self.model = self._trl_activate_neftune(self.model)
--> 331 output = super().train(*args, **kwargs)
    333 # After training we make sure to retrieve back the original forward pass method
    334 # for the embedding layer by removing the forward post hook.
    335 if self.neftune_noise_alpha is not None and not self._trainer_supports_neftune:

File /opt/conda/lib/python3.10/site-packages/transformers/trainer.py:1624, in Trainer.train(self, resume_from_checkpoint, trial, ignore_keys_for_eval, **kwargs)
   1622     hf_hub_utils.enable_progress_bars()
   1623 else:
-> 1624     return inner_training_loop(
   1625         args=args,
   1626         resume_from_checkpoint=resume_from_checkpoint,
   1627         trial=trial,
   1628         ignore_keys_for_eval=ignore_keys_for_eval,
   1629     )

File <string>:272, in _fast_inner_training_loop(self, batch_size, args, resume_from_checkpoint, trial, ignore_keys_for_eval)

File /opt/conda/lib/python3.10/site-packages/transformers/trainer_callback.py:370, in CallbackHandler.on_train_begin(self, args, state, control)
    368 def on_train_begin(self, args: TrainingArguments, state: TrainerState, control: TrainerControl):
    369     control.should_training_stop = False
--> 370     return self.call_event("on_train_begin", args, state, control)

File /opt/conda/lib/python3.10/site-packages/transformers/trainer_callback.py:414, in CallbackHandler.call_event(self, event, args, state, control, **kwargs)
    412 def call_event(self, event, args, state, control, **kwargs):
    413     for callback in self.callbacks:
--> 414         result = getattr(callback, event)(
    415             args,
    416             state,
    417             control,
    418             model=self.model,
    419             tokenizer=self.tokenizer,
    420             optimizer=self.optimizer,
    421             lr_scheduler=self.lr_scheduler,
    422             train_dataloader=self.train_dataloader,
    423             eval_dataloader=self.eval_dataloader,
    424             **kwargs,
    425         )
    426         # A Callback can skip the return of control if it doesn't change it.
    427         if result is not None:

File /opt/conda/lib/python3.10/site-packages/transformers/integrations/integration_utils.py:767, in WandbCallback.on_train_begin(self, args, state, control, model, **kwargs)
    765     args.run_name = None
    766 if not self._initialized:
--> 767     self.setup(args, state, model, **kwargs)

File /opt/conda/lib/python3.10/site-packages/transformers/integrations/integration_utils.py:740, in WandbCallback.setup(self, args, state, model, **kwargs)
    737     init_args["name"] = args.run_name
    739 if self._wandb.run is None:
--> 740     self._wandb.init(
    741         project=os.getenv("WANDB_PROJECT", "huggingface"),
    742         **init_args,
    743     )
    744 # add config parameters (run may have been created manually)
    745 self._wandb.config.update(combined_dict, allow_val_change=True)

File /opt/conda/lib/python3.10/site-packages/wandb/sdk/wandb_init.py:1195, in init(job_type, dir, config, project, entity, reinit, tags, group, name, notes, magic, config_exclude_keys, config_include_keys, anonymous, mode, allow_val_change, resume, force, tensorboard, sync_tensorboard, monitor_gym, save_code, id, settings)
   1193 if logger is not None:
   1194     logger.exception(str(e))
-> 1195 raise e
   1196 except KeyboardInterrupt as e:
   1197     assert logger

File /opt/conda/lib/python3.10/site-packages/wandb/sdk/wandb_init.py:1172, in init(job_type, dir, config, project, entity, reinit, tags, group, name, notes, magic, config_exclude_keys, config_include_keys, anonymous, mode, allow_val_change, resume, force, tensorboard, sync_tensorboard, monitor_gym, save_code, id, settings)
   1170 try:
   1171     wi = _WandbInit()
-> 1172     wi.setup(kwargs)
   1173     assert wi.settings
   1174     except_exit = wi.settings._except_exit

File /opt/conda/lib/python3.10/site-packages/wandb/sdk/wandb_init.py:306, in _WandbInit.setup(self, kwargs)
    303 settings.update(init_settings, source=Source.INIT)
    305 if not settings._offline and not settings._noop:
--> 306     wandb_login._login(
    307         anonymous=kwargs.pop("anonymous", None),
    308         force=kwargs.pop("force", None),
    309         _disable_warning=True,
    310         _silent=settings.quiet or settings.silent,
    311         _entity=kwargs.get("entity") or settings.entity,
    312     )
    314 # apply updated global state after login was handled
    315 wl = wandb.setup()

File /opt/conda/lib/python3.10/site-packages/wandb/sdk/wandb_login.py:317, in _login(anonymous, key, relogin, host, force, timeout, _backend, _silent, _disable_warning, _entity)
    314     return logged_in
    316 if not key:
--> 317     wlogin.prompt_api_key()
    319 # make sure login credentials get to the backend
    320 wlogin.propogate_login()

File /opt/conda/lib/python3.10/site-packages/wandb/sdk/wandb_login.py:247, in _WandbLogin.prompt_api_key(self)
    241 if status == ApiKeyStatus.NOTTY:
    242     directive = (
    243         "wandb login [your_api_key]"
    244         if self._settings._cli_only_mode
    245         else "wandb.login(key=[your_api_key])"
    246     )
--> 247     raise UsageError("api_key not configured (no-tty). call " + directive)
    249 self.update_session(key, status=status)
    250 self._key = key

UsageError: api_key not configured (no-tty). call wandb.login(key=[your_api_key])

dsbyprateekg avatar Feb 28 '24 14:02 dsbyprateekg

@dsbyprateekg On wandb:

import os
os.environ["WANDB_DISABLED"] = "true"

then for TrainingArgs:

  seed = 3407,
  output_dir = "outputs",
  report_to = "none",

danielhanchen avatar Feb 28 '24 14:02 danielhanchen

@danielhanchen I have added my wandb login, but now I am facing an nbclient.exceptions.DeadKernelError: Kernel died error while training with trainer_stats = trainer.train().

Please check the logs and see if you find something wrong here: logs_kaggle.txt

dsbyprateekg avatar Feb 29 '24 04:02 dsbyprateekg

@dsbyprateekg Oh, on the topic of Kaggle - would the Mistral notebook we have help? https://www.kaggle.com/code/danielhanchen/kaggle-mistral-7b-unsloth-notebook I tested that vigorously, so hopefully that one doesn't have any issues.

danielhanchen avatar Feb 29 '24 12:02 danielhanchen