aitextgen

Common Imports Fix and Readme Update to fix RuntimeError in trainer.fit()

Open scorixear opened this issue 2 years ago • 7 comments

Running the example code with the current package produces the following errors:

  • cannot import name 'DeepSpeedPlugin' from 'pytorch_lightning.plugins' - aitextgen.py line 14
  • cannot import name 'ProgressBarBase' from 'pytorch_lightning.callbacks.progress' - train.py line 13
  • cannot import name '_TPU_AVAILABLE' from 'pytorch_lightning.utilities' - train.py line 14 - fixed in #202
  • RuntimeError: An attempt has been made to start a new process before the current process has finished its bootstrapping phase. - aitextgen.py line 752

The RuntimeError suggests wrapping the user code in a main function, as hinted here: https://discuss.pytorch.org/t/runtimeerror-an-attempt-has-been-made-to-start-a-new-process-before-the-current-process-has-finished-its-bootstrapping-phase/145462
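A minimal sketch of that pattern, assuming a placeholder `train_model` function that stands in for the actual aitextgen training call (the name and its argument are hypothetical, not part of the library):

```python
def train_model(num_steps):
    """Hypothetical stand-in for the real training call (e.g. ai.train(...))."""
    # In the real script this is where worker processes would be started.
    return f"trained for {num_steps} steps"


def main():
    # Everything that may start new processes lives inside main().
    return train_model(num_steps=3000)


# The guard keeps main() from running again when a child process
# re-imports this file on platforms that spawn workers (e.g. Windows).
if __name__ == "__main__":
    print(main())
```

The guard matters on platforms where new processes are spawned rather than forked, because each child re-imports the script and would otherwise re-run the training call at import time.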

But I cannot confirm whether this fixes the issue, as the current code does not progress at all (this might also be because ProgressBar is not the correct replacement for ProgressBarBase).

Would love to have your input on whether these changes actually work!

scorixear avatar Mar 24 '23 10:03 scorixear

After around 1 hour of training the program finished correctly, although the progress bar seems to be broken. [image]

scorixear avatar Mar 24 '23 11:03 scorixear

Getting this error while executing the example: [image]

vjarora1978 avatar Mar 25 '23 16:03 vjarora1978

Yes, I get the same error. I will investigate what's up.

scorixear avatar Mar 25 '23 16:03 scorixear

It seems ProgressBarBase exposed the "loss" tensor in version 1.8.6, but it was removed from ProgressBar in version 2.0.0 (the latest release of PyTorch Lightning).

I replaced the metrics lookup with the loss value from the step outputs. This doesn't affect the training code at all; it only changes how the progress bar displays the current and average loss.
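A sketch of the idea in plain Python, assuming the training step output is either a mapping with a "loss" entry or a scalar-like value. The `LossTracker` name and the smoothing factor are illustrative and are not the code from the commit:

```python
def read_loss(outputs):
    """Return the loss as a float from a step output.

    Accepts a dict with a "loss" entry or a scalar-like value
    (anything float() can convert, such as a one-element tensor).
    """
    value = outputs["loss"] if isinstance(outputs, dict) else outputs
    return float(value)


class LossTracker:
    """Keeps the current loss and an exponentially smoothed average."""

    def __init__(self, smoothing=0.9):
        self.smoothing = smoothing  # weight given to the previous average
        self.average = None

    def update(self, outputs):
        current = read_loss(outputs)
        if self.average is None:
            # First step: the average starts at the first observed loss.
            self.average = current
        else:
            self.average = (
                self.smoothing * self.average
                + (1.0 - self.smoothing) * current
            )
        return current, self.average
```

A progress bar callback could call `update(outputs)` once per batch and display both returned values.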

scorixear avatar Mar 25 '23 16:03 scorixear

This is a really helpful pull request, thanks a lot! However, I still get an error about the kwarg "gpus" being unknown, raised from pytorch_lightning's utilities/argparse.py. "gpus" seems to be passed to the Trainer object in train.py. Could you help?

TypeError                                 Traceback (most recent call last)
<ipython-input-11-341925ca7a1c> in <cell line: 1>()
----> 1 ai.train(file_name,
      2          line_by_line=False,
      3          from_cache=False,
      4          num_steps=3000,
      5          generate_every=300,

1 frames
/usr/local/lib/python3.10/dist-packages/pytorch_lightning/utilities/argparse.py in insert_env_defaults(self, *args, **kwargs)
     67 
     68         # all args were already moved to kwargs
---> 69         return fn(self, **kwargs)
     70 
     71     return cast(_T, insert_env_defaults)

TypeError: Trainer.__init__() got an unexpected keyword argument 'gpus'

rs-rud avatar May 02 '23 14:05 rs-rud

@fictionFanKazuki

Hm, not sure how to reproduce this. I have changed the "gpus" argument to "num_nodes" in my latest commit. Maybe you haven't used the latest one?
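For reference, PyTorch Lightning 2.0 removed the `gpus` argument of `Trainer` in favor of `accelerator` and `devices`. A small sketch of translating old-style keyword arguments, assuming a hypothetical helper `migrate_trainer_kwargs` (not part of aitextgen or Lightning):

```python
def migrate_trainer_kwargs(kwargs):
    """Hypothetical helper: map the removed `gpus` kwarg to `accelerator`/`devices`."""
    migrated = dict(kwargs)  # leave the caller's dict untouched
    if "gpus" not in migrated:
        return migrated
    gpus = migrated.pop("gpus")
    if gpus:
        # e.g. 1, 2, -1 or [0, 1]: request GPU devices explicitly
        migrated.setdefault("accelerator", "gpu")
        migrated.setdefault("devices", gpus)
    else:
        # gpus=0 or gpus=None meant "no GPU" in the 1.x API
        migrated.setdefault("accelerator", "cpu")
    return migrated
```

The result could then be passed on as `Trainer(**migrated)`.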

Otherwise there is probably a newer version of pytorch_lightning with more breaking changes. But I would need to know which version you have installed and/or the full stack trace, as I cannot decipher where the utilities function was called from.
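One way to report the installed version is sketched below, using only the standard library. The helper name `installed_version` is illustrative, and "pytorch-lightning" is assumed to be the distribution name used at install time:

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(distribution):
    """Return the installed version string, or None if it is not installed."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Prints the version that would be useful to include in a bug report.
    print("pytorch-lightning:", installed_version("pytorch-lightning"))
```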

On my machine with my version of pytorch_lightning (2.0.0) it works. I will push a restricted requirements.txt shortly

scorixear avatar May 07 '23 21:05 scorixear

Thanks for this! I merged these fixes into my custom fork of AITextGen, and it allowed me to upgrade to PL v2.0.4 successfully!

Vectorrent avatar Jul 03 '23 02:07 Vectorrent