
Training Not Initiating (Windows 10)

gldstrrbt opened this issue 5 years ago • 11 comments

OS: Windows 10
GPU: GTX 1060

Everything appears to run fine up until .train() is called, then everything comes to a halt.

[00:00:00] Reading files    ████████████████████████ 100
[00:00:01] Tokenize words   ████████████████████████ 15057 / 15057
[00:00:00] Count pairs      ████████████████████████ 15057 / 15057
[00:00:00] Compute merges   ████████████████████████ 4743 / 4743

INFO:aitextgen.tokenizers:Saving aitextgen-vocab.json and aitextgen-merges.txt to the current directory. You will need both files to build the GPT2Tokenizer.
INFO:aitextgen:Constructing GPT-2 model from provided config.
INFO:aitextgen:Using a custom tokenizer.
GPU available: True, used: True
INFO:lightning:GPU available: True, used: True
No environment variable for node rank defined. Set as 0.
WARNING:lightning:No environment variable for node rank defined. Set as 0.
CUDA_VISIBLE_DEVICES: [0]
INFO:lightning:CUDA_VISIBLE_DEVICES: [0]
  0%| | 0/5000 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "shootme.py", line 16, in <module>
    ai.train(data, batch_size=16, num_steps=5000)
  File "Z:\0__0\0_seo\aitextgen\aitextgen\aitextgen.py", line 563, in train
    trainer.fit(train_model)
  File "C:\Users\_\Anaconda3\envs\aitext\lib\site-packages\pytorch_lightning-0.7.6-py3.7.egg\pytorch_lightning\trainer\trainer.py", line 859, in fit
    self.single_gpu_train(model)
  File "C:\Users\_\Anaconda3\envs\aitext\lib\site-packages\pytorch_lightning-0.7.6-py3.7.egg\pytorch_lightning\trainer\distrib_parts.py", line 503, in single_gpu_train
    self.run_pretrain_routine(model)
  File "C:\Users\_\Anaconda3\envs\aitext\lib\site-packages\pytorch_lightning-0.7.6-py3.7.egg\pytorch_lightning\trainer\trainer.py", line 1015, in run_pretrain_routine
    self.train()
  File "C:\Users\_\Anaconda3\envs\aitext\lib\site-packages\pytorch_lightning-0.7.6-py3.7.egg\pytorch_lightning\trainer\training_loop.py", line 347, in train
    self.run_training_epoch()
  File "C:\Users\_\Anaconda3\envs\aitext\lib\site-packages\pytorch_lightning-0.7.6-py3.7.egg\pytorch_lightning\trainer\training_loop.py", line 406, in run_training_epoch
    enumerate(_with_is_last(train_dataloader)), "get_train_batch"
  File "C:\Users\_\Anaconda3\envs\aitext\lib\site-packages\pytorch_lightning-0.7.6-py3.7.egg\pytorch_lightning\profiler\profilers.py", line 64, in profile_iterable
    value = next(iterator)
  File "C:\Users\_\Anaconda3\envs\aitext\lib\site-packages\pytorch_lightning-0.7.6-py3.7.egg\pytorch_lightning\trainer\training_loop.py", line 800, in _with_is_last
    it = iter(iterable)
  File "C:\Users\_\Anaconda3\envs\aitext\lib\site-packages\torch\utils\data\dataloader.py", line 279, in __iter__
    return _MultiProcessingDataLoaderIter(self)
  File "C:\Users\_\Anaconda3\envs\aitext\lib\site-packages\torch\utils\data\dataloader.py", line 719, in __init__
    w.start()
  File "C:\Users\_\Anaconda3\envs\aitext\lib\multiprocessing\process.py", line 112, in start
    self._popen = self._Popen(self)
  File "C:\Users\_\Anaconda3\envs\aitext\lib\multiprocessing\context.py", line 223, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "C:\Users\_\Anaconda3\envs\aitext\lib\multiprocessing\context.py", line 322, in _Popen
    return Popen(process_obj)
  File "C:\Users\_\Anaconda3\envs\aitext\lib\multiprocessing\popen_spawn_win32.py", line 89, in __init__
    reduction.dump(process_obj, to_child)
  File "C:\Users\_\Anaconda3\envs\aitext\lib\multiprocessing\reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
BrokenPipeError: [Errno 32] Broken pipe

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "C:\Users\_\Anaconda3\envs\aitext\lib\multiprocessing\spawn.py", line 105, in spawn_main
    exitcode = _main(fd)
  File "C:\Users\_\Anaconda3\envs\aitext\lib\multiprocessing\spawn.py", line 114, in _main
    prepare(preparation_data)
  File "C:\Users\_\Anaconda3\envs\aitext\lib\multiprocessing\spawn.py", line 225, in prepare
    _fixup_main_from_path(data['init_main_from_path'])
  File "C:\Users\_\Anaconda3\envs\aitext\lib\multiprocessing\spawn.py", line 277, in _fixup_main_from_path
    run_name="__mp_main__")
  File "C:\Users\_\Anaconda3\envs\aitext\lib\runpy.py", line 263, in run_path
    pkg_name=pkg_name, script_name=fname)
  File "C:\Users\_\Anaconda3\envs\aitext\lib\runpy.py", line 96, in _run_module_code
    mod_name, mod_spec, pkg_name, script_name)
  File "C:\Users\_\Anaconda3\envs\aitext\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "Z:\0__0\0_seo\aitextgen\shootme.py", line 16, in <module>
    ai.train(data, batch_size=16, num_steps=5000)
  File "Z:\0__0\0_seo\aitextgen\aitextgen\aitextgen.py", line 563, in train
    trainer.fit(train_model)
  File "C:\Users\_\Anaconda3\envs\aitext\lib\site-packages\pytorch_lightning-0.7.6-py3.7.egg\pytorch_lightning\trainer\trainer.py", line 859, in fit
    self.single_gpu_train(model)
  File "C:\Users\_\Anaconda3\envs\aitext\lib\site-packages\pytorch_lightning-0.7.6-py3.7.egg\pytorch_lightning\trainer\distrib_parts.py", line 503, in single_gpu_train
    self.run_pretrain_routine(model)
  File "C:\Users\_\Anaconda3\envs\aitext\lib\site-packages\pytorch_lightning-0.7.6-py3.7.egg\pytorch_lightning\trainer\trainer.py", line 1015, in run_pretrain_routine
    self.train()
  File "C:\Users\_\Anaconda3\envs\aitext\lib\site-packages\pytorch_lightning-0.7.6-py3.7.egg\pytorch_lightning\trainer\training_loop.py", line 347, in train
    self.run_training_epoch()
  File "C:\Users\_\Anaconda3\envs\aitext\lib\site-packages\pytorch_lightning-0.7.6-py3.7.egg\pytorch_lightning\trainer\training_loop.py", line 406, in run_training_epoch
    enumerate(_with_is_last(train_dataloader)), "get_train_batch"
  File "C:\Users\_\Anaconda3\envs\aitext\lib\site-packages\pytorch_lightning-0.7.6-py3.7.egg\pytorch_lightning\profiler\profilers.py", line 64, in profile_iterable
    value = next(iterator)
  File "C:\Users\_\Anaconda3\envs\aitext\lib\site-packages\pytorch_lightning-0.7.6-py3.7.egg\pytorch_lightning\trainer\training_loop.py", line 800, in _with_is_last
    it = iter(iterable)
  File "C:\Users\_\Anaconda3\envs\aitext\lib\site-packages\torch\utils\data\dataloader.py", line 279, in __iter__
    return _MultiProcessingDataLoaderIter(self)
  File "C:\Users\_\Anaconda3\envs\aitext\lib\site-packages\torch\utils\data\dataloader.py", line 719, in __init__
    w.start()
  File "C:\Users\_\Anaconda3\envs\aitext\lib\multiprocessing\process.py", line 112, in start
    self._popen = self._Popen(self)
  File "C:\Users\_\Anaconda3\envs\aitext\lib\multiprocessing\context.py", line 223, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "C:\Users\_\Anaconda3\envs\aitext\lib\multiprocessing\context.py", line 322, in _Popen
    return Popen(process_obj)
  File "C:\Users\_\Anaconda3\envs\aitext\lib\multiprocessing\popen_spawn_win32.py", line 46, in __init__
    prep_data = spawn.get_preparation_data(process_obj._name)
  File "C:\Users\_\Anaconda3\envs\aitext\lib\multiprocessing\spawn.py", line 143, in get_preparation_data
    _check_not_importing_main()
  File "C:\Users\_\Anaconda3\envs\aitext\lib\multiprocessing\spawn.py", line 136, in _check_not_importing_main
    is not going to be frozen to produce an executable.''')
RuntimeError: An attempt has been made to start a new process before the current process has finished its bootstrapping phase.

    This probably means that you are not using fork to start your
    child processes and you have forgotten to use the proper idiom
    in the main module:

        if __name__ == '__main__':
            freeze_support()
            ...

    The "freeze_support()" line can be omitted if the program
    is not going to be frozen to produce an executable.

0%| | 0/5000 [00:06<?, ?it/s]
0%| | 0/5000 [00:00<?, ?it/s]

gldstrrbt avatar May 19 '20 00:05 gldstrrbt

My apologies. Forgot to include the script.

from aitextgen import aitextgen
from aitextgen.tokenizers import train_tokenizer
from aitextgen.TokenDataset import TokenDataset
from aitextgen.utils import build_gpt2_config

file_name 	= "input.txt"
vocab_file 	= "aitextgen-vocab.json"
merges_file = "aitextgen-merges.txt"

train_tokenizer(file_name)

data 	= TokenDataset(file_name, vocab_file=vocab_file, merges_file=merges_file, block_size=32)
config 	= build_gpt2_config(vocab_size=5000, max_length=32, dropout=0.0, n_embd=256, n_layer=8, n_head=8)
ai 		= aitextgen(vocab_file=vocab_file, merges_file=merges_file, config=config)

ai.train(data, batch_size=16, num_steps=5000)

gldstrrbt avatar May 19 '20 00:05 gldstrrbt

Wonder if Conda is complicating things.

Can you pass num_workers=1 to train()?
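
In the script above that would presumably look like this (assuming train() just forwards num_workers to the underlying DataLoader):

ai.train(data, batch_size=16, num_steps=5000, num_workers=1)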

minimaxir avatar May 19 '20 00:05 minimaxir

Thanks for the quick response. I just tried setting num_workers=1 on train() and got the same result.

gldstrrbt avatar May 19 '20 01:05 gldstrrbt

The other possibility is that it's a Windows thing (another one of my repos had an issue with subprocesses on Windows that I never found a fix for).

At the least, it might not be an issue with aitextgen specifically; not sure if there's an easy solution. (I need to get a Windows machine to test on at some point.)

minimaxir avatar May 19 '20 01:05 minimaxir

Ah okay. I'll continue to troubleshoot and will update here if I come across a solution. Otherwise, I'll give it a go on my Ubuntu machine when I return to it next week.

gldstrrbt avatar May 19 '20 01:05 gldstrrbt

Definitely having a similar issue with the example code on Windows. Looking around Stack Overflow, I was able to get past the required torch version with the following, but still got the above error with the trivial examples:

pip3 install torch===1.4.0 torchvision===0.5.0 -f https://download.pytorch.org/whl/torch_stable.html

PS C:\Users\user\code\gpt2\chat-gen> & C:/Users/user/AppData/Local/Programs/Python/Python37/python.exe c:/Users/user/code/gpt2/chat-gen/generate.py
[00:00:01] Reading files    ████████████████████████ 100
[00:00:02] Tokenize words   ████████████████████████ 32247 / 32247
[00:00:00] Count pairs      ████████████████████████ 32247 / 32247
[00:00:00] Compute merges   ████████████████████████ 4743 / 4743

INFO:aitextgen.tokenizers:Saving aitextgen-vocab.json and aitextgen-merges.txt to the current directory. You will need both files to build the GPT2Tokenizer.
INFO:aitextgen:Constructing GPT-2 model from provided config.
INFO:aitextgen:Using a custom tokenizer.
GPU available: True, used: True
INFO:lightning:GPU available: True, used: True
No environment variable for node rank defined. Set as 0.
WARNING:lightning:No environment variable for node rank defined. Set as 0.
CUDA_VISIBLE_DEVICES: [0]
INFO:lightning:CUDA_VISIBLE_DEVICES: [0]
  0%| | 0/5000 [00:00<?, ?it/s]
[00:00:01] Reading files    ████████████████████████ 100
[00:00:02] Tokenize words   ████████████████████████ 32247 / 32247
[00:00:00] Count pairs      ████████████████████████ 32247 / 32247
[00:00:00] Compute merges   ████████████████████████ 4743 / 4743

INFO:aitextgen.tokenizers:Saving aitextgen-vocab.json and aitextgen-merges.txt to the current directory. You will need both files to build the GPT2Tokenizer.
INFO:aitextgen:Constructing GPT-2 model from provided config.
INFO:aitextgen:Using a custom tokenizer.
GPU available: True, used: True
INFO:lightning:GPU available: True, used: True
No environment variable for node rank defined. Set as 0.
WARNING:lightning:No environment variable for node rank defined. Set as 0.
CUDA_VISIBLE_DEVICES: [0]
INFO:lightning:CUDA_VISIBLE_DEVICES: [0]
  0%| | 0/5000 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "c:/Users/user/code/gpt2/chat-gen/generate.py", line 31, in <module>
    ai.train(data, batch_size=16, num_steps=5000)
  File "C:\Users\user\AppData\Local\Programs\Python\Python37\lib\site-packages\aitextgen\aitextgen.py", line 563, in train
    trainer.fit(train_model)
  File "C:\Users\user\AppData\Local\Programs\Python\Python37\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 859, in fit
    self.single_gpu_train(model)
  File "C:\Users\user\AppData\Local\Programs\Python\Python37\lib\site-packages\pytorch_lightning\trainer\distrib_parts.py", line 503, in single_gpu_train
    self.run_pretrain_routine(model)
  File "C:\Users\user\AppData\Local\Programs\Python\Python37\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 1015, in run_pretrain_routine
    self.train()
  File "C:\Users\user\AppData\Local\Programs\Python\Python37\lib\site-packages\pytorch_lightning\trainer\training_loop.py", line 347, in train
    self.run_training_epoch()
  File "C:\Users\user\AppData\Local\Programs\Python\Python37\lib\site-packages\pytorch_lightning\trainer\training_loop.py", line 406, in run_training_epoch
    enumerate(_with_is_last(train_dataloader)), "get_train_batch"
  File "C:\Users\user\AppData\Local\Programs\Python\Python37\lib\site-packages\pytorch_lightning\profiler\profilers.py", line 64, in profile_iterable
    value = next(iterator)
  File "C:\Users\user\AppData\Local\Programs\Python\Python37\lib\site-packages\pytorch_lightning\trainer\training_loop.py", line 800, in _with_is_last
    it = iter(iterable)
  File "C:\Users\user\AppData\Local\Programs\Python\Python37\lib\site-packages\torch\utils\data\dataloader.py", line 279, in __iter__
    return _MultiProcessingDataLoaderIter(self)
  File "C:\Users\user\AppData\Local\Programs\Python\Python37\lib\site-packages\torch\utils\data\dataloader.py", line 719, in __init__
    w.start()
  File "C:\Users\user\AppData\Local\Programs\Python\Python37\lib\multiprocessing\process.py", line 112, in start
    self._popen = self._Popen(self)
  File "C:\Users\user\AppData\Local\Programs\Python\Python37\lib\multiprocessing\context.py", line 223, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "C:\Users\user\AppData\Local\Programs\Python\Python37\lib\multiprocessing\context.py", line 322, in _Popen
    return Popen(process_obj)
  File "C:\Users\user\AppData\Local\Programs\Python\Python37\lib\multiprocessing\popen_spawn_win32.py", line 89, in __init__
    reduction.dump(process_obj, to_child)
  File "C:\Users\user\AppData\Local\Programs\Python\Python37\lib\multiprocessing\reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
BrokenPipeError: [Errno 32] Broken pipe

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "C:\Users\user\AppData\Local\Programs\Python\Python37\lib\multiprocessing\spawn.py", line 105, in spawn_main
    exitcode = _main(fd)
  File "C:\Users\user\AppData\Local\Programs\Python\Python37\lib\multiprocessing\spawn.py", line 114, in _main
    prepare(preparation_data)
  File "C:\Users\user\AppData\Local\Programs\Python\Python37\lib\multiprocessing\spawn.py", line 225, in prepare
    _fixup_main_from_path(data['init_main_from_path'])
  File "C:\Users\user\AppData\Local\Programs\Python\Python37\lib\multiprocessing\spawn.py", line 277, in _fixup_main_from_path
    run_name="__mp_main__")
  File "C:\Users\user\AppData\Local\Programs\Python\Python37\lib\runpy.py", line 263, in run_path
    pkg_name=pkg_name, script_name=fname)
  File "C:\Users\user\AppData\Local\Programs\Python\Python37\lib\runpy.py", line 96, in _run_module_code
    mod_name, mod_spec, pkg_name, script_name)
  File "C:\Users\user\AppData\Local\Programs\Python\Python37\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "c:\Users\user\code\gpt2\chat-gen\generate.py", line 31, in <module>
    ai.train(data, batch_size=16, num_steps=5000)
  File "C:\Users\user\AppData\Local\Programs\Python\Python37\lib\site-packages\aitextgen\aitextgen.py", line 563, in train
    trainer.fit(train_model)
  File "C:\Users\user\AppData\Local\Programs\Python\Python37\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 859, in fit
    self.single_gpu_train(model)
  File "C:\Users\user\AppData\Local\Programs\Python\Python37\lib\site-packages\pytorch_lightning\trainer\distrib_parts.py", line 503, in single_gpu_train
    self.run_pretrain_routine(model)
  File "C:\Users\user\AppData\Local\Programs\Python\Python37\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 1015, in run_pretrain_routine
    self.train()
  File "C:\Users\user\AppData\Local\Programs\Python\Python37\lib\site-packages\pytorch_lightning\trainer\training_loop.py", line 347, in train
    self.run_training_epoch()
  File "C:\Users\user\AppData\Local\Programs\Python\Python37\lib\site-packages\pytorch_lightning\trainer\training_loop.py", line 406, in run_training_epoch
    enumerate(_with_is_last(train_dataloader)), "get_train_batch"
  File "C:\Users\user\AppData\Local\Programs\Python\Python37\lib\site-packages\pytorch_lightning\profiler\profilers.py", line 64, in profile_iterable
    value = next(iterator)
  File "C:\Users\user\AppData\Local\Programs\Python\Python37\lib\site-packages\pytorch_lightning\trainer\training_loop.py", line 800, in _with_is_last
    it = iter(iterable)
  File "C:\Users\user\AppData\Local\Programs\Python\Python37\lib\site-packages\torch\utils\data\dataloader.py", line 279, in __iter__
    return _MultiProcessingDataLoaderIter(self)
  File "C:\Users\user\AppData\Local\Programs\Python\Python37\lib\site-packages\torch\utils\data\dataloader.py", line 719, in __init__
    w.start()
  File "C:\Users\user\AppData\Local\Programs\Python\Python37\lib\multiprocessing\process.py", line 112, in start
    self._popen = self._Popen(self)
  File "C:\Users\user\AppData\Local\Programs\Python\Python37\lib\multiprocessing\context.py", line 223, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "C:\Users\user\AppData\Local\Programs\Python\Python37\lib\multiprocessing\context.py", line 322, in _Popen
    return Popen(process_obj)
  File "C:\Users\user\AppData\Local\Programs\Python\Python37\lib\multiprocessing\popen_spawn_win32.py", line 46, in __init__
    prep_data = spawn.get_preparation_data(process_obj._name)
  File "C:\Users\user\AppData\Local\Programs\Python\Python37\lib\multiprocessing\spawn.py", line 143, in get_preparation_data
    _check_not_importing_main()
  File "C:\Users\user\AppData\Local\Programs\Python\Python37\lib\multiprocessing\spawn.py", line 136, in _check_not_importing_main
    is not going to be frozen to produce an executable.''')
RuntimeError: An attempt has been made to start a new process before the current process has finished its bootstrapping phase.

    This probably means that you are not using fork to start your
    child processes and you have forgotten to use the proper idiom
    in the main module:

        if __name__ == '__main__':
            freeze_support()
            ...

    The "freeze_support()" line can be omitted if the program
    is not going to be frozen to produce an executable.

0%| | 0/5000 [00:08<?, ?it/s]
0%| | 0/5000 [00:00<?, ?it/s]

ConstantineK avatar May 19 '20 05:05 ConstantineK

@minimaxir @gldstrrbt so I tried being a total idiot and read the error message, which told me how to fix it. I added if __name__ == '__main__': at the top of my code, indented the rest, and it's training right now.
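
Applied to the script posted above, that change would look roughly like this (a sketch, not verified on Windows; the same calls, just wrapped in the main-module guard so spawn-based multiprocessing can re-import the file without re-running the training code):

from aitextgen import aitextgen
from aitextgen.tokenizers import train_tokenizer
from aitextgen.TokenDataset import TokenDataset
from aitextgen.utils import build_gpt2_config

if __name__ == '__main__':
    # Everything below only runs in the parent process; spawned worker
    # processes re-import this file but skip this block.
    file_name   = "input.txt"
    vocab_file  = "aitextgen-vocab.json"
    merges_file = "aitextgen-merges.txt"

    train_tokenizer(file_name)

    data   = TokenDataset(file_name, vocab_file=vocab_file, merges_file=merges_file, block_size=32)
    config = build_gpt2_config(vocab_size=5000, max_length=32, dropout=0.0, n_embd=256, n_layer=8, n_head=8)
    ai     = aitextgen(vocab_file=vocab_file, merges_file=merges_file, config=config)

    ai.train(data, batch_size=16, num_steps=5000)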

ConstantineK avatar May 19 '20 05:05 ConstantineK

@minimaxir @ConstantineK glad to hear you found a solution for your Windows setup.

I tried your method of adding if __name__ == '__main__': at the top and indenting the lines below. It appears to have affected the output, but unfortunately I'm still getting errors. Not sure what's causing them though.

Below is a list of troubleshooting steps I tried, but to no avail.

######################################

  • tried creating new envs

  • tried w/ config = GPT2ConfigCPU()

  • tried w/ to_gpu=True set on aitextgen() - ( w/o GPT2ConfigCPU() )

  • relative/absolute paths

    • C:/Users/username/Desktop/test/
    • C:\Users\username\Desktop\test\
    • /Users/username/Desktop/test/
    • \Users\username\Desktop\test\
    • ./
    • filename ( w/o slashed paths )
  • tried envs using both python=3.7.3 and python=3.8.3

  • made working directories on both a secondary drive and local C:

  • num_workers=1 set on train()

  • thinking it might be a memory issue, I closed all programs except Sublime Text and the Anaconda prompt to cut down memory usage. Got it down to ~25%. While running the script it increased to around 60%, then threw the error. The machine has 16 GB of RAM.

  • after trying @ConstantineK's solution, the following error began to display after each run:

INFO:aitextgen:Constructing GPT-2 model from provided config.
INFO:aitextgen:Using a custom tokenizer.
WARNING:aitextgen:pytorch_model.bin already exists in /trained_model and will be overwritten!
GPU available: True, used: True
INFO:lightning:GPU available: True, used: True
No environment variable for node rank defined. Set as 0.
WARNING:lightning:No environment variable for node rank defined. Set as 0.
CUDA_VISIBLE_DEVICES: [0]
INFO:lightning:CUDA_VISIBLE_DEVICES: [0]
  0%|                                                                                          | 0/500 [00:00<?, ?it/s]Traceback (most recent call last):
  File "demo.py", line 30, in <module>
    ai.train(data, batch_size=16, num_steps=500)
  File "C:\Users\_\Anaconda3\envs\plzwork\lib\site-packages\aitextgen-0.1.2-py3.8.egg\aitextgen\aitextgen.py", line 564, in train
  File "C:\Users\_\Anaconda3\envs\plzwork\lib\site-packages\pytorch_lightning-0.7.6-py3.8.egg\pytorch_lightning\trainer\trainer.py", line 859, in fit
    self.single_gpu_train(model)
  File "C:\Users\_\Anaconda3\envs\plzwork\lib\site-packages\pytorch_lightning-0.7.6-py3.8.egg\pytorch_lightning\trainer\distrib_parts.py", line 503, in single_gpu_train
    self.run_pretrain_routine(model)
  File "C:\Users\_\Anaconda3\envs\plzwork\lib\site-packages\pytorch_lightning-0.7.6-py3.8.egg\pytorch_lightning\trainer\trainer.py", line 1015, in run_pretrain_routine
    self.train()
  File "C:\Users\_\Anaconda3\envs\plzwork\lib\site-packages\pytorch_lightning-0.7.6-py3.8.egg\pytorch_lightning\trainer\training_loop.py", line 347, in train
    self.run_training_epoch()
  File "C:\Users\_\Anaconda3\envs\plzwork\lib\site-packages\pytorch_lightning-0.7.6-py3.8.egg\pytorch_lightning\trainer\training_loop.py", line 419, in run_training_epoch
    _outputs = self.run_training_batch(batch, batch_idx)
  File "C:\Users\_\Anaconda3\envs\plzwork\lib\site-packages\pytorch_lightning-0.7.6-py3.8.egg\pytorch_lightning\trainer\training_loop.py", line 638, in run_training_batch
    self.on_batch_end()
  File "C:\Users\_\Anaconda3\envs\plzwork\lib\site-packages\pytorch_lightning-0.7.6-py3.8.egg\pytorch_lightning\trainer\callback_hook.py", line 63, in on_batch_end
    callback.on_batch_end(self, self.get_model())
  File "C:\Users\_\Anaconda3\envs\plzwork\lib\site-packages\aitextgen-0.1.2-py3.8.egg\aitextgen\train.py", line 168, in on_batch_end
  File "C:\Users\_\Anaconda3\envs\plzwork\lib\site-packages\pytorch_lightning-0.7.6-py3.8.egg\pytorch_lightning\core\memory.py", line 270, in get_gpu_memory_map
    result = subprocess.run(
  File "C:\Users\_\Anaconda3\envs\plzwork\lib\subprocess.py", line 489, in run
    with Popen(*popenargs, **kwargs) as process:
  File "C:\Users\_\Anaconda3\envs\plzwork\lib\subprocess.py", line 854, in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,
  File "C:\Users\_\Anaconda3\envs\plzwork\lib\subprocess.py", line 1307, in _execute_child
    hp, ht, pid, tid = _winapi.CreateProcess(executable, args,
FileNotFoundError: [WinError 2] The system cannot find the file specified
  0%|          | 0/500 [00:23<?, ?it/s]
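
For what it's worth, the last frames here are pytorch_lightning's get_gpu_memory_map shelling out to an external program (nvidia-smi), so [WinError 2] usually means Windows can't find that executable on the PATH of the environment running the script. A minimal check, assuming that's what is failing:

import shutil

# Where (if anywhere) does this environment resolve nvidia-smi?
# None means the subprocess call in get_gpu_memory_map would fail the same way.
print(shutil.which("nvidia-smi"))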

gldstrrbt avatar May 19 '20 18:05 gldstrrbt

Did you change the paths for the configuration stuff? I did originally and it also blew up fwiw. I changed it back to the defaults and it started working again.

ConstantineK avatar May 19 '20 18:05 ConstantineK

I tried the different path variations for file_name, vocab_file, and merges_file, and the config.json file under the trained_model directory shows the object below. Also, the config var in the script was set to GPT2ConfigCPU() during some of my troubleshooting runs. Is there another set of configurations that I'm missing?

{
  "activation_function": "gelu_new",
  "architectures": [
    "GPT2LMHeadModel"
  ],
  "attn_pdrop": 0.1,
  "bos_token_id": 0,
  "embd_pdrop": 0.1,
  "eos_token_id": 0,
  "initializer_range": 0.02,
  "layer_norm_epsilon": 1e-05,
  "model_type": "gpt2",
  "n_ctx": 64,
  "n_embd": 128,
  "n_head": 4,
  "n_layer": 4,
  "n_positions": 64,
  "resid_pdrop": 0.1,
  "summary_activation": null,
  "summary_first_dropout": 0.1,
  "summary_proj_to_labels": true,
  "summary_type": "cls_index",
  "summary_use_proj": true,
  "vocab_size": 5000
}

gldstrrbt avatar May 19 '20 19:05 gldstrrbt

I did a fresh pip install (I also have conda), ran into this problem, and @ConstantineK's comment resolved it:

@minimaxir @gldstrrbt so I tried being a total idiot and read the error message which told me how to fix it. I added if __name__ == '__main__': at the top of my code, indented the rest, and its training right now.

Full code
from aitextgen.TokenDataset import TokenDataset
from aitextgen.tokenizers import train_tokenizer
from aitextgen.utils import GPT2ConfigCPU
from aitextgen import aitextgen

if __name__ == '__main__':
    # The name of the downloaded Shakespeare text for training
    file_name = "shakespeare.txt"

    # Train a custom BPE Tokenizer on the downloaded text
    # This will save two files: aitextgen-vocab.json and aitextgen-merges.txt,
    # which are needed to rebuild the tokenizer.
    train_tokenizer(file_name)
    vocab_file = "aitextgen-vocab.json"
    merges_file = "aitextgen-merges.txt"

    # GPT2ConfigCPU is a mini variant of GPT-2 optimized for CPU-training
    # e.g. the # of input tokens here is 64 vs. 1024 for base GPT-2.
    config = GPT2ConfigCPU()

    # Instantiate aitextgen using the created tokenizer and config
    ai = aitextgen(vocab_file=vocab_file, merges_file=merges_file, config=config)

    # You can build datasets for training by creating TokenDatasets,
    # which automatically processes the dataset with the appropriate size.
    data = TokenDataset(file_name, vocab_file=vocab_file, merges_file=merges_file, block_size=64)

    # Train the model! It will save pytorch_model.bin periodically and after completion.
    # On a 2016 MacBook Pro, this took ~25 minutes to run.
    ai.train(data, batch_size=16, num_steps=5000)

    # Generate text from it!
    ai.generate(10, prompt="ROMEO:")
Execution output
(py38) PS C:\Charles\Projects\Python\GPT2> python .\shakespeare.py
[00:00:00] Reading files (1 Mo)   ████████████████████████████████████████████████ 100
[00:00:00] Tokenize words         ████████████████████████████████████████████████ 15057 / 15057
[00:00:00] Count pairs            ████████████████████████████████████████████████ 15057 / 15057
[00:00:00] Compute merges         ████████████████████████████████████████████████ 743 / 743
100%|█████████████████████████████████████████████████████████████████████████| 40000/40000 [00:00<00:00, 88102.35it/s]
GPU available: False, used: False
TPU available: None, using: 0 TPU cores
1,000 steps reached: saving model to /trained_model
1,000 steps reached: generating sample texts.
==========
,, you have could you sipe to me
That's no spelling yourself my faceep.

JUTIO:
Sird, my mot, that hence.

Second Cols, my minds, forth, if you,
==========
2,000 steps reached: saving model to /trained_model
2,000 steps reached: generating sample texts.
==========
:
PostNay, I have it in the wisser,
They will beature in the stens of cared?

First Lord's no more, I am more,
For it in justice, and bubyalike in his
==========
3,000 steps reached: saving model to /trained_model
3,000 steps reached: generating sample texts.
==========
;
In this voices that they have we so fly.

Have I this reasons more master,
Not to be stood in the subsignor's guilty
And with a mannter, the surkind and
==========
4,000 steps reached: saving model to /trained_model
4,000 steps reached: generating sample texts.
==========
,
Not with the moones of the chams,
Not, all our general pals of the duke of tears
When painted patrideed their air,
For this unliction, my cenged
==========
5,000 steps reached: saving model to /trained_model
5,000 steps reached: generating sample texts.
==========
:
O, he is good, I have far offerd;
But, as I am a granted to-morrow,
To greateneral, if he doth be my puteous
Before the cheer it iss you, and you
==========
Loss: 3.240 — Avg: 3.249: 100%|████████████████████████████████████████████████████| 5000/5000 [16:53<00:00,  4.93it/s]
ROMEO:
They are no less, wherefore I will be wind of him,
And, as I'll entreat you, as you as
But shall for you, my lord,
It would not meet meet you,
At thou wilt know'st me to me
==========
ROMEO:
Dister,
Go, welcome me, a feel sad horse,
To be your grace were king, with a prison,
And talk the king's brother,
On feather I lay on, the duke,
==========
ROMEO:
O, mant of France.

BISHOP OF YORK:
O, that affer of the power of thing.

YORK:
No more; let us go to see his knees
To still my life before,

==========
ROMEO:
Nay, he came to the cheer?

ANGELO:
Neven to the doubt, for I know not.

POLIXENES:
It was a duke:
Say you, sir, I need
==========
ROMEO:
Pray me, and I am formocent in this,
She's nothing?

CAPULET:
Answer it is a pient cloil!

SICINIUS:
Notest, but I could
==========
ROMEO:
Say me, and the worlds, thou hast doubt.

YORK:
Now, I'll speak your coverns, and care,
He's not sink of the correl, at cause
Intent to do the
==========
ROMEO:
He'll in boy, answer, or mistress,
Who is the prince of good fates:
He shall not hears
As soleep the prince you outward, he had been
Should not that ernard, I c
==========
ROMEO:
Deciancts; for I'll hear you,
To see the schooler: O, to the sound,
Suspaf, move a fool, that I please
To turn her power, and my hus
==========
ROMEO:
It be in heavens, my fain,
I do my glory, by the Nedge of this;
Not a guesseral less of his master!

SICINIUS:
Where is the bright mort is the
==========
ROMEO:
I am a less, sir, I will private him as signifis
You have the bixty dower.

CUpon, sir, come, my forth me as he did believe
Anon.

Loss: 3.240 — Avg: 3.249: 100%|████████████████████████████████████████████████████| 5000/5000 [16:54<00:00,  4.93it/s]
(py38) PS C:\Charles\Projects\Python\GPT2>

charlesjlee avatar Jan 24 '21 20:01 charlesjlee