maximum recursion depth exceeded while calling a Python object
When I train from scratch on some text (a text file of Shakespeare plays, following the tutorial https://docs.aitextgen.io/tutorials/hello-world/), I always get a maximum recursion depth exceeded error.
I changed the code to use the GPU, as follows. To reproduce the error more quickly, I passed save_every=100 and generate_every=100 to train(). Training looks normal; the error seems to come from the generate function.
from aitextgen.TokenDataset import TokenDataset
from aitextgen.tokenizers import train_tokenizer
from aitextgen.utils import GPT2Config
from aitextgen import aitextgen

# The name of the downloaded Shakespeare text for training
file_name = "/home/harry/workspace/cryptoAI/gpt2/data/input.txt"

# Train a custom BPE Tokenizer on the downloaded text
# This will save two files: aitextgen-vocab.json and aitextgen-merges.txt,
# which are needed to rebuild the tokenizer.
train_tokenizer(file_name)
vocab_file = "/home/harry/workspace/input-vocab.json"
merges_file = "/home/harry/workspace/input-merges.txt"

# GPT2ConfigCPU is a mini variant of GPT-2 optimized for CPU-training
# e.g. the # of input tokens here is 64 vs. 1024 for base GPT-2.
config = GPT2Config()

# Instantiate aitextgen using the created tokenizer and config
ai = aitextgen(vocab_file=vocab_file, merges_file=merges_file, config=config)

# You can build datasets for training by creating TokenDatasets,
# which automatically processes the dataset with the appropriate size.
data = TokenDataset(file_name, vocab_file=vocab_file, merges_file=merges_file, block_size=64)

# Train the model! It will save pytorch_model.bin periodically and after completion.
# On a 2016 MacBook Pro, this took ~25 minutes to run.
ai.train(data, batch_size=16, num_steps=5000, save_every=100, generate_every=100)

# Generate text from it!
ai.generate(10, prompt="ROMEO:")
Then, after 100 steps, the error appears:
  File "/home/harry/.local/lib/python3.8/site-packages/transformers/tokenization_utils_base.py", line 1085, in unk_token_id
    return self.convert_tokens_to_ids(self.unk_token)
  File "/home/harry/.local/lib/python3.8/site-packages/transformers/tokenization_utils_fast.py", line 226, in convert_tokens_to_ids
    return self._convert_token_to_id_with_added_voc(tokens)
  File "/home/harry/.local/lib/python3.8/site-packages/transformers/tokenization_utils_fast.py", line 236, in _convert_token_to_id_with_added_voc
    return self.unk_token_id
  File "/home/harry/.local/lib/python3.8/site-packages/transformers/tokenization_utils_base.py", line 1085, in unk_token_id
    return self.convert_tokens_to_ids(self.unk_token)
  File "/home/harry/.local/lib/python3.8/site-packages/transformers/tokenization_utils_base.py", line 969, in unk_token
    return str(self._unk_token)
RecursionError: maximum recursion depth exceeded while calling a Python object
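The repeating frames in this traceback form a cycle: unk_token_id calls convert_tokens_to_ids, which calls _convert_token_to_id_with_added_voc, which falls back to unk_token_id again. Below is a toy sketch of that cycle, not the transformers source. ToyTokenizer and its vocab dictionary are hypothetical, and the sketch assumes the loop starts when the unknown token itself has no entry in the vocabulary.

```python
class ToyTokenizer:
    """Hypothetical stand-in that mimics the call cycle in the traceback."""

    def __init__(self, vocab, unk_token):
        self.vocab = vocab          # token -> id mapping
        self.unk_token = unk_token  # fallback token for unknown input

    @property
    def unk_token_id(self):
        # Resolve the unknown token's id through the normal lookup path.
        return self.convert_tokens_to_ids(self.unk_token)

    def convert_tokens_to_ids(self, token):
        return self._convert_token_to_id_with_added_voc(token)

    def _convert_token_to_id_with_added_voc(self, token):
        index = self.vocab.get(token)
        if index is None:
            # Unknown token: fall back to the unk id. If the unk token is
            # itself missing from the vocab, this lookup never terminates.
            return self.unk_token_id
        return index


def lookup_recurses(tokenizer, token):
    """Return True if looking up `token` ends in a RecursionError."""
    try:
        tokenizer.convert_tokens_to_ids(token)
    except RecursionError:
        return True
    return False


healthy = ToyTokenizer({"<unk>": 0, "ROMEO": 1}, unk_token="<unk>")
broken = ToyTokenizer({"ROMEO": 1}, unk_token="<unk>")  # "<unk>" not in vocab
```

In this toy, `broken` still resolves known tokens such as "ROMEO" and only fails on a token outside its vocabulary. That would be consistent with training looking normal and the error showing up during generation, though whether the real tokenizer is in this state is an assumption.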
I use Ubuntu 20.04 and Python 3.8.5.
The error occurs on CPU as well, right?
That error message is weird / likely not aitextgen's fault. Does your system have an unusual recursion limit?
You can verify by running:
import sys
print(sys.getrecursionlimit())
It should be 1000.
Hi, thanks for the reply.
- Actually, I can't get it to run on the CPU:
    param_applied = fn(param)
  File "/home/harry/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 471, in <lambda>
    return self._apply(lambda t: t.cpu())
RuntimeError: CUDA error: device-side assert triggered
and
ConnectionResetError: [Errno 104] Connection reset by peer
The machine runs Ubuntu 20.04 on an AMD 3400G CPU, which is why I use the GPU (RTX 3060 Ti).
- The recursion limit looks normal:
import sys
print(sys.getrecursionlimit())
1000
I also tried a higher limit such as 10**4, with sys.setrecursionlimit(10**4), and got the same error; 10**5 throws a different error.
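One possible reading of this result: if the recursion has no terminating case, a higher limit only moves the depth at which Python gives up. The sketch below illustrates that with a deliberately unbounded function. depth_reached and depth_with_limit are hypothetical helpers written for this illustration, and the extra 1000 frames is an arbitrary choice.

```python
import sys


def depth_reached(n=0):
    """Recurse with no base case and return the depth at which Python stops."""
    try:
        return depth_reached(n + 1)
    except RecursionError:
        return n


def depth_with_limit(limit):
    """Run depth_reached() under a temporary recursion limit."""
    old_limit = sys.getrecursionlimit()
    sys.setrecursionlimit(limit)
    try:
        return depth_reached()
    finally:
        # Always restore the previous limit, even if something fails.
        sys.setrecursionlimit(old_limit)


base_limit = sys.getrecursionlimit()
shallow = depth_with_limit(base_limit)
deeper = depth_with_limit(base_limit + 1000)
```

Here `deeper` exceeds `shallow`, but both runs still end in a RecursionError internally, so raising the limit changes how far the recursion gets without making it finish.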
I ran into the same issue. Have you found a solution?
I get the same error when trying to load this GPT-2 Spanish model from Hugging Face.
ai = aitextgen(model_folder = "trained_model", config = "trained_model/config.json", tokenizer_file="trained_model/tokenizer.json", to_gpu=True)
generated_text = ai.generate_one(max_length=30, prompt="Esto es un ")
print(generated_text)
Stacktrace
---------------------------------------------------------------------------
RecursionError                            Traceback (most recent call last)
<ipython-input-6-2718a02abe3c> in <module>()
      6 '''
      7
----> 8 generated_text = ai.generate_one(max_length=30, prompt="Esto es un ")
      9 print(generated_text)

6 frames
... last 3 frames repeated, from the frame below ...

/usr/local/lib/python3.7/dist-packages/transformers/tokenization_utils_fast.py in convert_tokens_to_ids(self, tokens)
    247
    248         if isinstance(tokens, str):
--> 249             return self._convert_token_to_id_with_added_voc(tokens)
    250
    251         ids = []

RecursionError: maximum recursion depth exceeded while calling a Python object
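Since both tracebacks in this thread repeat the same convert_tokens_to_ids frames, one thing worth checking is whether the tokenizer's unknown token actually has an entry in its vocabulary. The following is a minimal diagnostic sketch, not a confirmed fix. unk_token_resolvable and StubTokenizer are hypothetical names; the helper relies only on the `unk_token` attribute and the `get_vocab()` method that Hugging Face tokenizers expose.

```python
def unk_token_resolvable(tokenizer):
    """Return True if the tokenizer's unk token can be looked up in its vocab.

    Works with any object exposing `unk_token` and `get_vocab()`.
    """
    unk = tokenizer.unk_token
    if unk is None:
        # No unk token configured, so there is no fallback lookup to loop on.
        return True
    return unk in tokenizer.get_vocab()


class StubTokenizer:
    """Hypothetical stand-in used only to demonstrate the helper."""

    def __init__(self, unk_token, vocab):
        self.unk_token = unk_token
        self._vocab = vocab

    def get_vocab(self):
        return dict(self._vocab)


ok = unk_token_resolvable(StubTokenizer("<unk>", {"<unk>": 0, "Esto": 1}))
missing = unk_token_resolvable(StubTokenizer("<unk>", {"Esto": 1}))
unset = unk_token_resolvable(StubTokenizer(None, {"Esto": 1}))
```

With aitextgen this would presumably be called as `unk_token_resolvable(ai.tokenizer)`, assuming the loaded tokenizer is reachable under that attribute. A False result would point at a mismatch between the tokenizer's special tokens and the vocabulary files it was built from.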