
RecursionError: maximum recursion depth exceeded while calling a Python object, after the pad_token issue was fixed

Status: Open. phalexo opened this issue 2 years ago • 9 comments

The output below is self-explanatory.

  File "/home/developer/mambaforge/envs/Guanaco/lib/python3.10/site-packages/transformers/tokenization_utils_fast.py", line 257, in _convert_token_to_id_with_added_voc
    return self.unk_token_id
  File "/home/developer/mambaforge/envs/Guanaco/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 1142, in unk_token_id
    return self.convert_tokens_to_ids(self.unk_token)
  File "/home/developer/mambaforge/envs/Guanaco/lib/python3.10/site-packages/transformers/tokenization_utils_fast.py", line 250, in convert_tokens_to_ids
    return self._convert_token_to_id_with_added_voc(tokens)
  File "/home/developer/mambaforge/envs/Guanaco/lib/python3.10/site-packages/transformers/tokenization_utils_fast.py", line 257, in _convert_token_to_id_with_added_voc
    return self.unk_token_id
  File "/home/developer/mambaforge/envs/Guanaco/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 1142, in unk_token_id
    return self.convert_tokens_to_ids(self.unk_token)
  File "/home/developer/mambaforge/envs/Guanaco/lib/python3.10/site-packages/transformers/tokenization_utils_fast.py", line 250, in convert_tokens_to_ids
    return self._convert_token_to_id_with_added_voc(tokens)
  File "/home/developer/mambaforge/envs/Guanaco/lib/python3.10/site-packages/transformers/tokenization_utils_fast.py", line 257, in _convert_token_to_id_with_added_voc
    return self.unk_token_id
  File "/home/developer/mambaforge/envs/Guanaco/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 1142, in unk_token_id
    return self.convert_tokens_to_ids(self.unk_token)
  File "/home/developer/mambaforge/envs/Guanaco/lib/python3.10/site-packages/transformers/tokenization_utils_fast.py", line 250, in convert_tokens_to_ids
    return self._convert_token_to_id_with_added_voc(tokens)
  File "/home/developer/mambaforge/envs/Guanaco/lib/python3.10/site-packages/transformers/tokenization_utils_fast.py", line 257, in _convert_token_to_id_with_added_voc
    return self.unk_token_id
  File "/home/developer/mambaforge/envs/Guanaco/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 1142, in unk_token_id
    return self.convert_tokens_to_ids(self.unk_token)
  File "/home/developer/mambaforge/envs/Guanaco/lib/python3.10/site-packages/transformers/tokenization_utils_fast.py", line 250, in convert_tokens_to_ids
    return self._convert_token_to_id_with_added_voc(tokens)
  File "/home/developer/mambaforge/envs/Guanaco/lib/python3.10/site-packages/transformers/tokenization_utils_fast.py", line 257, in _convert_token_to_id_with_added_voc
    return self.unk_token_id
  File "/home/developer/mambaforge/envs/Guanaco/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 1142, in unk_token_id
    return self.convert_tokens_to_ids(self.unk_token)
  File "/home/developer/mambaforge/envs/Guanaco/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 1022, in unk_token
    return str(self._unk_token)
RecursionError: maximum recursion depth exceeded while calling a Python object
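Reading the frames above, the fast tokenizer's unk_token_id property calls convert_tokens_to_ids(unk_token), and _convert_token_to_id_with_added_voc falls back to unk_token_id when a token is not found. If the unk_token string itself is missing from the vocabulary, that fallback never terminates. The toy class below is a hypothetical stand-in (not the transformers source) that reproduces only that cycle, reusing the method names visible in the traceback; the dict-based lookup is an assumption for illustration.

```python
class ToyFastTokenizer:
    """Hypothetical stand-in mimicking the call cycle in the traceback.

    Not the transformers implementation: it only reproduces the loop
    unk_token_id -> convert_tokens_to_ids -> _convert_token_to_id_with_added_voc
    -> unk_token_id, which happens when unk_token is absent from the vocab.
    """

    def __init__(self, vocab, unk_token):
        self.vocab = vocab  # token -> id mapping (assumed dict for the toy)
        self._unk_token = unk_token

    @property
    def unk_token(self):
        return str(self._unk_token)

    @property
    def unk_token_id(self):
        # Looks up the id of the unknown token by its string form.
        return self.convert_tokens_to_ids(self.unk_token)

    def convert_tokens_to_ids(self, tokens):
        return self._convert_token_to_id_with_added_voc(tokens)

    def _convert_token_to_id_with_added_voc(self, token):
        index = self.vocab.get(token)
        if index is None:  # "is None", because id 0 is a valid id
            # Fallback to the unk id: re-enters unk_token_id, and loops
            # forever if unk_token itself is not in the vocab.
            return self.unk_token_id
        return index


def unk_id_or_error(tok):
    """Return the unk id, or the string "RecursionError" if the lookup loops."""
    try:
        return tok.unk_token_id
    except RecursionError:
        return "RecursionError"


good = ToyFastTokenizer({"<unk>": 0, "hello": 1}, unk_token="<unk>")
bad = ToyFastTokenizer({"hello": 1}, unk_token="")  # unk_token not in vocab

print(unk_id_or_error(good))  # 0
print(unk_id_or_error(bad))   # RecursionError
```

The point of the toy is that the loop depends only on the unk_token string being unresolvable, not on the input text being tokenized.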

phalexo, May 28 '23 22:05

Changing the tokenizer back from LlamaTokenizerFast to LlamaTokenizer removed the infinite recursion problem, BUT it brought back the "core dump" I reported in another issue.

So the new fast tokenizer (LlamaTokenizerFast) is clearly causing the infinite recursion, but it also masks the "core dump."

phalexo, May 28 '23 22:05

+1

flaviadeutsch, May 29 '23 03:05

Model: guanaco-33b-merged

flaviadeutsch, May 29 '23 04:05

I have the same issue. Did you solve this problem?

jianchaoji, May 30 '23 21:05

> I have the same issue. Did you solve this problem?

Well, you can use the other tokenizer, LlamaTokenizer (without the "Fast" suffix). It should get rid of the recursion. In my case, however, I still get a core dump. Maybe you won't.

phalexo, May 30 '23 21:05

Thank you for your suggestion. In my case, after I changed it back to "LlamaTokenizer", it still has the recursion problem.

jianchaoji, May 30 '23 21:05

Is 'qlora.py' the only file I need to modify?

jianchaoji, May 30 '23 21:05

Did you pull the updated file? There were some other changes related to pad_token.
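Since the loop in the traceback appears to start when unk_token cannot be found in the vocabulary, a quick sanity check of the special tokens after loading the tokenizer may help narrow things down. The helper below is a hypothetical sketch, not part of qlora or transformers; it only assumes the tokenizer exposes bos_token, eos_token, unk_token and pad_token attributes plus a get_vocab() method returning a token-to-id dict, which transformers tokenizers provide. A SimpleNamespace stands in for a real tokenizer in the demo.

```python
from types import SimpleNamespace

# Special-token attributes to check (assumed set; extend as needed).
SPECIAL_TOKEN_ATTRS = ("bos_token", "eos_token", "unk_token", "pad_token")


def missing_special_tokens(tokenizer):
    """Return {attr: token} for special tokens that are set but absent from the vocab.

    Hypothetical helper: reads only the special-token string attributes and
    get_vocab(), so it never touches unk_token_id (which is what loops).
    """
    vocab = tokenizer.get_vocab()
    missing = {}
    for attr in SPECIAL_TOKEN_ATTRS:
        token = getattr(tokenizer, attr, None)
        # Unset tokens (None) are skipped; set-but-unknown ones are reported.
        if token is not None and token not in vocab:
            missing[attr] = token
    return missing


# Stand-in tokenizer whose unk_token is an empty string not present in the vocab.
fake = SimpleNamespace(
    bos_token="<s>",
    eos_token="</s>",
    unk_token="",
    pad_token=None,
    get_vocab=lambda: {"<unk>": 0, "<s>": 1, "</s>": 2},
)

print(missing_special_tokens(fake))  # {'unk_token': ''}
```

With a real tokenizer, the same call would be made right after loading it; an empty result suggests the recursion has a different cause.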

phalexo, May 30 '23 21:05

Yes, I ran git pull to update the file, but I still have the same issue.

jianchaoji, May 30 '23 21:05