pyyaml

yaml.load RecursionError: maximum recursion depth exceeded

Open netquik opened this issue 3 years ago • 9 comments

I'm experiencing some crashes when using:

import yaml
with open("DATA/list.yml", "r", encoding="utf8") as f:
    # full Loader: the file contains !!python/object tags that safe_load rejects
    yamllist = yaml.load(f, Loader=yaml.Loader)

This load runs many times while the code is running without any problems. The file is updated during execution and then reloaded. It does not grow in size during updates (only its content changes).

After many loads I get:

 File "Myapp.py", line 239, in loadyaml
 File "yaml/__init__.py", line 81, in load
 File "yaml/constructor.py", line 49, in get_single_data
 File "yaml/composer.py", line 36, in get_single_node
 File "yaml/composer.py", line 55, in compose_document
 File "yaml/composer.py", line 84, in compose_node
 File "yaml/composer.py", line 133, in compose_mapping_node
 File "yaml/composer.py", line 84, in compose_node
 File "yaml/composer.py", line 133, in compose_mapping_node
 File "yaml/composer.py", line 84, in compose_node
 File "yaml/composer.py", line 133, in compose_mapping_node
 File "yaml/composer.py", line 84, in compose_node
 File "yaml/composer.py", line 133, in compose_mapping_node
 File "yaml/composer.py", line 84, in compose_node
 File "yaml/composer.py", line 133, in compose_mapping_node
 File "yaml/composer.py", line 64, in compose_node
 File "yaml/parser.py", line 98, in check_event
 File "yaml/parser.py", line 449, in parse_block_mapping_value
 File "yaml/scanner.py", line 116, in check_token
 File "yaml/scanner.py", line 255, in fetch_more_tokens
 File "yaml/scanner.py", line 679, in fetch_plain
 File "yaml/scanner.py", line 1305, in scan_plain
 File "yaml/scanner.py", line 1323, in scan_plain_spaces
 File "yaml/scanner.py", line 1427, in scan_line_break
 File "yaml/reader.py", line 95, in prefix
RecursionError: maximum recursion depth exceeded while calling a Python object

Any ideas on how this can be fixed?

netquik, Dec 07 '21 18:12

Not without a lot more information... If we assume it's just something about some intermediate version of the document (and nothing about running the parser multiple times, which is a fairly safe assumption), the easiest way to catch it in the act would be to wrap the load call in a try block with an except RecursionError: handler, dump the offending version of the document, and then try loading that version by itself to see if it fails. If you can get us a document that reproduces the problem, we might be able to fix it.
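The capture approach described above could look roughly like the sketch below. It is not part of PyYAML: load_or_capture, its parse argument and the offending.yml output name are hypothetical. Reading the file into a string first means the copy written on failure is the same text that was parsed, even if the app rewrites the file a moment later.

```python
from pathlib import Path


def load_or_capture(path, parse, dump_path="offending.yml"):
    """Parse the file at `path`; on RecursionError, save the text that failed.

    `parse` is any callable that takes the document text (hypothetical hook,
    e.g. a lambda wrapping yaml.load).
    """
    # Read once, so the saved copy matches what the parser actually saw.
    raw = Path(path).read_text(encoding="utf8")
    try:
        return parse(raw)
    except RecursionError:
        # Keep the exact document version that triggered the error...
        Path(dump_path).write_text(raw, encoding="utf8")
        # ...and re-raise so the failure is still reported as before.
        raise
```

With PyYAML this might be called as load_or_capture("DATA/list.yml", lambda text: yaml.load(text, Loader=yaml.Loader)). If offending.yml then loads fine in a fresh process, that points away from the document and towards something in the running application.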

nitzmahone, Dec 07 '21 18:12

@nitzmahone Yes, thanks. After the crash, the same document loads correctly with the same function (after restarting the app), so I guess there is nothing wrong with the document itself. I will investigate further, but I don't think I can reproduce the problem with any particular document. I suspect it is related to running the parser multiple times, because the crash happens on different instances of the app and always after roughly 300 calls to the same function.
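One way to test the "running the parser multiple times" theory, offered as an assumption rather than anything confirmed in this thread: the PyYAML frames in the tracebacks as posted add up to only a few dozen, far below CPython's default recursion limit of 1000, so the remaining depth may be used up by the callers above loadyaml (for example a handler that ends up calling itself). The sketch below logs the Python stack depth every time the file is loaded; current_stack_depth and load_with_depth_log are hypothetical names. If the logged number climbs steadily over the ~300 calls, the growth is in the application rather than in PyYAML.

```python
import inspect
import logging


def current_stack_depth() -> int:
    """Return how many Python frames are on the call stack right now."""
    depth = 0
    frame = inspect.currentframe()
    while frame is not None:
        depth += 1
        frame = frame.f_back
    return depth


def load_with_depth_log(path, parse):
    """Hypothetical wrapper: log the stack depth, then parse the open file."""
    logging.getLogger(__name__).info(
        "stack depth before load: %d", current_stack_depth()
    )
    with open(path, "r", encoding="utf8") as f:
        return parse(f)
```

Call logging.basicConfig(level=logging.INFO) at startup so the messages are shown, and pass something like lambda f: yaml.load(f, Loader=yaml.Loader) as parse.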

Sometimes it raises the exception at a different point:

  File "yaml/__init__.py", line 81, in load
  File "yaml/constructor.py", line 49, in get_single_data
  File "yaml/composer.py", line 36, in get_single_node
  File "yaml/composer.py", line 55, in compose_document
  File "yaml/composer.py", line 84, in compose_node
  File "yaml/composer.py", line 133, in compose_mapping_node
  File "yaml/composer.py", line 84, in compose_node
  File "yaml/composer.py", line 133, in compose_mapping_node
  File "yaml/composer.py", line 84, in compose_node
  File "yaml/composer.py", line 133, in compose_mapping_node
  File "yaml/composer.py", line 84, in compose_node
  File "yaml/composer.py", line 133, in compose_mapping_node
  File "yaml/composer.py", line 84, in compose_node
  File "yaml/composer.py", line 133, in compose_mapping_node
  File "yaml/composer.py", line 64, in compose_node
  File "yaml/parser.py", line 98, in check_event
  File "yaml/parser.py", line 449, in parse_block_mapping_value
  File "yaml/scanner.py", line 116, in check_token
  File "yaml/scanner.py", line 255, in fetch_more_tokens
  File "yaml/scanner.py", line 679, in fetch_plain
  File "yaml/scanner.py", line 1305, in scan_plain
  File "yaml/scanner.py", line 1332, in scan_plain_spaces
  File "yaml/reader.py", line 101, in forward
  File "yaml/reader.py", line 153, in update
  File "yaml/reader.py", line 178, in update_raw
RecursionError: maximum recursion depth exceeded

Could this error be caused by a resource limit (rlimit)? So far I only see it on a Linux environment (ARM64).
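On the rlimit question, it helps to separate two limits. RecursionError is raised by the interpreter itself when the Python call depth exceeds sys.getrecursionlimit() (1000 by default in CPython). The OS stack size limit (RLIMIT_STACK, readable through the Unix-only resource module) is a separate thing, and actually running out of native stack typically kills the process with a segmentation fault rather than raising RecursionError. The sketch below just reports both values so they can be compared across machines; describe_limits is a hypothetical helper.

```python
import sys


def describe_limits() -> dict:
    """Collect the interpreter recursion limit and, if available, the stack rlimit."""
    info = {"recursionlimit": sys.getrecursionlimit()}
    try:
        import resource  # Unix-only standard library module
    except ImportError:
        info["stack_rlimit"] = None  # e.g. on Windows
    else:
        # (soft, hard) limits in bytes; RLIM_INFINITY means unlimited
        info["stack_rlimit"] = resource.getrlimit(resource.RLIMIT_STACK)
    return info


if __name__ == "__main__":
    print(describe_limits())
```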

One more detail: the crash always comes from the line of my code that uses Loader=yaml.Loader. I use yaml.safe_load for the other files in the code, but not for this specific one: I can't use safe_load here because the file includes some !!python/object tags. The file is about 40 KB.

I will try to find more information

netquik, Dec 07 '21 19:12

It's unlikely to be any kind of external limitation, though it could be a combination of something about the document and a strange case of intra-load leakage that's only triggered by numerous loads. I just ran a million loads of a simple document in a loop with no trouble, so you're going to have to give us an actual reproduction if you want to get anywhere.
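A loop like the one described above can be approximated with a small harness. This is a sketch: repeated_load is a hypothetical name, and it takes the parser as a callable so it can run without any particular loader. With PyYAML you might pass lambda t: yaml.load(t, Loader=yaml.Loader). It compares every result with the first one, which only works for data that supports ==; objects built from !!python/object tags without an __eq__ method would need a different check.

```python
def repeated_load(text, parse, times=1000):
    """Parse the same text `times` times and check each result matches the first."""
    first = parse(text)
    for i in range(1, times):
        result = parse(text)
        if result != first:
            # A difference between identical inputs would suggest state
            # leaking from one load into the next.
            raise RuntimeError(f"load #{i} differed from the first load")
    return first
```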

nitzmahone, Dec 07 '21 20:12

Thanks. The only way I can try to reproduce the problem is by rewriting the Python code with the same functions but with all other functionality removed (this is actually a Telegram bot). I will update the issue as soon as I have tried that. Meanwhile, I will try to find out whether the problem is specific to the Linux platform.

netquik, Dec 07 '21 21:12

Hey, I just hit the same problem. It can be addressed by increasing the Python recursion limit:

import sys
sys.setrecursionlimit(10 ** 6)

Original article here: https://www.geeksforgeeks.org/python-handling-recursion-limit/
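If you do go the setrecursionlimit route, a narrower variant is sketched below: it raises the limit only around the load call and restores the previous value afterwards. The Python documentation warns that the highest safe limit is platform-dependent and that too high a limit can crash the interpreter, so a global 10 ** 6 is risky. The recursion_limit name and the 5000 in the usage line are illustrative assumptions. Note that the limit applies to the whole interpreter, not just the current thread, and that, like the global version, this hides rather than fixes whatever is consuming the stack.

```python
import sys
from contextlib import contextmanager


@contextmanager
def recursion_limit(limit: int):
    """Temporarily raise the interpreter recursion limit, then restore it."""
    old = sys.getrecursionlimit()
    sys.setrecursionlimit(max(old, limit))  # never lower it by accident
    try:
        yield
    finally:
        sys.setrecursionlimit(old)
```

Usage: with recursion_limit(5000): yamllist = yaml.load(f, Loader=yaml.Loader)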

mako101, Aug 24 '22 03:08

@mako101 do you have a document that can actually reproduce the issue faithfully though? Ostensibly there's nothing that should be that deeply recursive for a sane document, which makes me think there's an intra-workload leakage occurring, but we've never been able to reproduce it. Just bumping the recursion limit is likely sweeping whatever the problem is under the rug, unless your document or the types you're deserializing into are .. special :laughing:
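To check the "sane document" assumption, it can help to measure how deeply the loaded data is actually nested. In the tracebacks earlier in this issue, the composer recurses through compose_node and compose_mapping_node roughly once per nesting level, so a flat dictionary should stay shallow. The helper below is a sketch (nesting_depth is a hypothetical name). It walks the data iteratively so the measurement does not itself recurse, and it skips containers it has already visited, which prevents endless loops on self-referencing data built with anchors and aliases but can undercount when a shared container is reached again at a deeper level.

```python
def nesting_depth(obj) -> int:
    """Maximum container nesting depth of loaded data (a bare scalar is 0)."""
    max_depth = 0
    stack = [(obj, 0)]
    seen = set()
    while stack:
        node, depth = stack.pop()
        if isinstance(node, dict):
            children = node.values()
        elif isinstance(node, (list, tuple, set)):
            children = node
        else:
            continue  # scalars (str, int, ...) do not add a level
        if id(node) in seen:
            continue  # already visited: shared or cyclic reference
        seen.add(id(node))
        depth += 1
        max_depth = max(max_depth, depth)
        stack.extend((child, depth) for child in children)
    return max_depth
```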

nitzmahone, Aug 24 '22 23:08

@nitzmahone Sorry, I don't. It's basically a dictionary with a few thousand entries and only one level of nesting. Nothing special there, just hostnames, SNMP trap names and OIDs. It dumps fine from JSON to YAML to a file outside of the application. The app basically updates the dictionary and dumps it to the file every few minutes, until I start seeing the issue. This might sweep the issue under the rug, but the problem has gone away, so it's good enough for me. At the very least it's a functional workaround.

mako101, Aug 25 '22 05:08

@mako101 a couple other questions:

  • are you using Loader, SafeLoader, CLoader, or something else?
  • is the data in the files you're hitting this with just simple untagged basic types, or are you deserializing to custom Python types via tags?
  • are you using anchors/refs in the files you hit this with?

nitzmahone, Aug 25 '22 19:08

I had this issue too. The problem was that I was using it to output some stuff scraped with BeautifulSoup, so some of the apparent strings were actually <class 'bs4.element.NavigableString'> (ain't that groovy).

To fix it, I just cast them with str, using beautifully unpythonic list comprehensions all stuffed on one line.

So make sure everything is a type that PyYAML likes.
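The cast-to-str fix above can be written as a small recursive cleanup applied before dumping. This is a sketch: to_plain is a hypothetical name, and it only handles dicts, lists, tuples and str subclasses. One plausible explanation, not confirmed in this thread, is that PyYAML chooses representers by exact type, so a str subclass such as NavigableString is not emitted as a plain string. Note this case is on the dump side, unlike the original report, which fails while loading.

```python
def to_plain(obj):
    """Recursively convert str subclasses and containers into built-in types."""
    if isinstance(obj, str):
        # str() of a str subclass instance (without a custom __str__)
        # returns a plain str
        return str(obj)
    if isinstance(obj, dict):
        return {to_plain(k): to_plain(v) for k, v in obj.items()}
    if isinstance(obj, (list, tuple)):
        # tuples become lists, which every PyYAML dumper handles
        return [to_plain(item) for item in obj]
    return obj
```

For example, yaml.safe_dump(to_plain(scraped)) instead of dumping the scraped data directly, where scraped stands for whatever structure holds the BeautifulSoup results.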

darachm, Dec 05 '23 16:12