requests-html
fix parse html RecursionError
Fix a RecursionError raised when parsing the HTML of https://db-engines.com/en/ranking
Reproduce:
Python 3.10.9 (main, Dec 19 2022, 17:35:49) [GCC 12.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from requests_html import HTMLSession
>>> session = HTMLSession()
>>> p = session.get('https://db-engines.com/en/ranking')
>>> p.html.text
Traceback (most recent call last):
  File "/usr/lib/python3.10/site-packages/lxml/html/soupparser.py", line 33, in fromstring
    return _parse(data, beautifulsoup, makeelement, **bsargs)
  File "/usr/lib/python3.10/site-packages/lxml/html/soupparser.py", line 79, in _parse
    root = _convert_tree(tree, makeelement)
  File "/usr/lib/python3.10/site-packages/lxml/html/soupparser.py", line 152, in _convert_tree
    res_root = convert_node(html_root)
  File "/usr/lib/python3.10/site-packages/lxml/html/soupparser.py", line 216, in convert_node
    return handler(bs_node, parent)
  File "/usr/lib/python3.10/site-packages/lxml/html/soupparser.py", line 255, in convert_tag
    handler(child, res)
  File "/usr/lib/python3.10/site-packages/lxml/html/soupparser.py", line 255, in convert_tag
    handler(child, res)
  File "/usr/lib/python3.10/site-packages/lxml/html/soupparser.py", line 255, in convert_tag
    handler(child, res)
  [Previous line repeated 985 more times]
  File "/usr/lib/python3.10/site-packages/lxml/html/soupparser.py", line 242, in convert_tag
    res = etree.SubElement(parent, bs_node.name, attrib=attribs)
  File "src/lxml/etree.pyx", line 3156, in lxml.etree.SubElement
  File "src/lxml/apihelpers.pxi", line 199, in lxml.etree._makeSubElement
  File "src/lxml/apihelpers.pxi", line 195, in lxml.etree._makeSubElement
  File "src/lxml/etree.pyx", line 1630, in lxml.etree._elementFactory
  File "src/lxml/classlookup.pxi", line 403, in lxml.etree._parser_class_lookup
  File "src/lxml/classlookup.pxi", line 456, in lxml.etree._custom_class_lookup
  File "/usr/lib/python3.10/site-packages/lxml/html/__init__.py", line 734, in lookup
    if node_type == 'element':
RecursionError: maximum recursion depth exceeded in comparison
>>>
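The traceback shows `convert_tag` calling itself once per nesting level until the interpreter's recursion limit is hit. The sketch below is a library-free illustration of that failure mode, not lxml's or requests-html's code. The names `nest`, `count_recursive` and `count_iterative` are hypothetical, and a nested list stands in for the parsed tree.

```python
import sys


def nest(depth):
    """Build a list nested `depth` levels deep without recursing."""
    root = node = []
    for _ in range(depth):
        child = []
        node.append(child)
        node = child
    return root


def count_recursive(node):
    """Count nodes by recursing into each child: one stack frame per level."""
    total = 1
    for child in node:
        total += count_recursive(child)
    return total


def count_iterative(root):
    """Count nodes with an explicit stack, so depth is not bounded by the call stack."""
    total, stack = 0, [root]
    while stack:
        node = stack.pop()
        total += 1
        stack.extend(node)
    return total


# Nest deeper than the interpreter allows Python calls to go.
deep = nest(sys.getrecursionlimit() + 100)

try:
    count_recursive(deep)
    recursive_failed = False
except RecursionError:
    recursive_failed = True

print("recursive walk failed:", recursive_failed)
print("iterative walk counted:", count_iterative(deep))
```

The iterative walk is shown only for contrast. It is not the change proposed in this PR.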
@521xueweihan
I'd love to see a test for this. The proposed fix could also be refactored slightly, since a single handler can catch both exceptions:
try:
    ...
except (Exception1, Exception2):
    pass
I realize it has been a couple of years, so I understand if you are no longer interested or active in this repo. In a few days I will do it myself and reference this PR to give you credit.
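A minimal sketch of the suggested shape together with a test, under stated assumptions: `parse_or_default` and `too_deep` are hypothetical names, and `ValueError` is only a placeholder for the second exception, since the PR's actual exception types are not shown here.

```python
def parse_or_default(parse, markup, default=None):
    """Call parse(markup); return `default` if a handled error is raised."""
    try:
        return parse(markup)
    except (RecursionError, ValueError):  # one handler, a tuple of exception types
        return default


def too_deep(markup):
    """Stand-in parser that recurses until Python raises RecursionError."""
    return too_deep(markup)


def test_recursion_error_is_handled():
    # The failing parser yields the default instead of propagating the error.
    assert parse_or_default(too_deep, "<p>", default="") == ""
    # A parser that succeeds is passed through unchanged.
    assert parse_or_default(str.strip, "  ok  ") == "ok"


test_recursion_error_is_handled()
```

Returning a default instead of `pass` keeps the fallback value explicit at the call site. Either form fits the suggestion.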