requests-html

Basic doc example no longer works

Open dannykellett opened this issue 1 year ago • 4 comments

As doc here: https://requests-html.kennethreitz.org/

from requests_html import HTMLSession
def main() -> None:
    session = HTMLSession()
    r = session.get('https://python.org/')
    print(f"all links = {r.html.absolute_links}")

if __name__ == '__main__':
    main()

Traceback (most recent call last):
  File "E:\11-Projects\learning_requests_html.py", line 1, in <module>
    from requests_html import HTMLSession
  File "E:\11-Projects.venv\Lib\site-packages\requests_html.py", line 14, in <module>
    from lxml.html.clean import Cleaner
  File "E:\11-Projects.venv\Lib\site-packages\lxml\html\clean.py", line 18, in <module>
    raise ImportError(
ImportError: lxml.html.clean module is now a separate project lxml_html_clean. Install lxml[html_clean] or lxml_html_clean directly.

I should mention that it worked after installing lxml, but I wanted to flag that the docs are not correct.
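
For anyone hitting the same ImportError, here is a minimal sketch for checking which of the modules named in the traceback can be found in the active environment before importing requests_html. The helper name `missing_modules` is hypothetical and not part of requests-html or lxml; it relies only on the standard library's `importlib.util.find_spec`.

```python
from importlib.util import find_spec


def missing_modules(names):
    """Return the top-level module names from `names` that cannot be found."""
    # find_spec returns None when a top-level module is not installed.
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    # Module names taken from the traceback in this issue.
    missing = missing_modules(["lxml", "lxml_html_clean", "requests_html"])
    if missing:
        print("Not importable:", ", ".join(missing))
    else:
        print("All modules found.")
```

If lxml_html_clean shows up as missing, the error message itself suggests installing lxml[html_clean] or lxml_html_clean.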

dannykellett • Apr 30 '24 14:04

Ran into the same issue. Hopefully, they update their documentation shortly.

jordanralba • May 27 '24 13:05

How do I get it to work? I installed lxml_html_clean, but r.html.render() still returns None because r is a Response object that doesn't have an html property.

e-ave • Sep 24 '24 21:09

Okay, I figured it out, but only if you downgrade to version 0.9.0. I still couldn't figure out 0.10.0, because everything returns requests objects instead of requests_html objects.

The readme says to do

from requests_html import HTMLSession
session = HTMLSession()
r = session.get('https://python.org/')
rendered_html = r.html.render()

but session.get returns a requests.models.Response from the normal requests library, which doesn't have an html attribute. You actually need to call session.request instead of session.get. This function returns a requests_html.HTMLResponse, which is what we need.

from requests_html import HTMLSession
session = HTMLSession()
r = session.request(method="GET", url="https://python.org/")
rendered_html = r.html.render()
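
Since the behaviour described here seems to depend on the installed version, here is a small sketch for inspecting a response object before relying on `.html`. The helper `describe_response` is hypothetical (not part of requests or requests-html), and the two stand-in classes are assumptions that exist only so the example runs without a network call.

```python
def describe_response(resp):
    """Return (qualified type name, has an `html` attribute) for `resp`."""
    cls = type(resp)
    return f"{cls.__module__}.{cls.__qualname__}", hasattr(resp, "html")


# Stand-in classes for illustration only; they are NOT the real library types.
class PlainResponse:
    pass


class HtmlLikeResponse:
    html = "<html></html>"


if __name__ == "__main__":
    for obj in (PlainResponse(), HtmlLikeResponse()):
        name, has_html = describe_response(obj)
        print(f"{name}: has .html = {has_html}")
```

Calling `describe_response(r)` on the results of both `session.get(...)` and `session.request(...)` would show whether the two actually differ in a given install.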

e-ave • Sep 24 '24 21:09