requests-html

Issue Rendering Javascript in a Thread

Open skamensky opened this issue 7 years ago • 29 comments

I'm having an issue calling the render function within a thread. It works perfectly for me outside of a thread but within a thread I get an error.

If this is truly a bug it should be reproducible using this snippet:

Python 3.6.5 (v3.6.5:f59c0932b4, Mar 28 2018, 17:00:18) [MSC v.1900 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> from threading import Thread
>>> from requests_html import HTMLSession
>>> def render_html():
...     session = HTMLSession()
...     r = session.get('http://python-requests.org/')
...     r.html.render()
...
>>> t = Thread(target=render_html)
>>> t.start()
>>> Exception in thread Thread-1:
Traceback (most recent call last):
  File "C:\Users\_REMOVED_\AppData\Local\Programs\Python\Python36\lib\threading.py", line 916, in _bootstrap_inner
    self.run()
  File "C:\Users\_REMOVED_\AppData\Local\Programs\Python\Python36\lib\threading.py", line 864, in run
    self._target(*self._args, **self._kwargs)
  File "<stdin>", line 4, in render_html
  File "C:\Users\_REMOVED_\AppData\Local\Programs\Python\Python36\lib\site-packages\requests_html.py", line 572, in render
    self.session.browser  # Automatycally create a event loop and browser
  File "C:\Users\_REMOVED_\AppData\Local\Programs\Python\Python36\lib\site-packages\requests_html.py", line 679, in browser
    self.loop = asyncio.get_event_loop()
  File "C:\Users\_REMOVED_\AppData\Local\Programs\Python\Python36\lib\asyncio\events.py", line 694, in get_event_loop
    return get_event_loop_policy().get_event_loop()
  File "C:\Users\_REMOVED_\AppData\Local\Programs\Python\Python36\lib\asyncio\events.py", line 602, in get_event_loop
    % threading.current_thread().name)
RuntimeError: There is no current event loop in thread 'Thread-1'.

skamensky avatar Mar 31 '18 06:03 skamensky

When asyncio.get_event_loop() is called inside a thread other than the main thread, it raises this error. Do you need sessions to be unique per thread? If not, just do this:

>>> from threading import Thread
>>> from requests_html import HTMLSession
>>> session = HTMLSession()
>>> session.browser
>>> def render_html():
...     r = session.get('http://python-requests.org/')
...     r.html.render()
...
>>> t = Thread(target=render_html)
>>> t.start()

Otherwise, let me know and a fix could be made to allow what you want.
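
To illustrate the behavior described above, here is a minimal standard-library sketch that does not use requests_html at all. The names `answer`, `worker`, and `results` are made up for this illustration. It shows `asyncio.get_event_loop()` failing in a worker thread that has no loop, and then the same thread creating and registering its own loop.

```python
import asyncio
import threading


async def answer():
    # Trivial coroutine standing in for real async work (illustrative only).
    return 42


def worker(results):
    # In a non-main thread with no loop registered, get_event_loop() raises.
    try:
        asyncio.get_event_loop()
    except RuntimeError as exc:
        results["error"] = str(exc)

    # Create a loop owned by this thread and register it as the current one.
    loop = asyncio.new_event_loop()
    asyncio.set_event_loop(loop)
    try:
        results["value"] = loop.run_until_complete(answer())
    finally:
        # Unregister and close the loop so nothing leaks from the thread.
        asyncio.set_event_loop(None)
        loop.close()


results = {}
thread = threading.Thread(target=worker, args=(results,))
thread.start()
thread.join()
print(results)
```

The snippet in the comment above avoids the error differently: it touches session.browser in the main thread first, so the loop already exists before any worker thread calls render().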

oldani avatar Mar 31 '18 21:03 oldani

That gets rid of the error and works as far as I can tell.

How does the package know which browser tab to parse when other threads are accessing the same session instance? Am I at risk of the wrong virtual tabs/windows being parsed, since threads by their nature could be switching virtual tabs at the same time? I had this issue when I was using a single instance of a virtual Chrome browser with the selenium package.

Thanks for the tip!

skamensky avatar Mar 31 '18 22:03 skamensky

Each time you call r.html.render it creates a new browser page ("tab"), does everything you want (e.g. evaluates JS), and closes that tab, unless you want to interact with the page, in which case you pass keep_page=True to render. That behavior should keep each thread from interfering with another thread's tab.

One suggestion is to keep the number of simultaneous threads low, since each page represents a process in Chrome and a high number will consume a lot of resources.
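
One possible way to keep the number of simultaneous renders low is a bounded thread pool. The sketch below uses only the standard library. `fake_render` is a hypothetical stand-in for fetching a page and calling r.html.render(), and `MAX_RENDERS` is an assumed limit, not a value recommended by the library.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

MAX_RENDERS = 2  # assumed cap on simultaneous renders (illustrative)

active = 0
peak = 0
lock = threading.Lock()


def fake_render(url):
    """Hypothetical stand-in for session.get(url) followed by r.html.render()."""
    global active, peak
    with lock:
        active += 1
        peak = max(peak, active)
    time.sleep(0.05)  # pretend the browser tab is busy rendering
    with lock:
        active -= 1
    return url


urls = [f"https://example.com/page/{i}" for i in range(6)]

# The pool never runs more than MAX_RENDERS calls at the same time.
with ThreadPoolExecutor(max_workers=MAX_RENDERS) as pool:
    rendered = list(pool.map(fake_render, urls))

print(f"peak concurrent renders: {peak}")
```

With a real session, each concurrent call would correspond to one open tab, so the pool size bounds the number of tabs open at once.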

oldani avatar Mar 31 '18 22:03 oldani

I understand. So now my only question is: can we expect r.html.render to function properly if two separate threads open two tabs simultaneously and attempt to render the page in the virtual browser at the same time?

The reason I ask is because in selenium, you can only inject/execute javascript into a "tab" if the tab is active (i.e. selected) which means threads cannot inject/execute javascript into two tabs at the same moment.

skamensky avatar Mar 31 '18 23:03 skamensky

I encountered the same problem: RuntimeError: There is no current event loop in thread 'Thread-1'. I tried @oldani's snippet in cmd and it's not working for me.

I'm using the latest Python (3.6.5) and the latest requests_html (0.9.0).

eladbitton avatar Apr 02 '18 07:04 eladbitton

@eladbitton you forgot to run session.browser, look closely at the code above.

However, @skamensky, I realized there is another issue, related to the event loop, that will prevent what you want to achieve. To allow this, each thread needs to create its own event loop. That is the fix I was thinking of, although it won't let you run many threads before running out of resources (a fix like this runs one Chromium process per thread). I suggest you wait for #146 to be merged and do this asynchronously instead of with threads.

I'm thinking of making this possible and adding a warning against doing it unless you are willing to sacrifice resources.
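
As a rough picture of the asynchronous approach suggested above, the sketch below runs several jobs concurrently on a single event loop instead of one thread per job. It is plain asyncio, not the requests_html API. `fetch_and_render` is a hypothetical stand-in for an async fetch plus render.

```python
import asyncio


async def fetch_and_render(url):
    """Hypothetical stand-in for an asynchronous fetch followed by a render."""
    await asyncio.sleep(0.01)  # simulated I/O wait
    return f"rendered:{url}"


async def main(urls):
    # gather() runs the coroutines concurrently on one loop and returns
    # their results in the same order as the inputs.
    return await asyncio.gather(*(fetch_and_render(u) for u in urls))


urls = ["https://example.com/a", "https://example.com/b"]
pages = asyncio.run(main(urls))
print(pages)
```

Because everything shares one loop in one thread, the "no current event loop" error from worker threads does not come up in this arrangement.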

oldani avatar Apr 02 '18 14:04 oldani

I have also encountered the same problem (There is no current event loop in thread 'Thread-4'.), except mine is in a Django app class. The render() function always raises an error. I've tried running render(keep_page=True) and session.browser with no success.

I'm running Django 2.0.3, Python 3.6.3, requests_html 0.9 and PyCharm Pro 2018.1. I'm using PyCharm's default virtual environment for Django.

Commito avatar Apr 02 '18 17:04 Commito

I will add a fix for this

oldani avatar Apr 12 '18 14:04 oldani

I have the same error, but it only happens when I use it inside Django. If I run it locally, it works. Do you have any idea why?

cfournies avatar Feb 03 '19 22:02 cfournies

Hi guys,

Yesterday we released v0.10.0, which now has full support for AsyncHTMLSession. You can use that session instead of the normal one and you won't have this kind of issue.

I still have to investigate the Django issue. Can any of you give me more context on it, @cfournies @Commito?

oldani avatar Feb 18 '19 14:02 oldani

I got a similar error when starting multiple threads. Can you help? By the way, you are doing great work @oldani

class Loader:
    def __init__(self, user_agent=UserAgent, proxies=None, retries=RETRIES, rest=REST, opener=None, cache=None, headers=None, fast=False):
        self.user_agent = user_agent
        self.proxies = proxies
        self.retries = retries
        self.opener = opener
        self.cache = cache
        self.headers = headers
        self.session = Session()
        self.empty = set()
        self.queue = dict()
        self.base = None
        self.htmlsession = HTMLSession()
        self.htmlsession.browser

    def ajaxload(self, url):
        r = self.htmlsession.get(url)
        r.html.render()
        pac = dict()
        pac['html'] = r.text
        pac['code'] = r.status_code
        print(r.url)

        return pac

Xyhlon avatar Feb 26 '19 14:02 Xyhlon

Hi @oldani, I can help you with the Django error. Let me know what you need. The code doesn't work when it is used within the Django framework.

cfournies avatar Mar 09 '19 16:03 cfournies

I think I know the key to the error here. It is the event loop policy, so in these cases we're going to have to create a new event loop per thread.

oldani avatar Mar 14 '19 02:03 oldani

Hello @oldani

I have the same error using Flask: RuntimeError: There is no current event loop in thread 'Thread-2'. It happens when I use HTMLSession and then call session.browser inside a route, or when I try to use AsyncHTMLSession. Both raise the error.

I'm not using threads or asyncio in my project. It's a simple Flask app with one route. Tell me if you want me to provide more logs, output, or screenshots.

sayoun avatar Apr 25 '19 07:04 sayoun

Hello @cfournies, can you run r.html.render() in Django? I have tried many ways, but I still get the exception There is no current event loop in thread

ShamanNguyen avatar Apr 25 '19 07:04 ShamanNguyen

I have the same issue in my Flask application.

jasonniebauer avatar May 20 '19 01:05 jasonniebauer

I have the same issue on Flask

CarreyC avatar Aug 01 '19 02:08 CarreyC

@CarreyC It works when run from the command line.

ShamanNguyen avatar Aug 01 '19 06:08 ShamanNguyen

I have the same issue now that I use it in Django. When I add a loop in Django, I get the error "no signal in main thread".

434718954 avatar Aug 27 '19 08:08 434718954

Can you please suggest how I can use this in the Django framework?

NAveeN4416 avatar Jan 04 '20 11:01 NAveeN4416

I found this on Stack Overflow.

Here is my workaround with Flask.

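from requests_html import AsyncHTMLSession
import asyncio
import pyppeteer

async def get_post():
    # Create a fresh event loop and make it the current one for this thread
    new_loop = asyncio.new_event_loop()
    asyncio.set_event_loop(new_loop)
    session = AsyncHTMLSession()
    # Disable pyppeteer's signal handlers; Python only allows installing
    # signal handlers from the main thread
    browser = await pyppeteer.launch({
        'ignoreHTTPSErrors': True,
        'headless': True,
        'handleSIGINT': False,
        'handleSIGTERM': False,
        'handleSIGHUP': False
    })
    session._browser = browser
    # your_query_url is a placeholder for the URL you want to fetch
    resp_page = await session.get(your_query_url)
    await resp_page.html.arender()
    return resp_page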

tingwei628 avatar Mar 16 '20 13:03 tingwei628

Was there a fix for this issue?

gelodefaultbrain avatar Sep 01 '21 12:09 gelodefaultbrain

Hello, just wondering... was this issue fixed? If so, should I just reinstall the package?

gelodefaultbrain avatar Nov 08 '21 13:11 gelodefaultbrain

@Têng Ûi may I know the full code for how you call this function? I still cannot make it work.

abdullzz avatar Apr 24 '22 10:04 abdullzz

It's giving me this error: RuntimeError: Event loop is closed sys:1: RuntimeWarning: coroutine 'Launcher.killChrome' was never awaited

abdullzz avatar Apr 24 '22 12:04 abdullzz

For me, the workaround returns a coroutine object instead of an HTML object. Did you run into that as well?

MrDarkness117 avatar May 09 '23 00:05 MrDarkness117

UPD: You probably need to call asyncio.run() on that function (get_post) so you get the result. Check whether you have done that.

MrDarkness117 avatar May 17 '23 20:05 MrDarkness117

await resp_page.html.arender() never returns...

migonsa avatar Nov 25 '23 04:11 migonsa

The Flask workaround posted by @tingwei628 above worked for me! Make sure to call asyncio.run(get_post()) to get the result instead of a coroutine.
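
The difference between a coroutine object and its result can be sketched with the standard library alone. `get_post_stub` and `handle_request` are hypothetical names for this illustration. The stub stands in for the get_post() workaround, and the thread stands in for a web framework's request worker.

```python
import asyncio
import threading


async def get_post_stub():
    """Hypothetical stand-in for the get_post() coroutine from the workaround."""
    await asyncio.sleep(0)
    return "<html>rendered</html>"


# Calling a coroutine function only builds a coroutine object; nothing runs yet.
not_awaited = get_post_stub()
is_coroutine = asyncio.iscoroutine(not_awaited)
not_awaited.close()  # discard it cleanly to avoid a "never awaited" warning


def handle_request(out):
    # asyncio.run() creates a new event loop, runs the coroutine to
    # completion, and closes the loop. It also works in a non-main thread.
    out["body"] = asyncio.run(get_post_stub())


out = {}
worker = threading.Thread(target=handle_request, args=(out,))
worker.start()
worker.join()
print(is_coroutine, out["body"])
```

Note that asyncio.run() closes its loop when it returns, so anything still tied to that loop afterwards (for example a browser that was never closed) may be a source of "Event loop is closed" errors. That is an assumption about the report above, not a confirmed diagnosis.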

Tamupiwa avatar Feb 09 '24 23:02 Tamupiwa