pyppeteer
#159 [SEVERE] The communication with Chromium are disconnected after 20 seconds.
This is fixed in https://github.com/pyppeteer/pyppeteer2
See https://github.com/miyakogi/pyppeteer/pull/160#issuecomment-616863812
Thank you for this!
When will this be added in? The best workaround right now is to open and close a browser for every page interaction and keep each session under 20 s. :-(
perhaps we need to keep the connection alive with ping-pong instead of setting timeouts to none
@nurettin I think the problem here is that Chrome does not send a pong back. Our WebSocket client sends a ping, receives no pong within the 20-second timeout, concludes that the connection is lost, and disconnects.
@PCXDME Perhaps then we need a way to check whether _ws really did disconnect from the page instance when we set the timeout to None?
@nurettin If I remember correctly, the way the WebSocket client knows whether it has really been disconnected is through the timeout. Otherwise you need to use some other request/response message instead of ping/pong as the timeout mechanism. But anyways, having timeout set to none is better than having timeout without Chrome responding pong. Disconnections also won't happen regularly, as Chrome and pyppeteer are usually on the same (local) machine. They would happen only when Chrome crashes or exits. When we close the browser with pyppeteer there is nothing to worry about, because we also close the WebSocket connection. I would suggest to fix one problem at a time. This seems to be more important as you can not use the library for more than 20 seconds.
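The "other request/response message" idea mentioned above could be sketched as an application-level heartbeat: send one cheap request and treat a missing reply within a deadline as a dead connection. This is only a sketch. `is_alive` and the `send` callable are hypothetical names, and wiring it to pyppeteer (for example by sending the DevTools `Browser.getVersion` command through the connection) is an assumption, not something confirmed in this thread.

```python
import asyncio
from typing import Any, Awaitable, Callable


async def is_alive(
    send: Callable[[], Awaitable[Any]],
    timeout: float = 5.0,
) -> bool:
    """Return True if `send()` completes within `timeout` seconds.

    `send` is a hypothetical zero-argument callable that issues one cheap
    request over the connection and returns an awaitable for its response.
    """
    try:
        await asyncio.wait_for(send(), timeout=timeout)
    except asyncio.TimeoutError:
        # No reply in time: treat the peer as unresponsive.
        return False
    except ConnectionError:
        # The transport itself reported the failure.
        return False
    return True
```

Unlike WebSocket-level pings, this only depends on the browser answering ordinary protocol messages, which it keeps doing even when it never sends a pong.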
@PCXDME
But anyways, having timeout set to none is better than having timeout without Chrome responding pong.
I agree
I would suggest to fix one problem at a time. This seems to be more important as you can not use the library for more than 20 seconds.
Pretty sure it fixes the issue for now (thank you). I just have a long-running process, so I was looking for ways to make that service more robust and thought this would be a suitable place to talk about it.
Another solution for this is to pin websockets==6.0; it works until this PR is merged and released.
I think this PR cannot be approved unless the setup.py requirements change (websockets>=7.0). The parameters ping_interval and ping_timeout do not exist in websockets 6.0.
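One way to make a workaround tolerate both versions is to check whether the installed `connect` actually declares these parameters before passing them. The sketch below does that by signature inspection. `ping_kwargs_for` is a hypothetical helper, and the two stand-in functions only imitate a signature without and with the ping parameters; they are not the real websockets signatures.

```python
import inspect
from typing import Any, Callable, Dict


def ping_kwargs_for(connect: Callable[..., Any]) -> Dict[str, Any]:
    """Return keyword arguments that disable keepalive pings, limited to
    those that `connect` declares as named parameters."""
    declared = inspect.signature(connect).parameters
    wanted = {"ping_interval": None, "ping_timeout": None}
    return {name: value for name, value in wanted.items() if name in declared}


# Stand-ins for illustration only; not the real websockets signatures.
def connect_like_6(uri, *, timeout=10, **kwds):
    return uri


def connect_like_7(uri, *, timeout=10, ping_interval=20, ping_timeout=20, **kwds):
    return uri


print(ping_kwargs_for(connect_like_6))  # {}
print(ping_kwargs_for(connect_like_7))  # {'ping_interval': None, 'ping_timeout': None}
```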
https://websockets.readthedocs.io/en/6.0/api.html#websockets.client.connect
Another solution for this is to pin websockets==6.0; it works until this PR is merged and released.
I did it like that in the beginning, but I get disconnects with websockets==6.0 more often than 7.0 for some reason.
It's actually a Chromium bug. Other Puppeteer implementations suffer from it too, so it seems the only workaround is to disable pings. I tested a 15-minute delay interval, and headless Chrome still responded after that period.
https://bugs.chromium.org/p/chromium/issues/detail?id=865002
For those who want to hack before the patch arrives.
def patch_pyppeteer():
    import pyppeteer.connection
    original_method = pyppeteer.connection.websockets.client.connect

    def new_method(*args, **kwargs):
        kwargs['ping_interval'] = None
        kwargs['ping_timeout'] = None
        return original_method(*args, **kwargs)

    pyppeteer.connection.websockets.client.connect = new_method

patch_pyppeteer()
By the way, the patching approach is a nice one! There's another patch that changes the Chromium download to validated HTTPS instead of the insecure one used now.
@luabish Do you have write access? Could you also merge this? I could not merge because of the travis checks.
@PCXDME I just ran into this problem while occasionally using the requests_html module. I guess I can't merge this.
10 months now and this hasn't been fixed in master. At this point, this library is unmaintained.
Has anybody tried contacting the library author?
It works. Thanks mate!!!
Note that you are patching the websockets.client module itself here! pyppeteer.connection.websockets is just the module-global reference to the websockets package. You may as well just use
def patch_websockets():
    import websockets.client
    original_method = websockets.client.connect

    def new_method(*args, **kwargs):
        kwargs['ping_interval'] = None
        kwargs['ping_timeout'] = None
        return original_method(*args, **kwargs)

    websockets.client.connect = new_method

patch_websockets()
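The wrapping pattern used in both patches can be written as a small reusable helper. This is a sketch rather than part of either library: `with_forced_kwargs` and `fake_connect` are hypothetical names, and the stand-in function is used so the snippet runs without websockets or a browser.

```python
import functools
from typing import Any, Callable


def with_forced_kwargs(func: Callable[..., Any], **forced: Any) -> Callable[..., Any]:
    """Wrap `func` so the given keyword arguments always override the caller's."""

    @functools.wraps(func)
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        kwargs.update(forced)
        return func(*args, **kwargs)

    return wrapper


# Hypothetical stand-in for websockets.client.connect.
def fake_connect(uri: str, ping_interval: Any = 20, ping_timeout: Any = 20) -> dict:
    return {"uri": uri, "ping_interval": ping_interval, "ping_timeout": ping_timeout}


patched = with_forced_kwargs(fake_connect, ping_interval=None, ping_timeout=None)
print(patched("ws://localhost:9222", ping_interval=5))
# {'uri': 'ws://localhost:9222', 'ping_interval': None, 'ping_timeout': None}
```

Because `functools.wraps` stores the original callable as `patched.__wrapped__`, a patch applied this way is easy to undo by assigning that attribute back.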
Instead of patching an innocent 3rd-party library, I'm patching pyppeteer itself:
def _patch_pyppeteer():
    from typing import Any
    from pyppeteer import connection, launcher
    import websockets.client

    class PatchedConnection(connection.Connection):  # type: ignore
        def __init__(self, *args: Any, **kwargs: Any) -> None:
            super().__init__(*args, **kwargs)
            # the _ws attribute is not yet connected, so it can simply be replaced
            # with another one that has better defaults.
            self._ws = websockets.client.connect(
                self._url,
                loop=self._loop,
                # the following parameters are all passed to WebSocketCommonProtocol,
                # which marks all three as Optional, but connect() doesn't, hence the
                # liberal use of type: ignore on these lines.
                # fixed upstream but not yet released, see aaugustin/websockets#93ad88
                max_size=None,  # type: ignore
                ping_interval=None,  # type: ignore
                ping_timeout=None,  # type: ignore
            )

    connection.Connection = PatchedConnection
    # also imported as a global in pyppeteer.launcher
    launcher.Connection = PatchedConnection

_patch_pyppeteer()
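The reason both `connection.Connection` and `launcher.Connection` are reassigned is that `from module import name` copies the reference into the importing module, so rebinding the name in one module does not affect the other. A minimal illustration, using two throwaway module objects (`conn_mod` and `launch_mod` are hypothetical stand-ins, not the real pyppeteer modules):

```python
import types

# Hypothetical stand-ins for pyppeteer.connection and pyppeteer.launcher.
conn_mod = types.ModuleType("conn_mod")
launch_mod = types.ModuleType("launch_mod")


class Connection:
    pass


class PatchedConnection(Connection):
    pass


conn_mod.Connection = Connection
# Equivalent to `from conn_mod import Connection` inside launch_mod.
launch_mod.Connection = conn_mod.Connection

# Rebinding the name in one module leaves the other untouched.
conn_mod.Connection = PatchedConnection
print(launch_mod.Connection is Connection)  # True

# Both names need rebinding for the patch to take effect everywhere.
launch_mod.Connection = PatchedConnection
print(launch_mod.Connection is PatchedConnection)  # True
```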
@miyakogi, could you please take a look? This is breaking for many users and has a simple fix.
@mjpieters you have a typo in your patch: def _patch_pypuppeteer
@BobCashStory
Oopsie, fixed now. Thanks for pointing that out!
This library seems to have been abandoned. However, I and others have been working on an updated fork, pyppeteer2. It's up on PyPI and the fix has already been applied.
@PCXDME / @aronsky / @jrmlhermitte Could you maybe include this information in your topmost post so that others don't have to scroll through other workarounds?