playwright-python
[BUG] asyncio.exceptions.InvalidStateError: invalid state thrown by exit in async context manager
System info
- Playwright Version: v1.40
- Operating System: macOS 14.2.1
- Browser: Chromium
- Other info:
Source code
from playwright.async_api import async_playwright
import asyncio

async def doit(url):
    print(f"Processing {url}")
    try:
        async with async_playwright() as p:

            browser_type = p.chromium

            browser = await browser_type.launch(
                headless=True,
            )

            page = await browser.new_page(
                bypass_csp=True,
                ignore_https_errors=True,
            )

            res = await page.goto(url, wait_until="load", timeout=30 * 1000)

            await page.wait_for_load_state(state="networkidle")
            await browser.close()

    except Exception as e:
        print(f"Got exception {e}")
        raise e

asyncio.run(doit("https://www.streetinsider.com/Press+Releases/Radius+Recycling+Reports+First+Quarter+Fiscal+2024+Financial+Results/22593061.html"))
Steps
- Save the code above and run it. I'm using Python 3.10.7.
Expected
It should complete without error.
Actual
- It throws an InvalidStateError. If it works, run it a couple more times; it nearly always fails for me.
Processing https://www.streetinsider.com/Press+Releases/Radius+Recycling+Reports+First+Quarter+Fiscal+2024+Financial+Results/22593061.html
Got exception invalid state
Traceback (most recent call last):
  File "/Users/philip/play-dir/playtest.py", line 22, in doit
    await page.wait_for_load_state(state="networkidle")
  File "/Users/philip/.pyenv/versions/play-dir/lib/python3.10/site-packages/playwright/async_api/_generated.py", line 9367, in wait_for_load_state
    await self._impl_obj.wait_for_load_state(state=state, timeout=timeout)
  File "/Users/philip/.pyenv/versions/play-dir/lib/python3.10/site-packages/playwright/_impl/_page.py", line 491, in wait_for_load_state
    return await self._main_frame.wait_for_load_state(**locals_to_params(locals()))
  File "/Users/philip/.pyenv/versions/play-dir/lib/python3.10/site-packages/playwright/_impl/_frame.py", line 237, in wait_for_load_state
    return await self._wait_for_load_state_impl(state, timeout)
  File "/Users/philip/.pyenv/versions/play-dir/lib/python3.10/site-packages/playwright/_impl/_frame.py", line 265, in _wait_for_load_state_impl
    await waiter.result()
playwright._impl._errors.TimeoutError: Timeout 30000ms exceeded.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/philip/play-dir/playtest.py", line 29, in <module>
    asyncio.run(doit("https://www.streetinsider.com/Press+Releases/Radius+Recycling+Reports+First+Quarter+Fiscal+2024+Financial+Results/22593061.html"))
  File "/Users/philip/.pyenv/versions/3.10.7/lib/python3.10/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "/Users/philip/.pyenv/versions/3.10.7/lib/python3.10/asyncio/base_events.py", line 646, in run_until_complete
    return future.result()
  File "/Users/philip/play-dir/playtest.py", line 27, in doit
    raise e
  File "/Users/philip/play-dir/playtest.py", line 7, in doit
    async with async_playwright() as p:
  File "/Users/philip/.pyenv/versions/play-dir/lib/python3.10/site-packages/playwright/async_api/_context_manager.py", line 58, in __aexit__
    await self._connection.stop_async()
  File "/Users/philip/.pyenv/versions/play-dir/lib/python3.10/site-packages/playwright/_impl/_connection.py", line 288, in stop_async
    self.cleanup()
  File "/Users/philip/.pyenv/versions/play-dir/lib/python3.10/site-packages/playwright/_impl/_connection.py", line 299, in cleanup
    callback.future.set_exception(self._closed_error)
asyncio.exceptions.InvalidStateError: invalid state
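Because the failure is intermittent ("run it a couple more times"), it can help to rerun the script a fixed number of times and tally the outcomes. The sketch below is a hypothetical helper, not part of Playwright: `tally_runs` and `stand_in` are illustrative names, and `stand_in` only simulates a timeout so the example runs without a browser. Each attempt gets a fresh event loop via `asyncio.run()`, matching how the repro script is started.

```python
import asyncio
from collections import Counter
from typing import Any, Callable, Coroutine


def tally_runs(make_coro: Callable[[], Coroutine[Any, Any, Any]], runs: int) -> Counter:
    """Run make_coro() in a fresh event loop `runs` times and count outcomes."""
    outcomes: Counter = Counter()
    for _ in range(runs):
        try:
            asyncio.run(make_coro())  # new event loop per attempt, like the repro script
            outcomes["ok"] += 1
        except Exception as exc:  # record the failure type and keep going
            outcomes[type(exc).__name__] += 1
    return outcomes


# Hypothetical stand-in for doit(url) so the sketch runs without a browser.
async def stand_in(fail: bool) -> None:
    await asyncio.sleep(0)
    if fail:
        raise TimeoutError("simulated timeout")


pattern = iter([False, True, False])
print(tally_runs(lambda: stand_in(next(pattern)), runs=3))
```

With the real repro, something like `tally_runs(lambda: doit(url), runs=10)` would count how many runs end in `InvalidStateError` versus a plain timeout, since `doit` re-raises whatever escapes the context manager.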
I was able to repro in 1 out of 5 runs. However, I was not able to repro with the following snippet. Not yet sure what's going on.
from playwright.async_api import async_playwright
import asyncio

async def doit(url):
    print(f"Processing {url}")
    async with async_playwright() as p:
        browser_type = p.chromium
        browser = await browser_type.launch(
            headless=True,
        )
        try:
            page = await browser.new_page(
                bypass_csp=True,
                ignore_https_errors=True,
            )
            res = await page.goto(url, wait_until="load", timeout=30 * 1000)
            await page.wait_for_load_state(state="networkidle")
        except Exception as e:
            print(f"Got exception {e}")
            raise e
        finally:
            await browser.close()

asyncio.run(doit("https://www.streetinsider.com/Press+Releases/Radius+Recycling+Reports+First+Quarter+Fiscal+2024+Financial+Results/22593061.html"))
It appears that browser.close() is the key difference. In @dgozman's example it is executed, whereas in my example it is not, because the exception has already been raised. That said, if you don't call close(), a different exception is thrown on other URLs, such as https://cnn.com/
I'm unfortunately not able to reproduce it. I tried running it 10 times on macOS with both Python 3.10 and Python 3.12.
Closing for now since we can't reproduce it.
I don't think this should be closed. I can reproduce the error. Whenever there is a timeout error, it appears that the event loop is closing, which then results in the InvalidStateError.
In [3]: from playwright.async_api import async_playwright
   ...: import asyncio
   ...:
   ...: async def doit(url):
   ...:     print(f"Processing {url}")
   ...:     try:
   ...:         async with async_playwright() as p:
   ...:
   ...:             browser_type = p.chromium
   ...:
   ...:             browser = await browser_type.launch(
   ...:                 headless=True,
   ...:             )
   ...:
   ...:             page = await browser.new_page(
   ...:                 bypass_csp=True,
   ...:                 ignore_https_errors=True,
   ...:             )
   ...:
   ...:             res = await page.goto(url, wait_until="load", timeout=30 * 1000)
   ...:
   ...:             await page.wait_for_load_state(state="networkidle")
   ...:             await browser.close()
   ...:
   ...:     except Exception as e:
   ...:         print(f"Got exception {e}")
   ...:         raise e
   ...:
   ...: asyncio.run(doit("https://www.streetinsider.com/Press+Releases/Radius+Recycling+Reports+First+Quarter+Fiscal+2024+Financial+Results/22593061.html"))
Processing https://www.streetinsider.com/Press+Releases/Radius+Recycling+Reports+First+Quarter+Fiscal+2024+Financial+Results/22593061.html
Got exception Timeout 30000ms exceeded.
---------------------------------------------------------------------------
TimeoutError                              Traceback (most recent call last)
Cell In[3], line 29
     26         print(f"Got exception {e}")
     27         raise e
---> 29 asyncio.run(doit("https://www.streetinsider.com/Press+Releases/Radius+Recycling+Reports+First+Quarter+Fiscal+2024+Financial+Results/22593061.html"))

File ~/.pyenv/versions/3.10.6/lib/python3.10/asyncio/runners.py:44, in run(main, debug)
     42     if debug is not None:
     43         loop.set_debug(debug)
---> 44     return loop.run_until_complete(main)
     45 finally:
     46     try:

File ~/.pyenv/versions/3.10.6/lib/python3.10/asyncio/base_events.py:646, in BaseEventLoop.run_until_complete(self, future)
    643 if not future.done():
    644     raise RuntimeError('Event loop stopped before Future completed.')
--> 646 return future.result()

Cell In[3], line 27, in doit(url)
     25 except Exception as e:
     26     print(f"Got exception {e}")
---> 27     raise e

Cell In[3], line 20, in doit(url)
     11 browser = await browser_type.launch(
     12     headless=True,
     13 )
     15 page = await browser.new_page(
     16     bypass_csp=True,
     17     ignore_https_errors=True,
     18 )
---> 20 res = await page.goto(url, wait_until="load", timeout=30 * 1000)
     22 await page.wait_for_load_state(state="networkidle")
     23 await browser.close()

File ~/Desktop/open-source/playwright-python/playwright/async_api/_generated.py:8612, in Page.goto(self, url, timeout, wait_until, referer)
   8551 async def goto(
   8552     self,
   8553     url: str,
   (...)
   8559     referer: typing.Optional[str] = None
   8560 ) -> typing.Optional["Response"]:
   8561     """Page.goto
   8562
   8563     Returns the main resource response. In case of multiple redirects, the navigation will resolve with the first
   (...)
   8608     Union[Response, None]
   8609     """
   8611     return mapping.from_impl_nullable(
-> 8612         await self._impl_obj.goto(
   8613             url=url, timeout=timeout, waitUntil=wait_until, referer=referer
   8614         )
   8615     )

File ~/Desktop/open-source/playwright-python/playwright/_impl/_page.py:500, in Page.goto(self, url, timeout, waitUntil, referer)
    493 async def goto(
    494     self,
    495     url: str,
   (...)
    498     referer: str = None,
    499 ) -> Optional[Response]:
--> 500     return await self._main_frame.goto(**locals_to_params(locals()))

File ~/Desktop/open-source/playwright-python/playwright/_impl/_frame.py:145, in Frame.goto(self, url, timeout, waitUntil, referer)
    135 async def goto(
    136     self,
    137     url: str,
   (...)
    140     referer: str = None,
    141 ) -> Optional[Response]:
    142     return cast(
    143         Optional[Response],
    144         from_nullable_channel(
--> 145             await self._channel.send("goto", locals_to_params(locals()))
    146         ),
    147     )

File ~/Desktop/open-source/playwright-python/playwright/_impl/_connection.py:59, in Channel.send(self, method, params)
     58 async def send(self, method: str, params: Dict = None) -> Any:
---> 59     return await self._connection.wrap_api_call(
     60         lambda: self.inner_send(method, params, False)
     61     )

File ~/Desktop/open-source/playwright-python/playwright/_impl/_connection.py:509, in Connection.wrap_api_call(self, cb, is_internal)
    507 self._api_zone.set(_extract_stack_trace_information_from_stack(st, is_internal))
    508 try:
--> 509     return await cb()
    510 finally:
    511     self._api_zone.set(None)

File ~/Desktop/open-source/playwright-python/playwright/_impl/_connection.py:97, in Channel.inner_send(self, method, params, return_as_dict)
     95 if not callback.future.done():
     96     callback.future.cancel()
---> 97 result = next(iter(done)).result()
     98 # Protocol now has named return values, assume result is one level deeper unless
     99 # there is explicit ambiguity.
    100 if not result:

TimeoutError: Timeout 30000ms exceeded.
I am facing a similar problem with my scraper. The code base is too large to post here. The scraper is supposed to scrape 1,400+ pages, each with a timeout of about 10 seconds, and the whole run should take 12+ hours without any errors.
The point at which the error occurs isn't consistent, but it seems to happen after roughly 3 hours of scraping, at around 350 links. The error is only printed when I stop the Python program; it does not terminate the script on its own the way an uncaught exception would.
Workarounds I have applied:
- record each successfully scraped link in a CSV file, so that the next run resumes where the previous one left off;
- automatically restart the scraper every 2 hours, before it hits the error.
Edit: this happens with Python 3.10 on macOS and Python 3.11 on Windows.
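The CSV checkpoint workaround described above can be sketched with the standard library alone. This is a minimal illustration under assumptions, not the commenter's actual code: the function names, the one-URL-per-row layout and the `scraped.csv` file name are all hypothetical.

```python
import csv
import os
import tempfile
from typing import Iterable, List, Set


def load_scraped(path: str) -> Set[str]:
    """Return the URLs already recorded as scraped (empty set if no checkpoint yet)."""
    if not os.path.exists(path):
        return set()
    with open(path, newline="") as f:
        return {row[0] for row in csv.reader(f) if row}


def mark_scraped(path: str, url: str) -> None:
    """Append a URL immediately after it has been scraped successfully."""
    with open(path, "a", newline="") as f:
        csv.writer(f).writerow([url])


def remaining(urls: Iterable[str], path: str) -> List[str]:
    """Keep the original order, skipping URLs already in the checkpoint."""
    done = load_scraped(path)
    return [u for u in urls if u not in done]


# Usage with a throwaway directory and hypothetical URLs.
checkpoint = os.path.join(tempfile.mkdtemp(), "scraped.csv")
urls = ["https://example.com/a", "https://example.com/b", "https://example.com/c"]
mark_scraped(checkpoint, urls[0])
print(remaining(urls, checkpoint))
```

Appending one row per finished page (rather than rewriting the file at the end) means a crash or a forced restart loses at most the page that was in flight.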
Another stack trace:
.venv/lib/python3.11/site-packages/playwright/_impl/_connection.py:296, in Connection.cleanup(self, cause)
    294     ws_connection._transport.dispose()
    295 for callback in self._callbacks.values():
--> 296     callback.future.set_exception(self._closed_error)
    297 self._callbacks.clear()
    298 self.emit("close")
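The loop in `cleanup()` calls `set_exception()` on every registered callback future, while the earlier `inner_send` frame shows a timed-out call's future being cancelled (`callback.future.cancel()`). One way these could combine is if a cancelled future is still in the callbacks dict when the connection shuts down. The stdlib-only model below illustrates that hypothesis; `ToyConnection`, `send_with_timeout` and the "left in the dict" behaviour are assumptions for illustration, not Playwright's actual implementation.

```python
import asyncio
from typing import Dict


class ToyConnection:
    """Hypothetical stand-in for a connection that tracks pending calls by id."""

    def __init__(self) -> None:
        self.callbacks: Dict[int, asyncio.Future] = {}

    async def send_with_timeout(self, call_id: int, timeout: float) -> None:
        fut = asyncio.get_running_loop().create_future()
        self.callbacks[call_id] = fut  # no reply ever arrives in this toy
        timer = asyncio.create_task(asyncio.sleep(timeout))
        await asyncio.wait({fut, timer}, return_when=asyncio.FIRST_COMPLETED)
        if not fut.done():
            fut.cancel()  # the timeout won the race, so the pending call is cancelled...
        # ...but it is deliberately left in self.callbacks (the assumed leak).

    def cleanup(self, guarded: bool) -> None:
        """Fail every registered callback, as a connection teardown might."""
        for fut in self.callbacks.values():
            if guarded and fut.done():
                continue  # already cancelled or resolved: nothing left to fail
            fut.set_exception(ConnectionError("connection closed"))
        self.callbacks.clear()


async def demo(guarded: bool) -> str:
    conn = ToyConnection()
    await conn.send_with_timeout(call_id=1, timeout=0.01)
    try:
        conn.cleanup(guarded)
    except asyncio.InvalidStateError:
        return "InvalidStateError"
    return "clean shutdown"


print(asyncio.run(demo(guarded=False)))  # InvalidStateError
print(asyncio.run(demo(guarded=True)))   # clean shutdown
```

In this model, either removing the future from the dict when it is cancelled or checking `done()` before calling `set_exception()` avoids the error; which of these (if either) matches the real fix is not established in this thread.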
With anyio:
async with (
    async_playwright() as p,
    create_task_group() as tg
):
    browser = await p.chromium.launch()
    list_spider = await SpiderAPI[ListingLink, ListPageLink].create(browser)
    tg.start_soon(list_spider.run, spider_list(config))  # curried
    await sleep(5)
    tg.cancel_scope.cancel()
Also facing this issue:
async with async_playwright() as playwright:
    browser = await playwright.chromium.launch(headless=True)
    await asyncio.gather(
        *[
            execute_in_task(
                settings, producer, session_factory, browser, shutdown_event, i
            )
            for i in range(settings.max_workers)
        ],
        return_exceptions=True,
    )
    await browser.close()
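With `return_exceptions=True`, `asyncio.gather()` does not raise when a worker fails; the exception object is placed in the returned list at that worker's position instead. Inspecting that list makes worker failures (such as page timeouts) visible before the browser is closed. A minimal sketch, where `worker` is a hypothetical stand-in for `execute_in_task` that fails for odd ids:

```python
import asyncio
from typing import List


async def worker(i: int) -> int:
    """Hypothetical worker: odd ids fail to mimic a page timeout."""
    await asyncio.sleep(0)
    if i % 2:
        raise TimeoutError(f"worker {i} timed out")
    return i


async def run_workers(n: int) -> List[str]:
    results = await asyncio.gather(*(worker(i) for i in range(n)), return_exceptions=True)
    report = []
    for i, res in enumerate(results):
        # Failures come back as exception objects in the result list (same order
        # as the awaitables passed in) instead of being raised.
        if isinstance(res, BaseException):
            report.append(f"worker {i}: {type(res).__name__}")
        else:
            report.append(f"worker {i}: ok")
    return report


for line in asyncio.run(run_workers(3)):
    print(line)
```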
Process SpawnProcess-3:
Traceback (most recent call last):
  File "/Users/awtkns/PycharmProjects/deworkd/python/deworker/deworker/main.py", line 142, in loop
    await browser.close()
  File "/Users/awtkns/Library/Caches/pypoetry/virtualenvs/deworkd-77yazfm4-py3.11/lib/python3.11/site-packages/playwright/async_api/_generated.py", line 14581, in close
    return mapping.from_maybe_impl(await self._impl_obj.close(reason=reason))
                                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/awtkns/Library/Caches/pypoetry/virtualenvs/deworkd-77yazfm4-py3.11/lib/python3.11/site-packages/playwright/_impl/_browser.py", line 189, in close
    raise e
  File "/Users/awtkns/Library/Caches/pypoetry/virtualenvs/deworkd-77yazfm4-py3.11/lib/python3.11/site-packages/playwright/_impl/_browser.py", line 186, in close
    await self._channel.send("close", {"reason": reason})
  File "/Users/awtkns/Library/Caches/pypoetry/virtualenvs/deworkd-77yazfm4-py3.11/lib/python3.11/site-packages/playwright/_impl/_connection.py", line 63, in send
    return await self._connection.wrap_api_call(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/awtkns/Library/Caches/pypoetry/virtualenvs/deworkd-77yazfm4-py3.11/lib/python3.11/site-packages/playwright/_impl/_connection.py", line 495, in wrap_api_call
    return await cb()
           ^^^^^^^^^^
  File "/Users/awtkns/Library/Caches/pypoetry/virtualenvs/deworkd-77yazfm4-py3.11/lib/python3.11/site-packages/playwright/_impl/_connection.py", line 101, in inner_send
    result = next(iter(done)).result()
             ^^^^^^^^^^^^^^^^^^^^^^^^^
Exception: Connection closed while reading from the driver

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/homebrew/Cellar/python@3.11/3.11.9/Frameworks/Python.framework/Versions/3.11/lib/python3.11/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
  File "/opt/homebrew/Cellar/python@3.11/3.11.9/Frameworks/Python.framework/Versions/3.11/lib/python3.11/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/Users/awtkns/PycharmProjects/deworkd/python/deworker/deworker/main.py", line 154, in main
    asyncio.run(loop(settings))
  File "/opt/homebrew/Cellar/python@3.11/3.11.9/Frameworks/Python.framework/Versions/3.11/lib/python3.11/asyncio/runners.py", line 190, in run
    return runner.run(main)
           ^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Cellar/python@3.11/3.11.9/Frameworks/Python.framework/Versions/3.11/lib/python3.11/asyncio/runners.py", line 118, in run
    return self._loop.run_until_complete(task)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Cellar/python@3.11/3.11.9/Frameworks/Python.framework/Versions/3.11/lib/python3.11/asyncio/base_events.py", line 654, in run_until_complete
    return future.result()
           ^^^^^^^^^^^^^^^
  File "/Users/awtkns/PycharmProjects/deworkd/python/deworker/deworker/main.py", line 130, in loop
    async with semaphore, async_playwright() as playwright:
  File "/Users/awtkns/Library/Caches/pypoetry/virtualenvs/deworkd-77yazfm4-py3.11/lib/python3.11/site-packages/playwright/async_api/_context_manager.py", line 58, in __aexit__
    await self._connection.stop_async()
  File "/Users/awtkns/Library/Caches/pypoetry/virtualenvs/deworkd-77yazfm4-py3.11/lib/python3.11/site-packages/playwright/_impl/_connection.py", line 289, in stop_async
    self.cleanup()
  File "/Users/awtkns/Library/Caches/pypoetry/virtualenvs/deworkd-77yazfm4-py3.11/lib/python3.11/site-packages/playwright/_impl/_connection.py", line 300, in cleanup
    callback.future.set_exception(self._closed_error)
asyncio.exceptions.InvalidStateError: invalid state
I just encountered a very similar error in almost bare asyncio code on Python 3.11, without Playwright or any other significant library. So this may well be an issue in asyncio itself. I'll investigate further and report back once I find more.
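For reference, plain asyncio raises `InvalidStateError` whenever a result or exception is set on a future that is already done, including one that was cancelled. This is documented `asyncio.Future` behaviour, independent of Playwright. The two functions below are illustrative names that trigger it both ways using only the standard library:

```python
import asyncio


async def set_after_cancel() -> str:
    fut = asyncio.get_running_loop().create_future()
    fut.cancel()  # the future is now done (cancelled)
    try:
        # Setting an outcome on a future that is already done is not allowed.
        fut.set_exception(ConnectionError("connection closed"))
    except asyncio.InvalidStateError:
        return "InvalidStateError"
    return "no error"


async def set_twice() -> str:
    fut = asyncio.get_running_loop().create_future()
    fut.set_result(1)
    try:
        fut.set_result(2)  # a second outcome on an already-done future
    except asyncio.InvalidStateError:
        return "InvalidStateError"
    return "no error"


print(asyncio.run(set_after_cancel()))  # InvalidStateError
print(asyncio.run(set_twice()))         # InvalidStateError
```

So seeing this error in unrelated code does not by itself point to an asyncio bug; it usually means some code path is completing a future that another path has already completed or cancelled.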