ValueError: Page.title: The future belongs to a different loop than the one specified as the loop argument
example:

import scrapy


class MySpider(scrapy.Spider):
    name = "my_spider"
    start_urls = ['https://www.basketball-reference.com/leagues/NBA_2022.html']

    async def start(self):
        for url in self.start_urls:
            yield scrapy.Request(url, callback=self.parse, meta={'playwright': True, 'playwright_include_page': True})

    async def parse(self, response):
        page = response.meta['playwright_page']
        # use Playwright's page coroutines so the async calls run in the correct event loop
        title = await page.title()  # get the page title
        self.logger.info(f"Page Title: {title}")
        # get the cookies
        cookies = await page.context.cookies()
        self.logger.info(f"Cookies: {cookies}")
        # continue with the rest of the spider logic
        yield {'title': title, 'cookies': cookies}
Versions: scrapy 2.13.3, scrapy-playwright 0.0.44, playwright 1.55.0
I just want to get cookies with playwright.Page, but it doesn't work. It seems Scrapy's async handling conflicts with Playwright. Please help, thanks.
I'm sorry, I cannot reproduce with the following software versions:
$ scrapy version -v
Scrapy : 2.13.3
lxml : 6.0.0
libxml2 : 2.14.4
cssselect : 1.3.0
parsel : 1.10.0
w3lib : 2.3.1
Twisted : 25.5.0
Python : 3.12.3 (main, Jun 10 2024, 14:59:09) [GCC 11.4.0]
pyOpenSSL : 25.1.0 (OpenSSL 3.5.2 5 Aug 2025)
cryptography : 45.0.6
Platform : Linux-6.5.0-45-generic-x86_64-with-glibc2.35
$ python -c "import scrapy; print(scrapy.__version__)"
2.13.3
$ playwright --version
Version 1.55.0
Please provide the software versions you are using and additional logs.
Are you maybe using Windows? I usually rely on the Windows CI, as I don't have quick access to a Windows system to develop on directly. The main difference on Windows is the use of a separate threaded loop implementation, and the issue title could point in that direction. However, I forced the threaded loop in this example (by setting _PLAYWRIGHT_THREADED_LOOP=True, a private undocumented setting intended only for tests) and the crawl finished successfully.
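For reference, forcing the threaded loop in that test amounted to something like the settings sketch below (this is an assumption about the exact setup; _PLAYWRIGHT_THREADED_LOOP is private and undocumented, so it should not be relied on outside of tests):

# settings.py -- minimal sketch, not an official configuration
TWISTED_REACTOR = "twisted.internet.asyncioreactor.AsyncioSelectorReactor"
DOWNLOAD_HANDLERS = {
    "http": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
    "https": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
}
# Private, test-only setting mentioned above: forces the threaded loop
# even on platforms where it is not strictly necessary.
_PLAYWRIGHT_THREADED_LOOP = True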
Logs excerpt:
(...)
2025-09-29 11:16:14 [scrapy.core.engine] INFO: Closing spider (finished)
2025-09-29 11:16:14 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 250,
'downloader/request_count': 1,
'downloader/request_method_count/GET': 1,
'downloader/response_bytes': 1087980,
'downloader/response_count': 1,
'downloader/response_status_count/200': 1,
'elapsed_time_seconds': 7.210502,
'finish_reason': 'finished',
'finish_time': datetime.datetime(2025, 9, 29, 14, 16, 14, 629719, tzinfo=datetime.timezone.utc),
'item_scraped_count': 1,
'items_per_minute': 8.571428571428571,
'log_count/DEBUG': 794,
'log_count/INFO': 15,
'log_count/WARNING': 1,
'memusage/max': 75010048,
'memusage/startup': 75010048,
'playwright/browser_count': 1,
'playwright/context_count': 1,
'playwright/context_count/max_concurrent': 1,
'playwright/context_count/persistent/False': 1,
'playwright/context_count/remote/False': 1,
'playwright/page_count': 1,
'playwright/page_count/max_concurrent': 1,
'playwright/request_count': 397,
'playwright/request_count/method/GET': 369,
'playwright/request_count/method/HEAD': 1,
'playwright/request_count/method/POST': 27,
'playwright/request_count/navigation': 81,
'playwright/request_count/resource_type/document': 81,
'playwright/request_count/resource_type/fetch': 55,
'playwright/request_count/resource_type/image': 200,
'playwright/request_count/resource_type/other': 4,
'playwright/request_count/resource_type/script': 36,
'playwright/request_count/resource_type/stylesheet': 2,
'playwright/request_count/resource_type/xhr': 19,
'playwright/response_count': 390,
'playwright/response_count/method/GET': 363,
'playwright/response_count/method/HEAD': 1,
'playwright/response_count/method/POST': 26,
'playwright/response_count/resource_type/document': 81,
'playwright/response_count/resource_type/fetch': 54,
'playwright/response_count/resource_type/image': 195,
'playwright/response_count/resource_type/other': 4,
'playwright/response_count/resource_type/script': 35,
'playwright/response_count/resource_type/stylesheet': 2,
'playwright/response_count/resource_type/xhr': 19,
'response_received_count': 1,
'responses_per_minute': 8.571428571428571,
'scheduler/dequeued': 1,
'scheduler/dequeued/memory': 1,
'scheduler/enqueued': 1,
'scheduler/enqueued/memory': 1,
'start_time': datetime.datetime(2025, 9, 29, 14, 16, 7, 419217, tzinfo=datetime.timezone.utc)}
2025-09-29 11:16:14 [scrapy.core.engine] INFO: Spider closed (finished)
2025-09-29 11:16:14 [scrapy-playwright] INFO: Closing download handler
2025-09-29 11:16:14 [scrapy-playwright] DEBUG: Browser context closed: 'default' (persistent=False, remote=False)
2025-09-29 11:16:14 [scrapy-playwright] INFO: Closing browser
2025-09-29 11:16:14 [scrapy-playwright] DEBUG: Browser disconnected
❯ scrapy version -v
Scrapy : 2.13.3
lxml : 6.0.2
libxml2 : 2.11.9
cssselect : 1.3.0
parsel : 1.10.0
w3lib : 2.3.1
Twisted : 25.5.0
Python : 3.10.7 (tags/v3.10.7:6cc6b13, Sep 5 2022, 14:08:36) [MSC v.1933 64 bit (AMD64)]
pyOpenSSL : 25.3.0 (OpenSSL 3.5.3 16 Sep 2025)
cryptography : 46.0.1
Platform : Windows-10-10.0.26100-SP0
❯ playwright --version
Version 1.55.0
Setting _PLAYWRIGHT_THREADED_LOOP=True doesn't work. I also found that the Scrapy setting TWISTED_REACTOR = "twisted.internet.asyncioreactor.AsyncioSelectorReactor" doesn't support the ProactorEventLoop, while this plugin creates Playwright using a ProactorEventLoop. So I tried to change the event loop:
# settings.py
ASYNCIO_EVENT_LOOP = "asyncio.windows_events.SelectorEventLoop"

# scrapy-playwright/_utils.py (local modification)
class _ThreadedLoopAdapter:
    ...

    @classmethod
    def start(cls, caller_id: int) -> None:
        cls._stop_events[caller_id] = asyncio.Event()
        if not getattr(cls, "_loop", None):
            policy = asyncio.DefaultEventLoopPolicy()
            if platform.system() == "Windows":
                # policy = asyncio.WindowsProactorEventLoopPolicy()  # type: ignore[attr-defined]
                policy = asyncio.WindowsSelectorEventLoopPolicy()
            cls._loop = policy.new_event_loop()
But it shows the error below. Is there any other event loop configuration that makes it work, even with lower performance? Thanks.
2025-10-19 00:15:17 [scrapy.utils.signal] ERROR: Error caught on signal handler: <bound method DownloadHandlers._close of <scrapy.core.downloader.handlers.DownloadHandlers object at 0x000001F3B6157010>>
Traceback (most recent call last):
  File "D:\Projects\part_time_project\.env\lib\site-packages\twisted\internet\defer.py", line 1853, in _inlineCallbacks
    result = context.run(
  File "D:\Projects\part_time_project\.env\lib\site-packages\twisted\python\failure.py", line 467, in throwExceptionIntoGenerator
    return g.throw(self.value.with_traceback(self.tb))
  File "D:\Projects\part_time_project\.env\lib\site-packages\scrapy\core\downloader\handlers\__init__.py", line 109, in _close
    yield dh.close()
  File "D:\Projects\part_time_project\.env\lib\site-packages\twisted\internet\defer.py", line 1853, in _inlineCallbacks
    result = context.run(
  File "D:\Projects\part_time_project\.env\lib\site-packages\twisted\python\failure.py", line 467, in throwExceptionIntoGenerator
    return g.throw(self.value.with_traceback(self.tb))
  File "D:\Projects\part_time_project\.env\lib\site-packages\scrapy_playwright\handler.py", line 355, in close
    yield self._deferred_from_coro(self._close())
  File "D:\Projects\part_time_project\.env\lib\site-packages\scrapy_playwright\_utils.py", line 123, in _handle_coro
    result = await coro
  File "D:\Projects\part_time_project\.env\lib\site-packages\scrapy_playwright\handler.py", line 367, in _close
    await self.playwright_context_manager.__aexit__()
  File "D:\Projects\part_time_project\.env\lib\site-packages\playwright\async_api\_context_manager.py", line 57, in __aexit__
    await self._connection.stop_async()
  File "D:\Projects\part_time_project\.env\lib\site-packages\playwright\_impl\_connection.py", line 321, in stop_async
    self._transport.request_stop()
  File "D:\Projects\part_time_project\.env\lib\site-packages\playwright\_impl\_transport.py", line 97, in request_stop
    assert self._output
AttributeError: 'PipeTransport' object has no attribute '_output'
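A likely explanation (an assumption, not confirmed in this thread): on Windows, asyncio's SelectorEventLoop does not support subprocesses, so forcing it breaks the pipe that Playwright uses to talk to the browser driver, which would leave PipeTransport without its _output pipe. The isolated sketch below, independent of Scrapy and Playwright, shows the underlying limitation:

import asyncio
import platform
import sys

# Minimal illustration (assumption): Windows' SelectorEventLoop has no subprocess
# support, so anything that launches a child process over pipes -- as Playwright
# does for the browser driver -- needs the ProactorEventLoop.
async def run_child():
    proc = await asyncio.create_subprocess_exec(
        sys.executable, "-c", "print('hello from child')",
        stdout=asyncio.subprocess.PIPE,
    )
    out, _ = await proc.communicate()
    print(out.decode().strip())

if platform.system() == "Windows":
    # With WindowsSelectorEventLoopPolicy this raises NotImplementedError instead.
    asyncio.set_event_loop_policy(asyncio.WindowsProactorEventLoopPolicy())
asyncio.run(run_child())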
There's no need to change the event loop, this is handled automatically by scrapy-playwright: see the notes about Windows support in the readme: https://github.com/scrapy-plugins/scrapy-playwright/tree/v0.0.44#windows-support
There is also no need to use _PLAYWRIGHT_THREADED_LOOP if you're already on Windows. As mentioned in the docs I linked above, Windows is supported by running the Playwright process in a separate thread. What _PLAYWRIGHT_THREADED_LOOP does is force that approach even if it's not strictly necessary, for testing purposes.
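To illustrate the general idea behind that threaded loop (a minimal sketch of the pattern only, not the plugin's actual implementation): a dedicated asyncio loop runs in a background thread and coroutines are submitted to it with run_coroutine_threadsafe, so the loop Playwright uses never has to be the same loop Twisted is driving. The names here (fake_page_title) are placeholders.

import asyncio
import threading

# Sketch of the "threaded loop" pattern: one asyncio loop lives in its own
# thread; other threads hand it coroutines and wait on concurrent futures.
loop = asyncio.new_event_loop()
threading.Thread(target=loop.run_forever, daemon=True).start()

async def fake_page_title() -> str:
    # Placeholder standing in for an awaitable Playwright call such as page.title().
    await asyncio.sleep(0.1)
    return "Example Title"

# Submitted from the main (reactor) thread; the coroutine executes on the
# background loop, avoiding "belongs to a different loop" errors.
future = asyncio.run_coroutine_threadsafe(fake_page_title(), loop)
print(future.result())
loop.call_soon_threadsafe(loop.stop)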
Are you sure you're using scrapy-playwright version 0.0.44? I didn't realize before that this looks exactly like #307, which was solved only in v0.0.44.
$ python -c "import scrapy_playwright; print(scrapy_playwright.__version__)"
0.0.44
I got the same issue on Windows.
(.venv) PS C:\code\Company\autohome_spider> scrapy version -v
Scrapy : 2.13.3
lxml : 6.0.2
libxml2 : 2.11.9
cssselect : 1.3.0
parsel : 1.10.0
w3lib : 2.3.1
Twisted : 25.5.0
Python : 3.10.11 (tags/v3.10.11:7d4cc5a, Apr 5 2023, 00:38:17) [MSC v.1929 64 bit (AMD64)]
pyOpenSSL : 25.3.0 (OpenSSL 3.5.4 30 Sep 2025)
cryptography : 46.0.3
Platform : Windows-10-10.0.19045-SP0
(.venv) PS C:\code\Company\autohome_spider> python -c "import scrapy; print(scrapy.__version__)"
2.13.3
(.venv) PS C:\code\Company\autohome_spider> playwright --version
Version 1.55.0
I followed https://github.com/scrapy-plugins/scrapy-playwright/issues/307 but still got the same error. Maybe the problem is related to the operating system.
Using WSL, everything is OK. Only Windows gets the error.