crawlee-python
Enhance `PlaywrightCrawler` testing with mocked Playwright API
Description
- Enhance the testing of `PlaywrightCrawler` by adding a mocked Playwright API.
- This will provide a more isolated and stable testing environment, similar to how we use HTTPX and RESPX for `BeautifulSoupCrawler` in `test_beautifulsoup_crawler.py`.
- File: `test_playwright_crawler.py`
- Relevant documentation: Mock APIs.
Possible solution
Create fixtures that set up Playwright and a mocked server, which intercepts network requests and fulfills them with predefined responses. The `BrowserContext` can then be used in the `PlaywrightCrawler`.
```python
from collections.abc import AsyncGenerator

import pytest
from playwright.async_api import BrowserContext, Playwright, Request, Route, async_playwright

# Async fixtures assume pytest-asyncio in "auto" mode (otherwise use @pytest_asyncio.fixture).


@pytest.fixture()
async def playwright() -> AsyncGenerator[Playwright, None]:
    async with async_playwright() as playwright:
        yield playwright


@pytest.fixture()
async def mock_server(playwright: Playwright) -> AsyncGenerator[BrowserContext, None]:
    browser = await playwright.chromium.launch()
    context = await browser.new_context()

    # Intercept every request made in this context and answer it with a predefined page.
    async def handle_route(route: Route, request: Request) -> None:
        if request.url.endswith('/'):
            body = """<html>
                <head>
                    <title>Hello</title>
                </head>
                <body>
                    <a href="/asdf">Link 1</a>
                    <a href="/hjkl">Link 2</a>
                </body>
            </html>"""
        elif request.url.endswith('/asdf'):
            body = """<html>
                <head>
                    <title>Hello</title>
                </head>
                <body>
                    <a href="/uiop">Link 3</a>
                    <a href="/qwer">Link 4</a>
                </body>
            </html>"""
        else:
            body = """<html>
                <head>
                    <title>Hello</title>
                </head>
                <body>
                    Insightful content
                </body>
            </html>"""

        # Route.fulfill() takes the response fields as keyword arguments;
        # no real network request is made.
        await route.fulfill(status=200, content_type='text/html', body=body)

    await context.route('**/*', handle_route)
    yield context
    # Closing the browser also closes the context created from it.
    await browser.close()
```
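The three branches in `handle_route` repeat the same HTML skeleton. A minimal sketch, assuming a lookup table keyed by URL path is acceptable, keeps the mocked pages in one place and lets them be checked without launching a browser. `MOCK_PAGES`, `DEFAULT_PAGE` and `mock_html_for` are hypothetical names. Note one deliberate difference: the fixture above matches with `endswith('/')`, so a URL like `https://example.com/docs/` would get the root page, whereas an exact path lookup only serves it for `/`.

```python
from urllib.parse import urlparse

# Shared HTML skeleton for every mocked page.
_PAGE = """<html>
<head>
<title>Hello</title>
</head>
<body>
{body}
</body>
</html>"""

# Hypothetical page table: URL path -> HTML served by the mocked route.
MOCK_PAGES: dict[str, str] = {
    '/': _PAGE.format(body='<a href="/asdf">Link 1</a>\n<a href="/hjkl">Link 2</a>'),
    '/asdf': _PAGE.format(body='<a href="/uiop">Link 3</a>\n<a href="/qwer">Link 4</a>'),
}

# Fallback for any other path, mirroring the `else` branch of the fixture.
DEFAULT_PAGE = _PAGE.format(body='Insightful content')


def mock_html_for(url: str) -> str:
    """Return the HTML the mocked route would serve for `url`."""
    path = urlparse(url).path or '/'  # treat "https://host" like "https://host/"
    return MOCK_PAGES.get(path, DEFAULT_PAGE)
```

Inside the fixture, the handler body could then shrink to `await route.fulfill(status=200, content_type='text/html', body=mock_html_for(request.url))`.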
The `BrowserContext` provided by the `mock_server` fixture should be used in `PlaywrightCrawler`, possibly via `BrowserPool` or `BrowserPlugin`.
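However the context ends up wired into the crawler, a test can assert on which pages were visited. A minimal sketch, assuming the crawler enqueues every link it finds and deduplicates URLs, computes the expected set by a breadth-first walk over the mocked link graph: `/` links to `/asdf` and `/hjkl`, `/asdf` links to `/uiop` and `/qwer`, and every other page has no links. `MOCK_LINKS` and `expected_visits` are hypothetical helpers; collecting the crawler's handled URLs for comparison is left out because the exact crawlee API for that is not specified here.

```python
from collections import deque

# Hypothetical link graph mirroring the mocked pages: path -> hrefs on that page.
# Paths not listed (e.g. "/hjkl", "/uiop", "/qwer") serve the link-free fallback page.
MOCK_LINKS: dict[str, list[str]] = {
    '/': ['/asdf', '/hjkl'],
    '/asdf': ['/uiop', '/qwer'],
}


def expected_visits(start: str = '/') -> set[str]:
    """Breadth-first walk of the mock link graph, as a link-following crawler would do."""
    seen = {start}
    queue = deque([start])
    while queue:
        path = queue.popleft()
        for href in MOCK_LINKS.get(path, []):
            if href not in seen:  # each URL is enqueued at most once
                seen.add(href)
                queue.append(href)
    return seen
```

Starting from `/`, this yields five paths, so a crawl over the mocked site that follows all links would be expected to handle exactly five requests.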