crawlee-python Enhance `PlaywrightCrawler` testing with mocked Playwright API

Open: vdusek opened this issue 8 months ago (0 comments)

Description

Enhance the testing of PlaywrightCrawler by adding a mocked Playwright API.
This would provide a more isolated and stable testing environment, similar to how we use HTTPX and RESPX for BeautifulSoupCrawler in test_beautifulsoup_crawler.py.
File: test_playwright_crawler.py
Relevant documentation: the Playwright "Mock APIs" guide.

Possible solution

Create pytest fixtures that start Playwright and set up a mocked server, i.e. a browser context that intercepts network requests and answers them with predefined responses. This BrowserContext can then be used by the PlaywrightCrawler.

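from collections.abc import AsyncGenerator

import pytest
from playwright.async_api import BrowserContext, Playwright, Request, Route, async_playwright


# Async fixtures assume an async pytest plugin is active (e.g. pytest-asyncio in auto mode).
@pytest.fixture()
async def playwright() -> AsyncGenerator[Playwright, None]:
    # Start the Playwright driver for the duration of a single test.
    async with async_playwright() as playwright:
        yield playwright


@pytest.fixture()
async def mock_server(playwright: Playwright) -> AsyncGenerator[BrowserContext, None]:
    browser = await playwright.chromium.launch()
    context = await browser.new_context()

    # Intercept every request made in this context and answer it with predefined HTML,
    # so the tests never touch the real network.
    async def handle_route(route: Route, request: Request) -> None:
        if request.url.endswith('/'):
            # Start page linking to /asdf and /hjkl.
            body = """<html>
                <head>
                    <title>Hello</title>
                </head>
                <body>
                    <a href="/asdf">Link 1</a>
                    <a href="/hjkl">Link 2</a>
                </body>
            </html>"""
        elif request.url.endswith('/asdf'):
            # Second-level page linking to /uiop and /qwer.
            body = """<html>
                <head>
                    <title>Hello</title>
                </head>
                <body>
                    <a href="/uiop">Link 3</a>
                    <a href="/qwer">Link 4</a>
                </body>
            </html>"""
        else:
            # Any other URL gets a leaf page without links.
            body = """<html>
                <head>
                    <title>Hello</title>
                </head>
                <body>
                    Insightful content
                </body>
            </html>"""
        # Playwright's Response objects cannot be constructed directly, so pass the
        # response fields to Route.fulfill() as keyword arguments instead.
        await route.fulfill(status=200, content_type='text/html', body=body)

    await context.route('**/*', handle_route)
    yield context
    # Closing the browser also closes every context it owns.
    await browser.close()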
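
A minimal sketch of a table-driven variant of the routing, assuming pages are looked up by URL path; `PAGES`, `DEFAULT_PAGE`, `resolve_page` and `reachable_paths` are hypothetical names, not part of Crawlee or Playwright. Keeping the pages in a dict avoids a growing `if`/`elif` chain, and because it uses only the standard library, the mocked site (including the set of pages a crawler following every link from `/` should visit) can be checked without launching a browser. Inside `handle_route`, the body would then be `resolve_page(request.url)`.

```python
from collections import deque
from html.parser import HTMLParser
from typing import Optional
from urllib.parse import urljoin, urlparse

# Hypothetical page table mirroring the fixture: URL path -> HTML body.
PAGES = {
    '/': '<html><head><title>Hello</title></head><body>'
         '<a href="/asdf">Link 1</a><a href="/hjkl">Link 2</a></body></html>',
    '/asdf': '<html><head><title>Hello</title></head><body>'
             '<a href="/uiop">Link 3</a><a href="/qwer">Link 4</a></body></html>',
}
# Any other path gets this leaf page, like the fixture's `else` branch.
DEFAULT_PAGE = '<html><head><title>Hello</title></head><body>Insightful content</body></html>'


def resolve_page(url: str) -> str:
    """Return the predefined HTML body for a URL, matching on its path only."""
    path = urlparse(url).path or '/'
    return PAGES.get(path, DEFAULT_PAGE)


class _LinkCollector(HTMLParser):
    """Collect the href of every <a> tag in a document."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: list[str] = []

    def handle_starttag(self, tag: str, attrs: list[tuple[str, Optional[str]]]) -> None:
        if tag == 'a':
            for name, value in attrs:
                if name == 'href' and value:
                    self.hrefs.append(value)


def reachable_paths(start_url: str) -> set[str]:
    """Breadth-first walk over the mocked site, following every link once."""
    seen: set[str] = set()
    queue = deque([start_url])
    while queue:
        url = queue.popleft()
        path = urlparse(url).path or '/'
        if path in seen:
            continue
        seen.add(path)
        collector = _LinkCollector()
        collector.feed(resolve_page(url))
        # Resolve relative links such as "/asdf" against the current page URL.
        queue.extend(urljoin(url, href) for href in collector.hrefs)
    return seen
```

Matching on the exact path is slightly stricter than the `endswith` checks in the fixture (for example, `/foo/` falls through to the default page), and query strings are ignored. A crawler test could compare the URLs it handled against `reachable_paths('https://example.com/')`, where the host is an arbitrary placeholder.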

The BrowserContext provided by the mock_server fixture should be used in PlaywrightCrawler, possibly via BrowserPool or BrowserPlugin.
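
If passing the intercepted BrowserContext through BrowserPool or a BrowserPlugin turns out to be awkward, a different technique (not the one proposed in this issue) is to serve the predefined pages from a throwaway local HTTP server and start the crawler from its URL, so no context has to be injected. A minimal standard-library sketch; `serve_pages`, `_PageHandler` and the page table are hypothetical, and unlike Playwright's request interception this puts real loopback HTTP traffic into the test.

```python
import threading
from collections.abc import Iterator
from contextlib import contextmanager
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Hypothetical predefined pages: URL path -> HTML body.
PAGES = {
    '/': '<html><head><title>Hello</title></head><body>'
         '<a href="/asdf">Link 1</a><a href="/hjkl">Link 2</a></body></html>',
}
# Every other path gets a leaf page without links.
DEFAULT_PAGE = '<html><head><title>Hello</title></head><body>Insightful content</body></html>'


class _PageHandler(BaseHTTPRequestHandler):
    """Answer every GET with the predefined HTML for its path."""

    def do_GET(self) -> None:
        path = self.path.split('?', 1)[0]  # drop any query string
        body = PAGES.get(path, DEFAULT_PAGE).encode('utf-8')
        self.send_response(200)
        self.send_header('Content-Type', 'text/html; charset=utf-8')
        self.send_header('Content-Length', str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format: str, *args: object) -> None:
        # Silence the default per-request logging to keep test output clean.
        pass


@contextmanager
def serve_pages() -> Iterator[str]:
    """Run the server on a free localhost port and yield its base URL."""
    server = ThreadingHTTPServer(('127.0.0.1', 0), _PageHandler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        host, port = server.server_address[:2]
        yield f'http://{host}:{port}'
    finally:
        # Stop the serve_forever loop, release the socket, and wait for the thread.
        server.shutdown()
        server.server_close()
        thread.join()
```

Wrapped in a pytest fixture (`with serve_pages() as base_url: yield base_url`), each test would receive a base URL such as `http://127.0.0.1:<port>` to crawl. Relative links like `/asdf` then resolve against the same local server automatically.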

Jun 19 '24 14:06 vdusek