
[Bug] timeout parameter not passed to playwright service

Open mschfh opened this issue 4 months ago • 1 comment

Describe the Bug

The timeout parameter is not passed to playwright-service.

To Reproduce

Steps to reproduce the issue:

  1. Configure the docker-compose setup with the python-based microservice:
services:
  playwright-service:
    build: apps/playwright-service

.env:

PLAYWRIGHT_MICROSERVICE_URL: http://playwright-service:3000/html
  2. Send an API request:
{
	"url": "https://[removed]",
	"timeout": 60000,
	"waitFor": 30000,
	"formats": [
		"markdown"
	]
}
  3. Observe that the request sent to the microservice omits the timeout:
POST /html HTTP/1.1
Host: playwright-service:3000
[..]

{"url":"[removed]","wait_after_load":30000}

The service log then shows Page.goto failing with the default timeout of 15000ms instead of the requested 60000ms:

playwright-service-1  | playwright._impl._errors.TimeoutError: Page.goto: Timeout 15000ms exceeded.
playwright-service-1  | Call log:
playwright-service-1  | navigating to "https://[removed]", waiting until "load"

Expected Behavior

The timeout is passed to the playwright-service and used for Page.goto.

Additional Context

The service expects a timeout parameter in the request body: https://github.com/mendableai/firecrawl/blob/a40fb3b062dfee4d1dd79c4c4946f2f418da32c7/apps/playwright-service/main.py#L91-L95

The playwright integration in the API does not pass the parameter: https://github.com/mendableai/firecrawl/blob/a40fb3b062dfee4d1dd79c4c4946f2f418da32c7/apps/api/src/scraper/WebScraper/scrapers/playwright.ts#L38-L44

The suggested fix is to pass the timeout parameter through in the integration, so that it reaches the service alongside wait_after_load.
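
A minimal sketch of what forwarding the timeout could look like, written as a standalone function rather than the actual code in playwright.ts. The helper name `buildPlaywrightRequestBody` and the `PlaywrightRequestBody` interface are hypothetical; only the `url` and `wait_after_load` fields are taken from the captured request above, and `timeout` is the field the service is reported to expect.

```typescript
// Hypothetical shape of the JSON body sent to the playwright service.
// `url` and `wait_after_load` appear in the captured request;
// `timeout` is the field that is currently missing.
interface PlaywrightRequestBody {
  url: string;
  wait_after_load: number;
  timeout?: number;
}

// Hypothetical helper (not the real integration code): builds the body
// and forwards the caller's timeout when one was provided.
function buildPlaywrightRequestBody(
  url: string,
  waitFor: number = 0,
  timeout?: number,
): PlaywrightRequestBody {
  const body: PlaywrightRequestBody = { url, wait_after_load: waitFor };
  // Leave the field out when no timeout was requested, so the service
  // can fall back to its own default.
  if (timeout !== undefined) {
    body.timeout = timeout;
  }
  return body;
}

// Usage with the values from the reproduction request.
const body = buildPlaywrightRequestBody("https://example.com", 30000, 60000);
console.log(JSON.stringify(body));
```

With the values from the reproduction, the serialized body would carry `"timeout":60000` next to `"wait_after_load":30000`, which is what Page.goto would need in order to stop failing at 15000ms.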

mschfh, Oct 04 '24 04:10