firecrawl
[Feat] Be able to pass a timeout param to the endpoints
Enable the user to pass a timeout parameter to both the scrape and the crawl endpoint. If the timeout is exceeded, send the user a clear error message. On the crawl endpoint, return any pages that have already been scraped, along with a message notifying the user that the timeout was exceeded.
If the task is completed within two days, we'll include a $10 tip :)
This is an intro bounty. We are looking for excited people who will buy in so we can start to ramp up.
@nickscamara Can I get assigned?
@ezhil56x all yours!
@nickscamara Do we need a default timeout, or is it not required?
Hi, is this issue still open, or is someone working on it?
@parthusun8, the issue is still open, but fixing it would require some fairly complex changes to our Bull queue system to allow the /crawl route to time out. So far, we've found that stopping an active job in Bull isn't possible. This means we'd have to change the deepest parts of our system to add a timeout feature to Firecrawl.
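Since Bull can't forcibly kill an in-flight job, one common workaround (a sketch, not Firecrawl's actual code; the `Page` shape and `scrapeOne` callback are hypothetical) is cooperative cancellation: the job processor checks a deadline between pages and returns whatever was scraped so far, flagged as partial.

```typescript
interface Page {
  url: string;
  content: string;
}

// Crawl a list of URLs, checking a deadline before each page.
// If the deadline passes, stop and return the pages scraped so far.
async function crawlWithDeadline(
  urls: string[],
  scrapeOne: (url: string) => Promise<Page>,
  timeoutMs: number,
): Promise<{ pages: Page[]; timedOut: boolean }> {
  const deadline = Date.now() + timeoutMs;
  const pages: Page[] = [];
  for (const url of urls) {
    if (Date.now() >= deadline) {
      // Cooperative stop: report partial results instead of killing the job.
      return { pages, timedOut: true };
    }
    pages.push(await scrapeOne(url));
  }
  return { pages, timedOut: false };
}
```

The trade-off is that the deadline is only observed between pages, so a single slow page can still overrun the timeout by up to one page's scrape duration.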
@nickscamara should we close this for now?
Can I be assigned to this work?
/attempt #59 My implementation plan 👍
In the scrape endpoint, we use the scrapeUrl function and pass the timeout value as an option. If the scrape operation times out, we catch the TimeoutError and return a JSON response with a status code of 408 (Request Timeout).
In the crawl endpoint, we use the crawlUrl function and pass the timeout value as an option. If the crawl operation times out, we catch the TimeoutError and return a JSON response with a status code of 408 (Request Timeout). We also add a message to each page in the response indicating that the crawl timed out.
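The timeout half of the plan above can be sketched with a generic promise wrapper (a minimal illustration, assuming the endpoint awaits a promise-returning function such as the `scrapeUrl` mentioned above; `withTimeout` and `TimeoutError` are names invented here, not Firecrawl's API):

```typescript
class TimeoutError extends Error {}

// Race an operation against a timer; reject with TimeoutError once the
// limit passes. Note the underlying operation keeps running — Promise.race
// only abandons its result, it doesn't cancel the work.
function withTimeout<T>(promise: Promise<T>, ms: number): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(
      () => reject(new TimeoutError(`Operation timed out after ${ms} ms`)),
      ms,
    );
  });
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Hypothetical use in a scrape handler:
//   try {
//     const result = await withTimeout(scrapeUrl(url, options), timeoutMs);
//     res.json({ success: true, data: result });
//   } catch (err) {
//     if (err instanceof TimeoutError) {
//       res.status(408).json({ error: err.message }); // 408 Request Timeout
//     } else {
//       throw err;
//     }
//   }
```

Because `Promise.race` doesn't stop the losing promise, this only bounds the response time; freeing the underlying job still runs into the Bull limitation discussed earlier in the thread.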
@akay41024: Another person is already attempting this issue. Please don't start working on this issue unless you were explicitly asked to do so.