
[Feat] Be able to pass a timeout param to the endpoints

Open · nickscamara opened this issue 10 months ago • 9 comments

Enable the user to pass a timeout parameter to both the scrape and the crawl endpoints. If the timeout is exceeded, send the user a clear error message. On the crawl endpoint, return any pages that have already been scraped, with a message notifying the user that the timeout was exceeded.
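For illustration, the requested behavior might look like this from a client's point of view (a hypothetical TypeScript sketch; the endpoint path, parameter name, and 408 status are assumptions, not a committed design):

```typescript
// Hypothetical client call; the "timeout" field (in ms) is the requested param.
const res = await fetch("https://api.firecrawl.dev/v0/scrape", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({ url: "https://example.com", timeout: 30_000 }),
});

if (res.status === 408) {
  // The clear error message requested in this issue.
  console.error((await res.json()).error);
}
```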

If the task is completed within two days, we'll include a $10 tip :)

This is an intro bounty. We are looking for excited people who will buy in so we can start to ramp up.

nickscamara avatar Apr 24 '24 22:04 nickscamara

@nickscamara Can I get assigned?

ezhil56x avatar Apr 25 '24 06:04 ezhil56x

@ezhil56x all yours!

nickscamara avatar Apr 25 '24 06:04 nickscamara

@nickscamara Do we need a default timeout, or is it not required?

ezhil56x avatar Apr 25 '24 08:04 ezhil56x

Hi, is this issue still open, or is someone working on it?

parthusun8 avatar Jun 08 '24 20:06 parthusun8

@parthusun8, the issue is still open, but fixing it would require some fairly complex changes to our Bull queue system to allow the /crawl route to time out. So far, we've found that stopping an active job in Bull isn't possible, which means we'd have to change the deepest parts of our system to add a timeout feature to Firecrawl.
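A common workaround for that limitation is cooperative cancellation: since Bull can't stop a running processor from the outside, the processor itself checks a deadline between units of work. A minimal sketch of that idea (illustrative, not Firecrawl's actual code; the job-data shape is assumed):

```typescript
import Queue from "bull";

// Illustrative cooperative-timeout sketch: the processor checks a deadline
// between pages and exits early with partial results.
const crawlQueue = new Queue("crawl", "redis://127.0.0.1:6379");

crawlQueue.process(async (job) => {
  const { urls, deadline } = job.data as { urls: string[]; deadline: number };
  const scraped: string[] = [];

  for (const url of urls) {
    if (Date.now() >= deadline) {
      // Deadline exceeded: return whatever was scraped so far.
      return { scraped, timedOut: true };
    }
    scraped.push(url); // stand-in for the actual scrape of `url`
  }
  return { scraped, timedOut: false };
});

// Enqueue a crawl whose deadline is 60 seconds from now (illustrative).
crawlQueue.add({
  urls: ["https://example.com"],
  deadline: Date.now() + 60_000,
});
```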

rafaelsideguide avatar Jun 10 '24 19:06 rafaelsideguide

@nickscamara should we close this for now?

rafaelsideguide avatar Jul 01 '24 12:07 rafaelsideguide

Can I be assigned to this work?

haija45 avatar Jul 05 '24 16:07 haija45

/attempt #59

My implementation plan 👍

In the scrape endpoint, we use the scrapeUrl function and pass the timeout value as an option. If the scrape operation times out, we catch the TimeoutError and return a JSON response with a status code of 408 (Request Timeout).
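A minimal TypeScript sketch of that plan for the scrape endpoint, assuming an Express-style handler; `scrapeUrl`, `TimeoutError`, the default timeout, and the route path are illustrative stand-ins rather than Firecrawl's actual internals:

```typescript
import express, { Request, Response } from "express";

class TimeoutError extends Error {}

// Race a promise against a timer; reject with TimeoutError on expiry.
function withTimeout<T>(promise: Promise<T>, ms: number): Promise<T> {
  return Promise.race([
    promise,
    new Promise<T>((_, reject) =>
      setTimeout(() => reject(new TimeoutError(`exceeded ${ms}ms`)), ms)
    ),
  ]);
}

// Stand-in for the real scraper.
async function scrapeUrl(url: string): Promise<{ content: string }> {
  const resp = await fetch(url);
  return { content: await resp.text() };
}

const app = express();
app.use(express.json());

app.post("/v0/scrape", async (req: Request, res: Response) => {
  const timeout = Number(req.body.timeout) || 30_000; // assumed 30s default
  try {
    const page = await withTimeout(scrapeUrl(req.body.url), timeout);
    res.json({ success: true, data: page });
  } catch (err) {
    if (err instanceof TimeoutError) {
      // 408 Request Timeout with a clear message, as planned above.
      res.status(408).json({
        success: false,
        error: `Scrape timed out after ${timeout}ms`,
      });
    } else {
      res.status(500).json({ success: false, error: String(err) });
    }
  }
});

app.listen(3000);
```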

In the crawl endpoint, we use the crawlUrl function and pass the timeout value as an option. If the crawl operation times out, we catch the TimeoutError and return a JSON response with a status code of 408 (Request Timeout). We also add a message to each page in the response indicating that the crawl timed out.
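And a sketch of the crawl side, returning the already-scraped pages with a timeout message on each; again, the names and result shape are assumptions. Note the deadline is only checked between page fetches, which reflects the cooperative limitation discussed earlier in this thread:

```typescript
interface CrawledPage {
  url: string;
  content: string;
  message?: string;
}

// Illustrative breadth-first crawl with a deadline; not Firecrawl internals.
async function crawlWithTimeout(
  startUrl: string,
  timeoutMs: number,
  fetchPage: (url: string) => Promise<{ content: string; links: string[] }>
): Promise<{ pages: CrawledPage[]; timedOut: boolean }> {
  const deadline = Date.now() + timeoutMs;
  const queue = [startUrl];
  const seen = new Set(queue);
  const pages: CrawledPage[] = [];

  while (queue.length > 0) {
    if (Date.now() >= deadline) {
      // Tag every already-scraped page so the caller sees the timeout.
      for (const p of pages) p.message = "Crawl timed out; partial results";
      return { pages, timedOut: true };
    }
    const url = queue.shift()!;
    const { content, links } = await fetchPage(url);
    pages.push({ url, content });
    for (const link of links) {
      if (!seen.has(link)) {
        seen.add(link);
        queue.push(link);
      }
    }
  }
  return { pages, timedOut: false };
}
```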

akay41024 avatar Jul 31 '24 21:07 akay41024

@akay41024: Another person is already attempting this issue. Please don't start working on this issue unless you were explicitly asked to do so.

algora-pbc[bot] avatar Jul 31 '24 21:07 algora-pbc[bot]