
[Bug] Scraping httpstat.us/200 reliably triggers 500 responses from the Firecrawl API (FIR-519)

Open karolinepauls opened this issue 1 month ago • 3 comments

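Describe the Bug
Scraping a page whose response appears to be empty causes server errors from the API: a 502 HTML error page for the markdown format and an internal server error for rawHtml.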

To Reproduce
Open the playground link, or run the requests that follow:
https://www.firecrawl.dev/playground?url=http%3A%2F%2Fhttpstat.us%2F200&mode=scrape

markdown:

curl -X POST https://api.firecrawl.dev/v1/scrape \
    -H 'Content-Type: application/json' \
    -H "Authorization: Bearer $FIRECRAWL_API_KEY" \
    -d '{
        "url": "http://httpstat.us/200",
        "formats": ["markdown"]
    }'

<html><head>
<meta http-equiv="content-type" content="text/html;charset=utf-8">
<title>502 Server Error</title>
</head>
<body text=#000000 bgcolor=#ffffff>
<h1>Error: Server Error</h1>
<h2>The server encountered a temporary error and could not complete your request.<p>Please try again in 30 seconds.</h2>
<h2></h2>
</body></html>

rawHtml:

curl -X POST https://api.firecrawl.dev/v1/scrape \
    -H 'Content-Type: application/json' \
    -H "Authorization: Bearer $FIRECRAWL_API_KEY" \
    -d '{
        "url": "http://httpstat.us/200",
        "formats": ["rawHtml"]
    }'
{"success":false,"error":"(Internal server error) - All scraping engines failed! -- Double check the URL to make sure it's not broken. If the issue persists, contact us at [email protected]."}
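For anyone who wants to compare the two formats in one run, here is a minimal sketch that sends the same requests as the curl commands above and prints a one-line summary per format. It uses only the Python standard library. The helper names (`build_request`, `describe`, `scrape`) are made up for this illustration and are not part of any Firecrawl SDK; the endpoint, target URL, and `FIRECRAWL_API_KEY` variable are taken from the commands above.

```python
import json
import os
import urllib.error
import urllib.request

API_URL = "https://api.firecrawl.dev/v1/scrape"  # endpoint from the curl commands
TARGET_URL = "http://httpstat.us/200"            # page that triggers the failure


def build_request(fmt: str, api_key: str) -> urllib.request.Request:
    """Build the same POST request as the curl commands, for one output format."""
    body = json.dumps({"url": TARGET_URL, "formats": [fmt]}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
    )


def describe(status: int, body: str) -> str:
    """Summarize a response: the JSON error if there is one, else a short note."""
    try:
        payload = json.loads(body)
    except ValueError:
        # e.g. the HTML "502 Server Error" page shown for the markdown format
        return f"{status}: non-JSON body ({len(body)} bytes)"
    if isinstance(payload, dict) and payload.get("success") is False:
        return f"{status}: API error: {payload.get('error')}"
    return f"{status}: JSON body without success=false"


def scrape(fmt: str, api_key: str) -> str:
    """Send one scrape request and return its summary line."""
    try:
        with urllib.request.urlopen(build_request(fmt, api_key), timeout=60) as resp:
            return describe(resp.status, resp.read().decode("utf-8", "replace"))
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx; the response body is still readable.
        return describe(err.code, err.read().decode("utf-8", "replace"))


# Only hits the network when run as a script with an API key in the environment.
if __name__ == "__main__" and os.environ.get("FIRECRAWL_API_KEY"):
    for fmt in ("markdown", "rawHtml"):
        print(fmt, "->", scrape(fmt, os.environ["FIRECRAWL_API_KEY"]))
```

Printing the status code next to each body should make it easier to tell whether the failure comes from a gateway in front of the API (HTML body) or from the API itself (JSON body with `"success": false`).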

Expected Behavior
A successful (OK) API response containing empty content.


karolinepauls, Dec 31 '24 12:12