Feat: return HTML / CSS from scrape and crawl endpoints

Open calebpeffer opened this issue 9 months ago • 4 comments

For some use cases, it would be valuable to return the raw HTML and CSS without any pre-processing. Discussion on execution below @rafaelsideguide @nickscamara 👇

calebpeffer avatar May 03 '24 19:05 calebpeffer

Should it be returned by default, or should users opt in if they want the HTML? The problem is that some pages are very long, which can make the response too large in some cases. We could add a param to the API that lets users specify whether they want HTML/CSS returned.

nickscamara avatar May 03 '24 21:05 nickscamara

@rafaelsideguide what are your thoughts on this?

nickscamara avatar May 03 '24 21:05 nickscamara

I think adding an API parameter is the best way to go. This way, Firecrawl won’t get slower for everyone, only for the users who need CSS and HTML in their requests.

rafaelsideguide avatar May 06 '24 12:05 rafaelsideguide

Hm, I think that's good, but because there is no added latency in returning the markdown as well, it might be wise to do so. I was thinking more of an option to return the HTML without excluding the markdown.

nickscamara avatar May 06 '24 15:05 nickscamara

Maybe change to includeHtml

nickscamara avatar May 06 '24 19:05 nickscamara
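
For illustration, here is a minimal sketch of the behaviour discussed above: markdown is always returned, and the raw HTML is added only when the caller opts in. Only the flag name `includeHtml` comes from this thread; the function name `build_scrape_response`, the `page_options` dict, and the `markdown` / `html` response keys are hypothetical and are not taken from Firecrawl's actual code.

```python
from typing import Any, Dict, Optional


def build_scrape_response(
    markdown: str,
    raw_html: str,
    page_options: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Assemble a response payload (hypothetical shape, for illustration only)."""
    options = page_options or {}

    # Markdown is always included, since returning it adds no extra latency.
    data: Dict[str, Any] = {"markdown": markdown}

    # Raw HTML is opt-in so that long pages do not inflate every response.
    if options.get("includeHtml", False):
        data["html"] = raw_html

    return data


# Default request: markdown only.
default_response = build_scrape_response("# Hi", "<h1>Hi</h1>")

# Opt-in request: markdown plus the raw HTML.
html_response = build_scrape_response("# Hi", "<h1>Hi</h1>", {"includeHtml": True})
```

Defaulting the flag to off keeps existing responses unchanged for users who do not need the HTML.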