
🔥 Turn entire websites into LLM-ready markdown or structured data. Scrape, crawl and extract with a single API.

Results 574 firecrawl issues

Replaced the exclude tag list with a function that does a nicer and safer cleanup. Resolves #1. Added basic tests for the function. _Important_: should add an **integration** test with...

in review

Add the ability to filter related websites by regex, for instance: "https://www.archdaily.com/1015605/bandhan-residential-school-of-business-abin-design-studio"
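A minimal sketch of what such a regex filter could look like, in Python with only the standard library. The function name `filter_urls` and the example pattern are hypothetical and are not part of firecrawl's API; the request itself does not say where in the crawl the filter would apply.

```python
import re
from typing import Iterable, List


def filter_urls(urls: Iterable[str], pattern: str) -> List[str]:
    """Keep only the URLs that match the given regular expression.

    Hypothetical helper for illustration; not a firecrawl function.
    """
    compiled = re.compile(pattern)
    # re.search matches anywhere in the string, so anchor the pattern
    # with ^ and/or $ when a full-prefix or full-URL match is wanted.
    return [url for url in urls if compiled.search(url)]
```

For example, the pattern `^https://www\.archdaily\.com/\d+/` would keep the project URL quoted above and drop pages on the same site whose path does not start with a numeric ID.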

ccing @rafaelsideguide

enhancement

When scraping, and especially when crawling, provide the ability to have all relative URLs changed to absolute URLs (for further processing or link extraction). E.g. `[The PDF file][/assets/file.pdf]` => `[The PDF...
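One way the requested rewrite could work, sketched in Python with the standard library only. It assumes inline Markdown links of the form `[text](target)` and resolves each target against the page URL with `urllib.parse.urljoin`. The function name `absolutize_links` and the `example.com` base URL are hypothetical, and the regex is a deliberate simplification that ignores link titles, targets containing spaces, and nested parentheses.

```python
import re
from urllib.parse import urljoin

# Matches inline Markdown links and images: [text](target) or ![alt](target).
# Simplification (assumption): the target has no spaces or parentheses.
_LINK_RE = re.compile(r"(!?\[[^\]]*\]\()([^)\s]+)(\))")


def absolutize_links(markdown: str, base_url: str) -> str:
    """Rewrite relative link targets in Markdown as absolute URLs.

    Hypothetical helper for illustration; not a firecrawl function.
    """

    def _replace(match: "re.Match[str]") -> str:
        prefix, target, suffix = match.groups()
        # urljoin leaves targets that are already absolute unchanged.
        return f"{prefix}{urljoin(base_url, target)}{suffix}"

    return _LINK_RE.sub(_replace, markdown)
```

Usage, with a made-up base URL:

```python
print(absolutize_links("[The PDF file](/assets/file.pdf)", "https://example.com/docs/page"))
# prints: [The PDF file](https://example.com/assets/file.pdf)
```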

enhancement

Consider adding Haiku, or replacing the current model with Haiku, for images in [utils/gptVision.ts](https://github.com/mendableai/firecrawl/blob/main/apps/api/src/scraper/WebScraper/utils/gptVision.ts). The same prompt will work well. Also, you should probably shift to `gpt-4-turbo`, which is now [recommended](https://platform.openai.com/docs/models/gpt-4-turbo-and-gpt-4) instead of...

The markdown would be much more useful if you stripped headers/footers and other tags, such as filters, that are not core content (i.e. low value for RAG/context). Either using tag...

When tweaking and growing the HTML cleanup and html-to-md conversion, I highly recommend adding integration tests using either live webpages (to also test the get/network and dynamic websites) OR at...

wip

These are viktor-invented categories. [_source_](https://github.com/szepeviktor/debian-server-tools/blob/master/.gitignore#L15)

ready to merge