crawl4ai
Enable PDF Scraping and Return Both PDF and MD Versions
It would be great if crawl4ai could scrape PDF files from websites and return both the original PDF and a Markdown (MD) version of its content. For example, https://arxiv.org/pdf/2402.06196 is the kind of link this should handle.
Detect and download PDF files. Convert PDF content into MD format. Return both the PDF and MD files.
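To make the three steps concrete, here is a rough sketch of what I have in mind. It is not crawl4ai code and does not use any crawl4ai API: `PdfResult`, `looks_like_pdf`, `pages_to_markdown`, `extract_pages`, `http_fetch`, and `scrape_pdf` are all hypothetical names made up for illustration. Text extraction assumes the third-party `pypdf` package is installed, and the "Markdown" here is just one heading per page, not a real layout-aware conversion.

```python
from __future__ import annotations

import io
from dataclasses import dataclass
from typing import Callable, Optional
from urllib.request import Request, urlopen


@dataclass
class PdfResult:
    """Hypothetical container holding both versions of the document."""
    url: str
    pdf_bytes: bytes
    markdown: str


def looks_like_pdf(body: bytes, content_type: str = "") -> bool:
    """Step 1 (detect): trust the Content-Type header or the %PDF- signature."""
    if content_type.split(";")[0].strip().lower() == "application/pdf":
        return True
    return body.startswith(b"%PDF-")


def pages_to_markdown(pages: list[str]) -> str:
    """Step 2 (convert): naive Markdown, one '## Page N' section per page."""
    sections = [
        f"## Page {number}\n\n{text.strip()}"
        for number, text in enumerate(pages, start=1)
    ]
    return "\n\n".join(sections) + "\n"


def extract_pages(pdf_bytes: bytes) -> list[str]:
    """Extract plain text per page. Assumes pypdf is available."""
    from pypdf import PdfReader  # imported lazily so the rest works without it

    reader = PdfReader(io.BytesIO(pdf_bytes))
    return [page.extract_text() or "" for page in reader.pages]


def http_fetch(url: str) -> tuple[bytes, str]:
    """Download the URL and return (body, Content-Type header)."""
    request = Request(url, headers={"User-Agent": "pdf-sketch/0.1"})
    with urlopen(request, timeout=30) as response:
        return response.read(), response.headers.get("Content-Type", "")


def scrape_pdf(
    url: str,
    fetch: Callable[[str], tuple[bytes, str]] = http_fetch,
    extract: Callable[[bytes], list[str]] = extract_pages,
) -> Optional[PdfResult]:
    """Step 3 (return both): give back the raw PDF and its Markdown, or None."""
    body, content_type = fetch(url)
    if not looks_like_pdf(body, content_type):
        return None
    return PdfResult(url=url, pdf_bytes=body, markdown=pages_to_markdown(extract(body)))
```

Usage would be something like `result = scrape_pdf("https://arxiv.org/pdf/2402.06196")`, then writing `result.pdf_bytes` to a `.pdf` file and `result.markdown` to a `.md` file. The `fetch` and `extract` parameters are only there so the download and the PDF parser can be swapped out.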
And support for the LLM extraction strategy as well.
@jmontoyavallejo Thanks for the suggestion. Crawling PDFs and media files (video, audio) is in the backlog, hopefully coming soon.