gpt-crawler
Request to support PDF scraping
Hi, thank you for this amazing repo. I am trying to use it on a website that also hosts hundreds of PDFs. The crawler is unable to get the content from the PDFs and fails with this error:
PlaywrightCrawler: Request failed and reached maximum retries. page.goto: net::ERR_ABORTED
It would be great if support for crawling PDFs could be added as well.
How can I skip files that the crawler comes across while parsing?
In the config.ts file, you must specify which file extensions you want to exclude:
resourceExclusions: []
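For illustration, here is a minimal sketch of what a filled-in list might look like and how an extension check could behave. Only the `resourceExclusions` name comes from the answer above. The `CrawlConfigSketch` interface, the `hasExcludedExtension` helper, the placeholder URL, and the chosen extensions are all assumptions made for this example, not the crawler's actual code, so check the repo's own config types before copying anything.

```typescript
// Hypothetical, trimmed-down config shape. Only `resourceExclusions`
// is taken from the thread; everything else is illustrative.
interface CrawlConfigSketch {
  url: string;
  resourceExclusions: string[];
}

const configSketch: CrawlConfigSketch = {
  url: "https://example.com/docs", // placeholder URL
  // Extensions are written without a leading dot (an assumption).
  resourceExclusions: ["pdf", "zip", "png", "jpg"],
};

// Hypothetical helper: returns true when the last path segment of the
// URL ends in one of the excluded extensions (case-insensitive).
// Simplified: it does not fully parse the URL, so a bare host such as
// "https://example.com" is treated as a segment too.
function hasExcludedExtension(rawUrl: string, exclusions: string[]): boolean {
  // Drop any query string or fragment before inspecting the path.
  const path = rawUrl.replace(/[?#].*$/, "").toLowerCase();
  const lastSegment = path.slice(path.lastIndexOf("/") + 1);
  const dot = lastSegment.lastIndexOf(".");
  if (dot === -1) return false;
  const ext = lastSegment.slice(dot + 1);
  return exclusions.some((e) => e.toLowerCase() === ext);
}
```

With a check like this, a link such as `https://example.com/files/report.PDF?download=1` would be treated as excluded, while `https://example.com/docs/intro` would not. Whether listing `pdf` also prevents the `net::ERR_ABORTED` failure on direct PDF links is not confirmed in this thread.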