algolia-webcrawler

PDF-Crawling

Open kernpunkt-thermann opened this issue 7 years ago • 3 comments

Hi,

Are there any ideas or plans for crawling documents, especially PDFs?

Regards from Germany

kernpunkt-thermann • Apr 11 '17 11:04

Hi!

That would require a lot of work, and it's not planned right now.

I would welcome any PR that tries to add this feature.

nitriques • Apr 12 '17 15:04

If anyone is trying to do this, it looks like this tool might be helpful: https://www.npmjs.com/package/pdf-text-extract

RayBB • Mar 15 '18 00:03

The problem with that approach is that we cannot use CSS selectors to find the content to index. But it is a start! Thanks for sharing.

nitriques • Mar 15 '18 20:03
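
For anyone picking this up: since extracted PDF text has no DOM to run selectors against, one possible fallback is to index each page as its own record. The sketch below is only an illustration, not how algolia-webcrawler works. It assumes some extractor (for example the tool linked above) has already produced one string per page; the `PdfRecord` shape and the `pagesToRecords` helper are hypothetical names. Only `objectID` follows Algolia's convention for a record's unique identifier.

```typescript
// Hypothetical record shape; NOT the schema algolia-webcrawler actually uses.
interface PdfRecord {
  objectID: string; // Algolia's unique identifier attribute
  url: string;
  page: number; // 1-based page number
  text: string;
}

// Turn per-page PDF text into one record per non-empty page.
// Assumption: `pages` holds one string per page, in document order.
function pagesToRecords(url: string, pages: string[]): PdfRecord[] {
  const records: PdfRecord[] = [];
  pages.forEach((raw, index) => {
    // Collapse whitespace runs left over from the PDF layout.
    const text = raw.replace(/\s+/g, " ").trim();
    if (text.length === 0) return; // skip blank pages
    const page = index + 1;
    records.push({ objectID: `${url}#page=${page}`, url, page, text });
  });
  return records;
}
```

Splitting by page stands in for the selector-based targeting that HTML allows: it keeps records small and lets a search hit link to a specific page, but it cannot separate headings from body text the way a CSS selector can.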