
Crawling JSON

Open • coprisanu opened this issue 5 years ago • 1 comment

Hi,

We need to crawl a JSON file and split its content into smaller documents to be indexed in Elasticsearch. We have noticed there are already splitters such as CsvSplitter, DOMSplitter, and PDFPageSplitter. Is there one for JSON?

Thank you

coprisanu, May 10 '19 19:05

No, there is currently none. Good idea, though. I will mark this as a feature request. In the meantime, if you know your Java, you can implement your own solution by extending AbstractDocumentSplitter (feel free to share).

essiembre, May 15 '19 03:05
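For anyone trying the approach suggested above, the core job of such a splitter is turning one JSON file into several smaller JSON documents. The Norconex-specific plumbing around AbstractDocumentSplitter is not shown here. This is only a minimal sketch of the splitting step itself, assuming the JSON root is an array and that each top-level element should become one document. `JsonArraySplitter` and `split` are hypothetical names. The code uses only the JDK (no JSON library), so it does not fully validate the JSON. It only tracks strings, escapes, and bracket depth to find where each top-level element ends.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper (not part of Norconex): splits a JSON document whose
 * root is an array into the raw JSON text of each top-level element.
 * Assumes well-formed input; it does not validate the JSON.
 */
public class JsonArraySplitter {

    public static List<String> split(String json) {
        List<String> elements = new ArrayList<>();
        int i = skipWhitespace(json, 0);
        if (i >= json.length() || json.charAt(i) != '[') {
            throw new IllegalArgumentException("Root JSON value is not an array");
        }
        int depth = 0;            // nesting depth below the root array
        boolean inString = false; // inside a JSON string literal?
        boolean escaped = false;  // previous char was a backslash inside a string?
        int start = -1;           // start index of the current element, or -1
        for (i = i + 1; i < json.length(); i++) {
            char c = json.charAt(i);
            if (inString) {
                // Commas and brackets inside strings must not split elements.
                if (escaped) {
                    escaped = false;
                } else if (c == '\\') {
                    escaped = true;
                } else if (c == '"') {
                    inString = false;
                }
                continue;
            }
            if (depth == 0 && (c == ',' || c == ']')) {
                // End of a top-level element (or of the whole root array).
                if (start >= 0) {
                    elements.add(json.substring(start, i).trim());
                    start = -1;
                }
                if (c == ']') {
                    return elements;
                }
                continue;
            }
            if (Character.isWhitespace(c)) {
                continue;
            }
            if (start < 0) {
                start = i; // first significant character of a new element
            }
            if (c == '"') {
                inString = true;
            } else if (c == '{' || c == '[') {
                depth++;
            } else if (c == '}' || c == ']') {
                depth--;
            }
        }
        throw new IllegalArgumentException("Unterminated JSON array");
    }

    private static int skipWhitespace(String s, int i) {
        while (i < s.length() && Character.isWhitespace(s.charAt(i))) {
            i++;
        }
        return i;
    }

    public static void main(String[] args) {
        String json = "[{\"id\":1,\"title\":\"a, b\"}, {\"id\":2,\"tags\":[\"x\",\"y\"]}, 3]";
        // Each printed line would become one document to index.
        for (String element : split(json)) {
            System.out.println(element);
        }
    }
}
```

In an actual splitter, each returned string would presumably become the content of a new child document. For very large files, a streaming parser (for example Jackson's streaming API) would likely be a more robust choice than holding the whole file in a `String`.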