Crawling JSON
Hi,
We need to crawl a JSON file and split its content into smaller documents to be indexed in Elasticsearch. We noticed there are already implementations such as CsvSplitter, DOMSplitter and PDFsplitter. Is there one for JSON?
Thank you
No, there is currently none. Good idea though. I will mark this as a feature request. In the meantime, if you know your Java, you can implement your own solution by extending AbstractDocumentSplitter (feel free to share).
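To give an idea of the splitting logic such a class would need, here is a minimal sketch in plain Java. It assumes the JSON file is a top-level array and that each array element should become its own document. The class and method names (`JsonArraySplitter`, `splitTopLevelArray`) are made up for illustration and are not part of the Importer. Wiring this into a subclass of AbstractDocumentSplitter is not shown, since the exact method to override depends on the Importer version you are using.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the Importer API.
// Splits the raw text of a top-level JSON array into one string per element.
// It only tracks nesting depth and string state; it does not validate the JSON.
public class JsonArraySplitter {

    public static List<String> splitTopLevelArray(String json) {
        List<String> elements = new ArrayList<>();
        int n = json.length();
        int i = 0;
        // Skip leading whitespace before the opening bracket.
        while (i < n && Character.isWhitespace(json.charAt(i))) {
            i++;
        }
        if (i >= n || json.charAt(i) != '[') {
            throw new IllegalArgumentException("Expected a top-level JSON array");
        }

        int depth = 0;            // 1 means "directly inside the top-level array"
        boolean inString = false; // true while inside a JSON string literal
        boolean escaped = false;  // true right after a backslash inside a string
        int start = -1;           // start index of the current element, -1 if none

        for (; i < n; i++) {
            char c = json.charAt(i);
            if (inString) {
                // Brackets and commas inside strings must be ignored.
                if (escaped) {
                    escaped = false;
                } else if (c == '\\') {
                    escaped = true;
                } else if (c == '"') {
                    inString = false;
                }
                continue;
            }
            switch (c) {
                case '"':
                    if (depth == 1 && start < 0) {
                        start = i;
                    }
                    inString = true;
                    break;
                case '[':
                case '{':
                    if (depth == 1 && start < 0) {
                        start = i;
                    }
                    depth++;
                    break;
                case ']':
                case '}':
                    depth--;
                    if (depth == 0) {
                        // End of the top-level array: flush the last element.
                        if (start >= 0) {
                            elements.add(json.substring(start, i).trim());
                        }
                        return elements;
                    }
                    break;
                case ',':
                    // A comma at depth 1 separates two top-level elements.
                    if (depth == 1) {
                        if (start >= 0) {
                            elements.add(json.substring(start, i).trim());
                        }
                        start = -1;
                    }
                    break;
                default:
                    if (depth == 1 && start < 0 && !Character.isWhitespace(c)) {
                        start = i;
                    }
            }
        }
        throw new IllegalArgumentException("Unterminated JSON array");
    }
}
```

Each returned string could then be turned into a child document inside your splitter. For anything beyond a quick experiment, a real JSON parser would likely be a safer choice than this hand-rolled scan, which reads the whole file into memory and does not check that the input is valid JSON.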