spark-solr
Nested json from solr to spark
We are testing nested JSON documents in Solr and trying to analyze them in Spark with Python. We are using the data from this repository: https://github.com/alisatl/solr-revolution-2016-nested-demo/blob/master/data/example-data-solr.json
The JSON schema is as follows:
Code below:
df = (
    sqlContext.read.format("solr")
    .option("zkhost", config.zkserver)
    .option("collection", config.solr_collection)
    .option('child_doc_fieldname', '_childDocuments_')
    .option("query", 'path:2.posts.comments AND sentiment:negative')
    .option('fields', '*,[child parentFilter=path:"2.*"]')
    .load()
)
This produces a Spark DataFrame with only one column, the id field.
The problem appears to be the [child parentFilter=...] transformer in the fields option, since the examples below work properly:
.option('fields', '*')
.option('fields', 'text, author')
We don't support that particular syntax for nested fields right now.
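Until that syntax is supported, one possible workaround is to ask Solr for the nested documents directly over its HTTP /select endpoint and flatten them in Python before building a DataFrame. The snippet below is only a rough sketch of that idea, not something spark-solr provides: the helper names build_select_url and flatten_children, the localhost URL, the collection name "demo", and the sample document are all made up for illustration. It assumes the response carries children under the _childDocuments_ key, as in the question.

```python
from urllib.parse import urlencode


def build_select_url(solr_base_url, collection, query, parent_filter):
    """Build a /select URL that requests matching docs together with their children.

    Hypothetical helper: it only assembles the query string, it does not call Solr.
    """
    params = {
        "q": query,
        # Same child transformer that the 'fields' option rejected above.
        "fl": "*,[child parentFilter=%s]" % parent_filter,
        "wt": "json",
    }
    return "%s/%s/select?%s" % (solr_base_url.rstrip("/"), collection, urlencode(params))


def flatten_children(doc, parent_id=None):
    """Yield one flat dict per document, tagging each child with its parent's id."""
    row = {k: v for k, v in doc.items() if k != "_childDocuments_"}
    row["parent_id"] = parent_id
    yield row
    for child in doc.get("_childDocuments_", []):
        yield from flatten_children(child, parent_id=doc.get("id"))


# Assumed host and collection name; replace with your own.
url = build_select_url(
    "http://localhost:8983/solr",
    "demo",
    "path:2.posts.comments AND sentiment:negative",
    'path:"2.*"',
)

# Made-up document shaped like a Solr response entry with nested children.
sample = {
    "id": "c1",
    "path": "2.posts.comments",
    "sentiment": "negative",
    "_childDocuments_": [{"id": "r1", "path": "3.posts.comments.replies"}],
}
rows = list(flatten_children(sample))
```

The flattened rows could then be passed to sqlContext.createDataFrame, although parents and children usually carry different fields, so an explicit schema may be needed. This path also bypasses spark-solr's parallel reads, so it is probably only practical for result sets that fit on the driver.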