spark-solr

Nested json from solr to spark

Open · nelibla opened this issue 5 years ago • 1 comment

We are testing nested JSON with Solr and trying to analyze it in Spark with Python. We are using data from the repository https://github.com/alisatl/solr-revolution-2016-nested-demo/blob/master/data/example-data-solr.json. The JSON schema is as follows: [image: schema]

Code below:

    sqlContext.read.format("solr") \
        .option("zkhost", config.zkserver) \
        .option("collection", config.solr_collection) \
        .option('child_doc_fieldname', '_childDocuments_') \
        .option("query", 'path:2.posts.comments AND sentiment:negative') \
        .option('fields', '*,[child parentFilter=path:"2.*"]') \
        .load()

This produces a Spark DataFrame with only one column, the id field.

The problem seems to be the "child parentFilter" part of the fields parameter, since the examples below work properly:

    .option('fields', '*')
    .option('fields', 'text, author')
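To narrow this down, the comparison can be scripted. The following is only a rough sketch: `columns_for_fields` and `compare_field_lists` are hypothetical helper names made up for illustration, not part of spark-solr, and the option keys are simply the ones used in the code above. It loads the same query once per candidate `fields` value and records which columns come back.

```python
def columns_for_fields(sql_context, zkhost, collection, query, fields,
                       child_doc_fieldname="_childDocuments_"):
    """Load from Solr with one `fields` value and return the column names.

    Hypothetical helper; it only chains the reader options already used
    in this issue and reads `DataFrame.columns` from the result.
    """
    df = (
        sql_context.read.format("solr")
        .option("zkhost", zkhost)
        .option("collection", collection)
        .option("child_doc_fieldname", child_doc_fieldname)
        .option("query", query)
        .option("fields", fields)
        .load()
    )
    return df.columns


def compare_field_lists(sql_context, zkhost, collection, query, candidates):
    """Map each candidate `fields` string to the columns it yields."""
    return {
        fields: columns_for_fields(sql_context, zkhost, collection, query, fields)
        for fields in candidates
    }
```

Called with the existing `sqlContext`, `config.zkserver`, `config.solr_collection`, the query above, and the list `['*', 'text, author', '*,[child parentFilter=path:"2.*"]']`, this should make it easy to see which `fields` value collapses the result to the single id column.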

nelibla, May 15 '19 09:05

We don't support that particular syntax for nested fields right now.

kiranchitturi, Mar 15 '20 05:03