SOLR-15152: Export Tool should export nested docs cleanly in .json, .jsonl, and javabin

Open epugh opened this issue 4 years ago • 3 comments

Description

The Export Tool says it writes JSON, but the output is actually in JSON Lines format. It also ignores nested docs, both anonymous and regular.
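To make the distinction concrete, here is a minimal sketch of the two output shapes. It is not the Export Tool's code: the class and method names are hypothetical, and each document is assumed to be already serialized to a JSON object string.

```java
import java.util.List;

// Hypothetical illustration only; not part of Solr's ExportTool.
class ExportFormatSketch {

    // JSON Lines: one complete JSON object per line, with no enclosing array.
    // The file as a whole is not a single valid JSON value.
    static String toJsonLines(List<String> docs) {
        StringBuilder out = new StringBuilder();
        for (String doc : docs) {
            out.append(doc).append('\n');
        }
        return out.toString();
    }

    // JSON: every document inside one array, so the whole file parses as one value.
    static String toJsonArray(List<String> docs) {
        return "[\n" + String.join(",\n", docs) + "\n]\n";
    }

    public static void main(String[] args) {
        List<String> docs = List.of("{\"id\":\"1\"}", "{\"id\":\"2\"}");
        System.out.print(toJsonLines(docs));
        System.out.print(toJsonArray(docs));
    }
}
```

A consumer that parses the whole file as one JSON value would accept the second shape but reject the first, which is why the two deserve separate format names.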

Solution

  • Tweaked the writer to properly handle anonymous and regular nested docs when exporting data.
  • Renamed the existing json format to jsonl, and introduced a proper json format.
  • Introduced explicit DocSinks per format: json, jsonl, and javabin.
  • Created new configsets for testing under nested/anonymous and nested/regular.
  • Added the nested products example used in the Ref Guide to example/exampledocs/office_products.json.
  • Changed sample_techproducts_configs to use explicit nested docs, not anonymous nested docs, and then fixed various tests that assumed anonymous children. This was tough!
  • Updated the Ref Guide.
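For readers unfamiliar with the two nesting styles, the sketch below shows how the same children could look in exported JSON. It is an assumption-laden illustration, not the writer from this PR: the class, the helper methods, and the "skus" field name are hypothetical, while "_childDocuments_" is the key Solr's JSON format uses for anonymous children.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not the writer changed in this PR.
class NestedDocSketch {

    // Quote a string for JSON, escaping backslashes and double quotes.
    static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    // Recursively serialize: maps become objects, lists become arrays,
    // so child documents nested at any depth are written out in place.
    static String toJson(Object value) {
        if (value == null) {
            return "null";
        }
        if (value instanceof Map) {
            List<String> parts = new ArrayList<>();
            for (Map.Entry<?, ?> e : ((Map<?, ?>) value).entrySet()) {
                parts.add(quote(String.valueOf(e.getKey())) + ":" + toJson(e.getValue()));
            }
            return "{" + String.join(",", parts) + "}";
        }
        if (value instanceof List) {
            List<String> parts = new ArrayList<>();
            for (Object item : (List<?>) value) {
                parts.add(toJson(item));
            }
            return "[" + String.join(",", parts) + "]";
        }
        if (value instanceof Number || value instanceof Boolean) {
            return String.valueOf(value);
        }
        return quote(String.valueOf(value));
    }

    // Build a child doc with only an id (illustrative).
    static Map<String, Object> child(String id) {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("id", id);
        return doc;
    }

    // Build a parent doc whose children are stored under childField.
    static Map<String, Object> parent(String id, String childField,
                                      List<Map<String, Object>> children) {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("id", id);
        doc.put(childField, children);
        return doc;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> kids = List.of(child("P1!S1"), child("P1!S2"));
        // Anonymous nesting: children under the reserved _childDocuments_ key.
        System.out.println(toJson(parent("P1", "_childDocuments_", kids)));
        // Regular nesting: children under a named field ("skus" is an assumed name).
        System.out.println(toJson(parent("P1", "skus", kids)));
    }
}
```

In both cases the children stay attached to their parent in the output, which is the property that makes an exported file re-importable with its hierarchy intact.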

Now, with the json format you can export and then reimport the Solr docs, including with child docs!

Tests

I've added a new TestExportToolWithNestedDocs, and extended the existing TestExportTool tests. The setup for the tests was quite different, so I didn't make them all one file. I've updated the existing tests that

Checklist

Please review the following and check all that apply:

  • [x] I have reviewed the guidelines for How to Contribute and my code conforms to the standards described there to the best of my ability.
  • [x] I have created a Jira issue and added the issue ID to my pull request title.
  • [x] I have given Solr maintainers access to contribute to my PR branch. (optional but recommended)
  • [x] I have developed this patch against the main branch.
  • [x] I have run ./gradlew check.
  • [x] I have added tests for my changes.
  • [x] I have added documentation for the Reference Guide.

epugh • Mar 19 '21 00:03

There is a lot of good commentary on https://github.com/apache/lucene-solr/pull/2356 that should be reviewed!

epugh • Mar 19 '21 00:03

I'm working up the energy to tackle this again!

epugh • Aug 17 '22 11:08

This PR had no visible activity in the past 60 days, labeling it as stale. Any new activity will remove the stale label. To attract more reviewers, please tag someone or notify the [email protected] mailing list. Thank you for your contribution!

github-actions[bot] • Feb 27 '24 00:02