WIP: JSONL format
I have this working to the best of my ability but I'm not great at C++, so a thorough review is recommended. I left several TODO comments to bring attention to parts that I was unsure about.
Overall looks correct.
Unfortunately, Boost.JSON is too new. It was added in 1.75.0, which is not available in all supported Debian/Ubuntu releases. Of the JSON dev libraries that are available there, I'd say rapidjson (apt-get install rapidjson-dev) is the best choice. nlohmann is also available, but rapidjson is much faster.
I should have thought to look at that. I knew that boost was already a dependency, so I just went with that. I'll try to refactor using rapidjson.
- [x] Finalize abbreviations (change JSON schema and C++ code, including --help text specs)
- [x] Add testing (if nothing else validate output using the JSON schema)
- [x] Handle STREAMCMD as {"cmd": "EXIT"}
- [x] Handle initial text line as {"z": "First line"}
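For reference, the last two checklist items can be pictured with a minimal Python sketch. Only the `cmd` and `z` keys and their example values come from the items above; the helper names `stream_command_line` and `text_line` are hypothetical and not part of the PR's code.

```python
import json


def stream_command_line(name):
    # Hypothetical helper: one JSONL line for a stream command such as EXIT.
    return json.dumps({"cmd": name})


def text_line(text):
    # Hypothetical helper: one JSONL line for a plain text line.
    return json.dumps({"z": text})


# Each JSON object sits on its own line, which is what makes the output JSONL.
jsonl = "\n".join([text_line("First line"), stream_command_line("EXIT")]) + "\n"
print(jsonl, end="")
```

Every line of the result can be parsed independently with `json.loads`, so a consumer can process the stream line by line.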
Assuming that converting between Apertium and JSONL is a common use case, I added Apertium roundtrip testing (aJ, then jA) to the validate_json.py script. Almost all of the roundtrip tests fail because the whitespace between cohorts is lost. To fix that, do we need to add Cohort->wblank to the JSONL format?
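To make the failure mode concrete, here is a toy Python sketch of a strict roundtrip check. It does not call cg-conv and does not use the PR's schema: the converters, the `cohort` and `before` keys, and the simplified cohort pattern are all assumptions made for illustration only.

```python
import json
import re

# Toy pattern for one Apertium-style cohort; escapes and blanks are not handled.
COHORT = re.compile(r"\^.*?\$")


def toy_to_jsonl(stream, keep_blanks):
    # Hypothetical converter: one JSON object per cohort.
    # With keep_blanks, the text preceding each cohort is stored as well.
    lines = []
    pos = 0
    for match in COHORT.finditer(stream):
        obj = {"cohort": match.group(0)}
        if keep_blanks:
            obj["before"] = stream[pos:match.start()]
        lines.append(json.dumps(obj))
        pos = match.end()
    # Note: text after the last cohort is dropped by this toy converter.
    return "\n".join(lines)


def toy_to_stream(jsonl):
    # Hypothetical inverse: concatenate stored text and cohorts in order.
    parts = []
    for line in jsonl.splitlines():
        obj = json.loads(line)
        parts.append(obj.get("before", ""))
        parts.append(obj["cohort"])
    return "".join(parts)


def roundtrips(stream, keep_blanks):
    # Strict check: the regenerated stream must equal the input exactly.
    return toy_to_stream(toy_to_jsonl(stream, keep_blanks)) == stream


sample = "^a/a<n>$ ^b/b<v>$"
print(roundtrips(sample, keep_blanks=False))
print(roundtrips(sample, keep_blanks=True))
```

In this toy setup the strict comparison can only pass when the text between cohorts is carried through the intermediate format. Whether Cohort->wblank is the right place for that in the real code is the open question above.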
I've done my best with this, and I feel like I'm not going to get it much better than it is at this point. I'm going to mark the PR as ready for review. If @TinoDidriksen, @mr-martian, or @unhammer are interested in taking over to fix my mistakes and/or put any finishing touches on this feature, you are welcome to make edits.
I forgot to mention that there is still a funny issue with re-ordering deleted readings, but cg-conv -c -<any target fmt> seems to re-order deleted readings too, so I assumed it's not important.
Sure, I can take it from here.