
WIP: JSONL format

Open reynoldsnlp opened this issue 7 months ago • 3 comments

I have this working to the best of my ability but I'm not great at C++, so a thorough review is recommended. I left several TODO comments to bring attention to parts that I was unsure about.

reynoldsnlp avatar Apr 29 '25 07:04 reynoldsnlp

Overall looks correct.

Unfortunately, Boost.JSON is too new. It was added in Boost 1.75.0, which is not available in all supported Debian/Ubuntu releases. Of the JSON dev libraries that are available there, I'd say rapidjson (apt-get install rapidjson-dev) is the best choice. nlohmann is also available, but rapidjson is much faster.

TinoDidriksen avatar Apr 29 '25 11:04 TinoDidriksen

I should have thought to look at that. I knew that boost was already a dependency, so I just went with that. I'll try to refactor using rapidjson.

reynoldsnlp avatar Apr 29 '25 14:04 reynoldsnlp

  • [x] Finalize abbreviations (change JSON schema and C++ code, including --help text specs)
  • [x] Add testing (if nothing else validate output using the JSON schema)
  • [x] Handle STREAMCMD as {"cmd": "EXIT"}
  • [x] Handle initial text line as {"z": "First line"}
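For illustration, the two special records in the checklist above could be told apart from other lines roughly as follows. This is a minimal sketch, not the PR's implementation: only the `"cmd"` and `"z"` keys come from the checklist, while the function name `classify_jsonl_line`, the kind labels, and the fallback `"other"` category are hypothetical.

```python
import json


def classify_jsonl_line(line):
    """Return (kind, payload) for one JSONL line.

    Only the "cmd" and "z" keys are taken from the checklist; lumping
    every other record into "other" is an assumption for illustration.
    """
    record = json.loads(line)  # raises json.JSONDecodeError on malformed input
    if not isinstance(record, dict):
        raise ValueError("each JSONL line must be a JSON object")
    if "cmd" in record:
        return ("stream_command", record["cmd"])
    if "z" in record:
        return ("text", record["z"])
    return ("other", record)


sample = '{"z": "First line"}\n{"cmd": "EXIT"}\n'
for raw in sample.splitlines():
    if raw.strip():  # skip blank lines between records
        print(classify_jsonl_line(raw))
```

A check like this only confirms that each line parses as a JSON object and carries a recognised key; validating against the JSON schema would still be needed for the full record shape.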

reynoldsnlp avatar Apr 29 '25 15:04 reynoldsnlp

Assuming that converting between Apertium and JSONL is a common use case, I added Apertium roundtrip (aJ, then jA) testing to the validate_json.py script, and it is failing almost all of them because of whitespace between cohorts. To fix that, do we need to add Cohort->wblank to the JSONL format?
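To separate whitespace-only roundtrip failures from real content changes, a comparison along these lines might help. This is a sketch under stated assumptions: `compare_roundtrip` is a hypothetical helper, not something in validate_json.py, and it takes the original and roundtripped streams as plain strings rather than invoking cg-conv itself.

```python
import re


def compare_roundtrip(original, roundtripped):
    """Classify a roundtrip as 'identical', 'whitespace_only', or 'different'.

    Hypothetical helper: it removes *all* whitespace before the second
    comparison, so whitespace inside word forms is ignored as well.
    """
    if original == roundtripped:
        return "identical"

    def strip_ws(text):
        return re.sub(r"\s+", "", text)

    if strip_ws(original) == strip_ws(roundtripped):
        return "whitespace_only"
    return "different"


before = "^a/a<n>$ ^b/b<v>$"
after = "^a/a<n>$^b/b<v>$"  # the space between cohorts was lost
print(compare_roundtrip(before, after))
```

Reporting the `whitespace_only` cases separately would show how many failures depend on preserving inter-cohort whitespace and how many point to other problems.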

I've done my best with this, and I don't think I can get it much better than it is at this point. If @TinoDidriksen, @mr-martian, or @unhammer are interested in taking over to fix my mistakes and/or put finishing touches on this feature, you are welcome to make edits. I'm going to mark the PR as ready for review.

reynoldsnlp avatar May 12 '25 23:05 reynoldsnlp

I forgot to mention that there is still a funny issue with re-ordering deleted readings, but cg-conv -c -<any target fmt> seems to be re-ordering deleted readings, too, so I assumed that's not important.

reynoldsnlp avatar May 12 '25 23:05 reynoldsnlp

Sure, I can take it from here.

TinoDidriksen avatar May 13 '25 11:05 TinoDidriksen