kaitai_struct icon indicating copy to clipboard operation
kaitai_struct copied to clipboard

Serialization merge

Open GreyCat opened this issue 1 year ago • 14 comments

Following up #27, @generalmimon done tremendous job at implementing serialzation for some languages. However, it's not yet merged into master, thus so far these live into separate "serialization" branches:

  • https://github.com/kaitai-io/kaitai_struct_compiler/tree/serialization
  • https://github.com/kaitai-io/kaitai_struct_tests/tree/serialization
  • https://github.com/kaitai-io/kaitai_struct_java_runtime/tree/serialization
  • https://github.com/kaitai-io/kaitai_struct_python_runtime/tree/serialization

It's documented in https://doc.kaitai.io/serialization.html, but ultimately our goal obviously would be not supporting two separate forks, but having one that does everything.

I wanted to start making itemized list of tasks of what exactly needs to be done to merge these back to master.

@generalmimon, can I ask you to share your thoughts on this?

GreyCat avatar Jul 27 '23 12:07 GreyCat

It's documented in https://doc.kaitai.io/serialization.html, but ultimately our goal obviously would be not supporting two separate forks, but having one that does everything.

Of course. I'd love to do this - generally speaking, serialization in Python and Java is pretty much finished at this point and I believe the serialization branches are mostly ready for the merge. Let me assess the situation in individual repos:

  • [x] kaitai_struct_tests: The aggregate/convert_to_json script (and related XML "parsers") has to be changed to read the test name not only from the test method name, but to take the combination of test class name + test method name as a unique test.

    This is needed mainly because in both Java and Python, serialization is tested primarily via a generic test method called testReadWriteRoundtrip / test_read_write_roundtrip, which is defined in the CommonSpec abstract base class. Unfortunately, from what I've seen, the convert_to_json script currently takes only the test method name as the name of the test, so different testReadWriteRoundtrip tests from all test formats are mistaken for just one test in the output ci.json file.

  • kaitai_struct_tests: Another thing to note here (but there's probably no need to do anything about it before the merge, it can be done after that) is that I've added ~60 new test formats during the serialization work, so these should eventually be generated from KST / ported to other languages. But I think our testing system doesn't care about additional test formats / KST specs without language-specific specs, so this can be done gradually for individual languages (no need to do it at once).

  • kaitai_struct_compiler: Since the serialization branch is up-to-date with the 0.10 tag, I don't anticipate almost any problems with the merge. The only feature I'm personally afraid of are the "zero-copy" substreams (https://github.com/kaitai-io/kaitai_struct/issues/44), which I haven't thought of at all in the context of serialization. I think if we want to get serialization merged to master soon and avoid problems, for now I'd let the --read-write option imply --zero-copy-substream false (https://github.com/kaitai-io/kaitai_struct_compiler/commit/690af1b36d6fea4266d06dc58f1b5482524bdfe7).

    Or perhaps (but I don't know much about the substream implementation), the substreams could be only used for parsing, but serialization would still use the old way with copying for now.

  • kaitai_struct_java_runtime, kaitai_struct_python_runtime: Probably the least problematic. They could be perhaps merged right now because they actually only add APIs and the changes are independent on compiler changes, so the existing functionality should work as it did (at least I think).

generalmimon avatar Jul 27 '23 13:07 generalmimon

Also I've just extended the description of --read-write a bit to match the current situation, hopefully that's OK (JavaMain.scala:60-62):

      opt[Unit]('w', "read-write")  action { (x, c) =>
        c.copy(runtime = c.runtime.copy(readWrite = true, autoRead = false))
      } text("generate read-write support in classes (implies --no-auto-read, Java and Python only, default: read-only)")

generalmimon avatar Jul 27 '23 14:07 generalmimon

Are there some guidances how to support serialization for other languages? Especially interested in the JS one... Happy to contribute where possible.

Letrab avatar Jul 28 '23 12:07 Letrab

I'm taking a look at

Slowly getting familiar with the tests submodule. Limiting myself to the 'java' flows.

Propagating class and method names into TestResult is trivial and we can get this kind of ci.json (actual output):

...
  "TestBcdUserTypeBe#testBcdUserTypeBe": {
    "status": "passed",
    "elapsed": 0.001
  },
  "TestBcdUserTypeBe#testReadWriteRoundtrip": {
    "status": "passed",
    "elapsed": 0.006
  },
...

However, tagging the results with is_kst: true is not so trivial. kst_adoption.log currently relies on congruence between filename and test name. Since serialization departs from this congruence, we'll have to tweak some kst_adoption machinery.

I was thinking we could generate kst_adoption_spec.log and kst_adoption_specwrite.log, instead of a single kst_adoption.log, by doing Dir.glob('spec/java/src/**/specwrite/Test*.java') and Dir.glob('spec/java/src/**/spec/Test*.java') in kst-adoption-report. Then we give TestResult an is_specwrite: Boolean field that is derived from the presence of specwrite in the FQN of the class name, and now convert_to_json#add_kst_adoption can tag accordingly.

What do you think?

izzyreal avatar Jul 29 '23 15:07 izzyreal

I haven't looked at the full pipeline, but since serialization affects test names in several ways, we may want to anticipate the implications for the order in which they appear on https://ci.kaitai.io/.

Moreover, perhaps I could start working on kaitai_ci_ui to prepare it for adding the write test results (@GreyCat mentioned this on Gitter). I've created a ticket to refine the layout in https://github.com/kaitai-io/kaitai_ci_ui/issues/44.

izzyreal avatar Jul 29 '23 16:07 izzyreal

I've took liberty at merging kaitai_struct_java_runtime — see https://github.com/kaitai-io/kaitai_struct_java_runtime/commit/e34e5a9b4d3e02a323fdbb000b61d0dd0fb9f776. A few very minor conflicts, otherwise all read-only tests work just as before.

GreyCat avatar Jul 30 '23 11:07 GreyCat

Also merged https://github.com/kaitai-io/kaitai_struct_python_runtime — see https://github.com/kaitai-io/kaitai_struct_python_runtime/commit/704995acdb2f114e15809cf5f469e3a80a2dddad.

No changes in tests, no conflicts.

GreyCat avatar Jul 30 '23 15:07 GreyCat

Also merged https://github.com/kaitai-io/kaitai_struct_python_runtime — see kaitai-io/kaitai_struct_python_runtime@704995a.

Seems a bit weird to me to do this as a fast-forward merge, but OK.

generalmimon avatar Jul 30 '23 21:07 generalmimon

I've just resolved the merge conflicts on branch https://github.com/kaitai-io/kaitai_struct_compiler/tree/serialization.

generalmimon avatar Jul 31 '23 23:07 generalmimon

For tracking purposes — PRs to discuss the merge:

GreyCat avatar Aug 01 '23 20:08 GreyCat

How is it going? I wanner to do sth for go.

yywing avatar Nov 22 '23 02:11 yywing

There is a question: we should imply function to calc length? image

meta:
  id: value_instances
seq:
  - id: len_data_raw
    type: u1
  - id: data
    size: len_data
instances:
  len_data:  # for read
    value: len_data_raw - 3
  calc_length:  # for write
    value: data.length + 3

calc_length for write.

ValueInstances r = new ValueInstances();

r.setData(new byte[] { 1, 2, 3, 4, 5 });
r.setLenDataRaw(r.calcLength());
System.out.println(r.lenData()); // => 5
  1. array length reduce: we cannot range array.
seq:
  - id: length
    type: u2be
  - id: params
    type: params
    size: len_params
instances:
  len_params:
    value: length - 2
  calc_length:
    value: params.calc_length + 2
types:
  params:
    seq:
      - id: params
        type: parameter
        repeat: eos
    instances:
      calc_length:
        value: params.reduce(0, r + i.calc_length)

  parameter:
    seq:
      - id: length
        type: u2be      
      - id: data
        size: len_data
    instances:
      len_data:
        value: length - 2
      calc_length:
        value: data.length + 2

yywing avatar Dec 11 '23 12:12 yywing

The implementation of Golang is almost complete, but there have been some breaking changes and some areas that do not meet my requirements. If no one responds to my questions, then I will have to implement it according to my own ideas. As a result, there is a high probability that it won't be merged into the master branch.

yywing avatar Dec 11 '23 12:12 yywing

What is the status here? Thanks for everyone working on this. I would like to use Kaitai for numerous Gadgetbridge gadgets, and serialization is paramount there.

Is there any chance of serialization support in the main java package? Maybe even in the web IDE?

dakhnod avatar May 10 '24 08:05 dakhnod