kaitai_struct icon indicating copy to clipboard operation
kaitai_struct copied to clipboard

Serialization

Open KOLANICH opened this issue 7 years ago • 88 comments

You have deserialization. How about serialization?

KOLANICH avatar Sep 13 '16 15:09 KOLANICH

We've slowly discussing this issue, but it's obviously not as easy as it looks like. Probably it will be a major feature for something like second major version (v2.0 or something like that), so it'll eventually be there, but don't hold your breath for it.

GreyCat avatar Sep 13 '16 16:09 GreyCat

We've slowly discussing this issue, but it's obviously not as easy as it looks like.

What are the troubles?

KOLANICH avatar Sep 13 '16 17:09 KOLANICH

It is pretty easy for simple fixed (C-style) structures. However, as soon as you start using instances that bind certain values to offsets in the stream, it becomes much more complex. A very simple example:

seq:
  - id: num_files
    type: u4
  - id: files
    type: file
    repeat: expr
    repeat-expr: num_files
types:
  file:
    seq:
      - id: file_ofs
        type: u4
      - id: file_size
        type: u4
    instances:
      body:
        pos: file_ofs
        size: file_size

This .ksy describes very simple file index structure which consists of (num_files) of (file_ofs, file_size) pairs. Each pair describes a "file", which can be accessed by an index using something like:

file_contents = archive.files[42].body

However, adding a file to such archive is a challenge. Ideally, one would want to do something like:

archive.files[42].body = file_contents

This should set relevant bound file_size automatically to accomodate length of assigned file_contents and, what's much more complex, assign file_ofs somehow. This is not an easy task: KS has no innate knowledge on how to manage unmapped space in the stream, is it limited or not, should you find some unused spot and reuse it or just expand the stream and effectively append file_contents to its end, etc.

What's even harder is that in many cases (i.e. archive files, file systems, etc) you don't want rewrite whole file, but just do some changes (i.e. appending new file to the end of archive, or reusing some pre-reserved padding, or something like that).

Another, may be even simpler example: if you read PNGs, you don't care about checksums. When you write PNGs, you have to generate proper checksums for every block — thus we need block checksumming algorithms and some way to bind it to block.

GreyCat avatar Sep 13 '16 18:09 GreyCat

what's the problem to deserialize them, edit, serialize back and then write?

KOLANICH avatar Sep 13 '16 19:09 KOLANICH

What exactly do you refer as "them"? File archive example? PNG checksums?

GreyCat avatar Sep 13 '16 19:09 GreyCat

What exactly do you refer as "them"? File archive example? PNG checksums?

Almost anything, including files and checksums. In your example repeat-expr: num_files and id: num_files type: u4 means that "files has length num_files". When you deserialize, you read num_files, create files array of length num_files and read there num_files structures. When you serealize you need the inverse: update "num_files" with length and then write. What if "num_files" is complex expression we cannot derive automatically? We shift responsibility to a user and to be able serialize he must to provide both forward and inverse mappings for such expressions, except simple cases when they can be derived automatically.

Now some ideas how to implement this. When processing a KS file it should build a graph of object dependencies, in your case num_files <-> files[].length. Then we apply rules. In this case "scalar <-> array.length" -> "scalar <- array.length" which results in num_files <- files[].length. We can say that every field has some absolute and actual strengh and that edges can go only in direction of non-increasing actual strengh and that actual strengh of node is max(max(actual strengh of neighbours), absolute strengh). This way we transform the graph into tree and reduce the number of free variables in the case of equal strength. When you need to serialize you process the description member by member, launch lazy evaluation according to the graph and write them.

If you want to minimize count of writing operations you store both versions in memory, create diff, and trynot to touch parts not touched by diff.

KOLANICH avatar Sep 13 '16 19:09 KOLANICH

What if "num_files" is complex expression we cannot derive automatically?

Exactly my thoughts. And actually even that requires us to create some sort of inverse derivation engine. For example, if we have a binding:

    - id: body_size
      type: u4
    - id: body
      size: body_size * 4 + 2

and we update body, we should update body_size, assigning (body.size() - 2) / 4 there. If there are some irreversible bindings (i.e. modulo, hashes, etc) - then, at the very least, we need to detect that situation and allow some extra syntax to make it possible for user to set these inverse dependencies manually.

We can say that every field has some absolute and actual strengh and that edges can go only in direction of non-increasing actual strengh and that actual strengh of node is max(max(actual strengh of neighbours), absolute strengh).

I'm sorry, but I fail to understand that. Could you rephrase it, provide some examples, or explain why it is helpful, i.e. which problem we're trying to solve here?

If you want to minimize count of writing operations you store both versions in memory, create diff, and trynot to touch parts not touched by diff.

The point is that we need to add some extra syntax (or API, or something) to make it possible to do custom space allocation or stuff like that. For example, if you're editing a file system, you can't just always append stuff to the end of it: block device usually has a finite capacity and sooner or later you'll exhaust that and your choice would be to pick and reuse some free blocks in the middle of the block device.

You seem to have good ideas on implementation, would you want to join and help implementing it? Basically anything will help, i.e. .ksy syntax ideas, API ideas, tests for serialization, compiler ideas & code, etc, etc.

GreyCat avatar Sep 13 '16 20:09 GreyCat

And actually even that requires us to create some sort of inverse derivation engine.

Or take some library. Symolic evaluation is rather studied area of math with lots of paper written. I haven't studied it, so it is possible tha some of the ideas I've mentioned were discussed in them.

I'm sorry, but I fail to understand that. Could you rephrase it, provide some examples, or explain why it is helpful, i.e. which problem we're trying to solve here?

Let we have an array and a size_t scalar - number of elements in array. What is main in the couple? What defines it entirely? Data in the array do, number is needed only to allow us to read the array correctly. For example in c-strings you don't need number because the convention is to terminate by \0. Add more data to array and you'll have to increase array capacity which means you will have to increase number in order to read it (and everything after it) correctly. So let array.length have strentgh 2, and scalar 1. Then you have link "files[].length <-> num_files". Let we have another array "filenames[]" and its capacity num_files-2, let we also have foo and bar fields with something. Connections : filenames[].length <-> num_files, num_files <-> foo, bar <-> foo and files[].length <-> foo.

Now we start processing. Let show strenghs in brackets, absolute first. 1 files[].length (2,0) 2 files[].length files[].length (2,2) itself is processed, go to edges 3 files[].length (2,2) <-> num_files(1,0) 4 files[].length (2,2) <-> num_files(1,1) 2>1 so removing reverse edge and setting to 2 5 files[].length (2,2) -> num_files(1,2) 6 files[].length (2,2) <-> foo(1,0) 7 files[].length (2,2) <-> foo(1,1) 8 files[].length (2,2) -> foo(1,1) 9 files[].length (2,2) -> foo(1,2) 10 num_files(1,2) <-> filenames.length (2,0) 11 num_files(1,2) <-> filenames.length (2,2) equal, this saves both edges. 12 num_files (1,2) <-> foo(1,2) 13 foo(1,2) <-> bar(1,0) 14 foo(1,2) <-> bar(1,1) 15 foo(1,2) -> bar(1,1) 16 foo(1,2) -> bar(1,2)

then we need to serialize and read config 1 first goes the num_files it has incoming edge files.length. it is only incoming edge. 2 files[].length is already evaluated, take its value, eval. num_files. 3 it has 2 bidi edges, filenames[].length and foo evaluate them the same way and check if they match to value of filenum. 4 serialize and write 5 continue to the rest of fields.

explain why it is helpful, i.e. which problem we're trying to solve here?

it determines the order we should evaluate expressions and what expressions depends on what and helps to find conflicts.

and I think you should really read something about symbolic evaluation (I haven't, ideas above are just adhoc thoughts, maybe there are better approaches to it).

You seem to have good ideas on implementation, would you want to join and help implementing it?

Sorry, no. Maybe i'll send you some ideas or code later, but I can't be a permanent member of this project.

Basically anything will help, i.e. .ksy syntax ideas, API ideas, tests for serialization, compiler ideas & code, etc, etc.

OK.

KOLANICH avatar Sep 13 '16 23:09 KOLANICH

Add more data to array and you'll have to increase array capacity which means you will have to increase number in order to read it (and everything after it) correctly. So let array.length have strentgh 1, and scalar 0.

Ok, then what shall we do in case of the following:

seq:
  - id: num_objs
    type: u4
  - id: headers
    type: header
    repeat: expr
    repeat-expr: num_objs
  - id: footers
    type: footer
    repeat: expr
    repeat-expr: num_objs

This implies that headers[] and footers[] shall always have the same number of objects. How are the strengths are assigned in this case and how do we enforce that they have equal number of objects?

GreyCat avatar Sep 13 '16 23:09 GreyCat

1 see above example 2 throwing an exception, of course

ps graph evaluation is done by KS compiler runtume checks are done by generated code

and I fixed priorities to match the lines I made supposing strenghs were 1 and 2

KOLANICH avatar Sep 13 '16 23:09 KOLANICH

I've commited very basic PoC code that demonstrated seq serialization in Java. It is available in distinct "serialization" branches. To test it, one'll need:

Obviously, only a few tests were converted, and, to be frank, now only a very basic set of types is supported. Even strings are not implemented right now. Testing is very basic too, one can run ./run-java and see that two packages are ran: spec is "normal", reading tests, and specwrite are writing tests.

I'd really love to hear any opinions on the API (both runtime & generated), Java implementation (it really pain in the ass, as ByteBuffer does not grow, and you have to preallocate the array, or probably reimplement everything twice with something that grows), etc.

GreyCat avatar Mar 24 '17 08:03 GreyCat

Strings work, repetitions work, enums work, things are slowly coming to reality :)

GreyCat avatar Mar 25 '17 08:03 GreyCat

Serialization progresses slowly. Basic in-stream user types and processing on byte types work, various fancy byte type stuff like terminator and pad-right works, etc, etc.

I've got an idea that might be very simple to implement.

Step 1: manual checks

Generate _read, _write and _check methods. _check runs all the internal format consistency checks to ensure that stuff that will be written will be read back properly. For example:

seq:
  - id: len_of_1
    type: u2
  - id: str1
    type: str
    size: len_of_1 * 2 + 1

would generate:

    public void _read() {
        this.lenOf1 = this._io.readU2le();
        this.str1 = new String(this._io.readBytes(lenOf1() * 2 + 1), Charset.forName("ASCII"));
    }
    public void _write() {
        this._io.writeU2le(this.lenOf1);
        this._io.writeBytes((this.str1).getBytes(Charset.forName("ASCII")));
    }
    public void _check() {
        if (this.str1.bytes().size() != lenOf1() * 2 + 1)
            throw new FormatConsistencyError("str1 size", this.str1.bytes().size(), lenOf1() + 3);
    }

To use this properly, one must manually set both lenOf1 and str1:

r.setStr1("abcde");
r.setLenOf1(2);
r._check(); // should pass, so we're clean to take off
r._write(); // should write consistent data that's guaranteed to be readable back

Step 2: dependent variables

We declare some fields as "dependent", and mark them up in ksy:

seq:
  - id: len_of_1
    type: u2
    dependent: (str1.to_b("ASCII").size - 1) / 2
  - id: str1
    type: str
    size: len_of_1 * 2 + 1

This means that len1 becomes a read-only variable, setLenOf1 setter won't be generated. Instead, it would generate slightly different _write:

    public void _write() {
        this.lenOf1 = (str1().getBytes(Charset.forName("ASCII")).size() - 1) / 2;
        this._io.writeU2le(this.lenOf1);
        this._io.writeBytes((this.str1).getBytes(Charset.forName("ASCII")));
    }

Obviously, using this boils down to single r.setStr1("abcde");.

Any comments / ideas / etc?

GreyCat avatar Mar 27 '17 20:03 GreyCat

Any comments / ideas / etc?

  1. move evaluation of expression into a separate method
  2. use check and write in generic form without explicit expressions in it, but with a method from 1)
  3. Why not to use value instead of dependent?
  4. why do we use .to_b("ASCII").size? String encoding is known, so why not just .size?
  5. this dependent are ugly, it'd be nice to eliminate them, but we need a decision what kind of expressions should be resolved automatically. I guess linear ones should be enough. Another question is how to solve them. There is exp4j, for parsing and storage, but it'll require some code to build a simple symbolic gaussian elimination solver over it. If we wanna write in python, here are docs for lib wrapping multiple SMT solvers: https://github.com/angr/angr-doc/blob/master/docs/claripy.md

KOLANICH avatar Mar 28 '17 08:03 KOLANICH

  • move evaluation of expression into a separate method
  • use check and write in generic form without explicit expressions in it, but with a method from 1)

You mean, something like that?

    public void _read() {
        this.lenOf1 = this._io.readU2le();
        this.str1 = new String(this._io.readBytes(_sizeStr1()), Charset.forName("ASCII"));
    }
    public void _write() {
        this._io.writeU2le(this.lenOf1);
        this._io.writeBytes((this.str1).getBytes(Charset.forName("ASCII")));
    }
    public void _check() {
        if (this.str1.bytes().size() != _sizeStr1())
            throw new FormatConsistencyError("str1 size", this.str1.bytes().size(), lenOf1() + 3);
    }
    public int _sizeStr1() {
        return lenOf1() * 2 + 1;
    }

Does it bring any benefits? It won't really simplify generation (probably on the contrary), and we'll need to invent tons of names for all that size, if, process, etc expressions.

  • Why not to use value instead of dependent?

Naming is, of course, still to be discussed. One argument against value I have is that value is already used for reading in value instances,

why do we use .to_b("ASCII").size? String encoding is known, so why not just .size?

Using size on a string will give out a length of string in characters. If you'll put some non-ASCII into that string, proper .to_b("ASCII").size conversion will give you an exception, while just taking "number of bytes = number of characters" will give you corrupted data.

I guess we could try to do some sort of .bytesize method for strings taken verbatim from format definition, where "encoding is known", to save retyping the encoding name. However, it still won't work on modified strings, i.e. it's possible to implement str1.bytesize and it won't work with (str1 + 'x').bytesize (as the latter is CalcStrType, which lacks any source encoding info by design).

  • this dependent are ugly, it'd be nice to eliminate them

First of all, you can't really eliminate them completely in any case. Some functions are just irreversible, and in some cases you'll have more free variables than constraints. For example:

seq:
  - id: a
    type: u1
  - id: b
    type: u1
  - id: my_str
    type: str
    size: a + b

Even if you have byte size of my_str, you can't set both a and b automatically. Reversing stuff automatically would be more like a syntactic sugar feature, just to save from typing boring stuff where it's possible. In fact, I heavily suspect that we'll cover 95% of cases with very crude logic like size: a => a = attr.size.

GreyCat avatar Mar 28 '17 09:03 GreyCat

value is already used for reading in value instances,

Yes, that's why I've chosen it. It has the same semantic: deriving a value of a struct member from other ones makes it some kind of an instance, but tied to offset.

we could try to do some sort of .bytesize

size attribute in a .ksy means size in bytes. So I see no reason it to mean anything else in expression language.

First of all, you can't really eliminate them completely in any case.

Yes, we have already discussed this.

Even if you have byte size of my_str, you can't set both a and b automatically.

Since they are of equal strentgh. This should cause ksc to throw a warning about undefined behavior.

Reversing stuff automatically would be more like a syntactic sugar feature, just to save from typing boring stuff where it's possible.

Not only. One expression can contradict another one, it can cause nasty errors ot can be used as a backdoor. Another side is that we (humans) don't know exact expressions ahead of time. So I propose the following: 1 to have a syntax to provide a manual expression 2 missing expressions are generated by compiler and inserted into a ksy with another type of expressions 3 human examines ksy output for errors and malicious code 4 verified output is used to generate actual code 5 we would sometimes want to

  • regenerate all non-manual expressions in a ksy
  • check that expressions don't contradict each other

In fact, I heavily suspect that we'll cover 95% of cases with very crude logic like size: a => a = attr.size.

Maybe.

KOLANICH avatar Mar 28 '17 10:03 KOLANICH

I would like to play with the serialization branch...

Is there a pre-built compiler and Java runtime of the serialization branch available? If not, that's ok, I'll setup a Scala build environment.

ixe013 avatar Oct 05 '17 20:10 ixe013

@ixe013 There are no pre-built packages for such experimental branches, but it's relatively straightforward to try it yourself. You don't really need anything beyond sbt — it would download all the required stuff (including Scala compiler and libraries of the versions needed) automatically. See dev docs — it's literally one command to run, like sbt compilerJVM/universal:packageBin — et voila.

GreyCat avatar Oct 05 '17 23:10 GreyCat

I was able to build a Read-write version of my ksy, however the generaed code does not compile as the Reader reads from Int and stores them in Enums as Long (Int->Long) and the _writes() try to write the Enum Long as Int (Long->Int cause error).

I have these errors: The method writeBitsInt(int, long) is undefined for the type KaitaiStream The method writeU1(int) in the type KaitaiStream is not applicable for the arguments (long) The method writeU2be(int) in the type KaitaiStream is not applicable for the arguments (long)

glenn-teillet avatar Oct 10 '17 17:10 glenn-teillet

I also have errors in every constructor, where it tries to assign _parent received as a KaitaiStruct in the constructor signature to the _parent without casting to the more specific type.

Type mismatch: cannot convert from KaitaiStruct to KaitaiStruct.ReadWrite

glenn-teillet avatar Oct 10 '17 17:10 glenn-teillet

@glenn-teillet There's still lots of work to do, it's nowhere near production-ready. Before trying to compile read-life ksy files with lots of complicate details, probably we should spend some time getting read-write test infrastructure to work, and, probably, porting that older serialization branch code to modern codebase.

GreyCat avatar Oct 10 '17 17:10 GreyCat

Some suggested definitions for the serialization spec:

  • Serialization. Let we have a binary format f, a set of sequences of bits FS, its subset of sequences of bytes making a valid format FS_f, a set of object-oriented Turing-complete programming languages PL, a set of valid Kaitai Struct definitions KSY, including the subset of definitions for the format f KSY_f, and the KS compiler KSC : PL × KSY → (PSC, SSC), where PSC: FS → O is a set of a parsing programs, SSC: O → FS is a set of serializing programs and ssc_{ksy_f}(psc_{ksy_f}(s)) ≡ s ∀s ∈ FS_f, ∀ksy_f ∈ KSY_f, ∀pl ∈ PL, KSC(pl, ksy_f)=(psc_{ksy_f}, ssc_{ksy_f}). To be practically usable there should be a way to create an o= psc_{ksy_f}(s) programmatically without doing any parsing of actual bit string s.

Serialization is the part of KSC producing serialization programs.

  • Internal representation are the objects created by KSC-generated code in program's runtime.
  • Finalization is a process of transforming the internal representation that way, that only trivial transformations are left to be done to create a serialized structure.
  • An expression is the mapping of a subset of internal representation variables called expression arguments to another variable called expression output.
  • A reverse expression is an expression mapping an original expression's output back to its arguments with respect to the current state of the internal representation (including the arguments).
  • Trivial transformations are the ones not involving computing any expressions. Examples of trivial transformations are endianess conversions and bit shifts induced by using bit-sized fields.

Comments?

KOLANICH avatar Oct 23 '17 08:10 KOLANICH

This is a very interesting subject. What is the current status of this? Will this be pursued in the future? It would be very good if you could use serialization. Especially for binary formats based on bit's this would be a great relief. I hope that the issue will be pursued further and that we can also benefit from it in the future.

FSharpCSharp avatar Dec 22 '17 10:12 FSharpCSharp

No updates here, latest comments like https://github.com/kaitai-io/kaitai_struct/issues/27#issuecomment-334616207 still reflect current state of affairs. Help is welcome :)

GreyCat avatar Dec 22 '17 12:12 GreyCat

Definitely a complex issue. For my part, an implementation that generates write methods for each field with a simple type (uint32_t, etc for C++) to a stream would go a long way - I could do the object graph traversal and "finer parts" myself.

cryptocode avatar Jan 10 '18 18:01 cryptocode

@cryptocode If you're interested in C++ support, then I'd suggest to start with implementing related C++ runtime methods. See this Java runtime branch for generally proposed API.

GreyCat avatar Jan 10 '18 18:01 GreyCat

@KOLANICH Did you consider a relaxed definition for the serialization: psc_{ksy_f}(ssc_{ksy_f}(o)) ≡ o ∀o ∈ O_f, ∀ksy_f ∈ KSY_f, where O_f is a space of structure values valid with respect to format f? This definition enables KSC to ignore internal structure relations and makes KSC user responsible to ensure that ∀o ∈ O_f. In construct there is a field Computed which enables to compute some values during serialization using context. Unfortunatelly, this is Python specific and cannot be directly implemented in any PL. BTW. Why do you require PL to be Turing-complete?

mbrudka avatar Apr 30 '18 14:04 mbrudka

where O_f is a space of structure values valid with respect to format f?

I don't quite understand what you mean under this. Do you mean O_f be a set of internal representations of a format without self-consistency? Further in this message I assume that you mean this.

There are at least 2 issues with your definition.

1

psc_{ksy_f}(ssc_{ksy_f}(o)) ≡ o ∀o ∈ O_f, ∀ksy_f ∈ KSY_f

This relaxed definition allows a degenerate case where the serializer dumps a serialized internal representation into any other format except the original one and the parser checks if the binary string is a serialized internal representation and deserializes it if it is, otherwise parses. Or smuggles some additional info in the fields without an id. In this case the result will not belong to FS_f and that's why be unreadable by external programs. We surely wanna avoid it. So a binary string should be primary and the internal representation should be secondary.

2

This definition enables KSC to ignore internal structure relations and makes KSC user responsible to ensure that ∀o ∈ O_f.

In fact the goal is to make KSC be responsible for enforcement that the output belongs to FS_f and make the serializer derive and fix the internal representation to be self-consistent. This should allow automatic processing without any human intervention. We just feed ksys and blobs and get the IRs, we process IRs fully automatically, then we dump IR and get a correct self-cinsistent file which can be processed with other programs.

In construct there is a field Computed which enables to compute some values during serialization using context. Unfortunatelly, this is Python specific and cannot be directly implemented in any PL. BTW.

I prefer the reverse expressions for computing when serializing to be derived by the compiler from all other expressions. I have used some computer algebra systems (including Open Source ones, but Wolfram Mathematica does far better than anything I saw in Open Source) and saw them being able to find analytic solutions for equations. I guess the expressions we are dealing with in ksys are far simpler and in most cases can be derived automatically. But if they cannot, or the derived ones are inefficient, we need some syntax to explicitly provide them.

Why do you require PL to be Turing-complete?

Because non-turing-complete languages lack loops/recursion which are needed to process structures with repeat-*, so most of structs are useless for them (and I guess that KSY may be Turing-complete itself but I have no proof for it). Most of programming languages are Turing-complete too. So just for simplicity.

KOLANICH avatar May 02 '18 16:05 KOLANICH

@KOLANICH The rest of your reply suggest that you understand what is O_f. I am really glad you see the differences between definition as some may think they are equivalent. Let me then not define it precisely, but please do it for the sake of clarity and completeness if you find my proposition interesting. If about issues you mentioned:

  1. In general your assumption about FS_f is still at force, thus the definition does not allow to dump structures into arbitrary strings. Certainly, the definition allows to create frames which are not consistent and may be not recognised by other applications eg. frame with a bad checksum. This is why we assume that users is responsible to ensure that ∀o ∈ O_f.
  2. The idea to enforce the semantic consistency on KSC level though appealing is rather an illusion. The consistency is the responsibility of the business logic, but not parsers or builders. It is better to avoid it in KS as it will delay the serialisation feature, then will be hard for users and finally will be continuously fixed to meet new requirements. We just need to accept that some frames will not be consistent..

I am not exactly sure what you mean by reverse expressions - is this reverse polish notation? But I hope that the serialisation will not depend on the implementation of analytic solution solvers as they are really much bigger piece of software that KS.

In general you are probably right to require PL to be Turing complete. Nevertheless the assumption is not necessary as long as you do not use it in any formal proof or in the implementation.

Concluding: increasing responsibilities lead many software into a dead-end street. Extensive consistency validation is such path in my opinion. KS may avoid it by sticking directly for transforming bits into structures and (hopefully soon) structures into bits.

mbrudka avatar May 02 '18 18:05 mbrudka

I am not exactly sure what you mean by reverse expressions

A reverse expression is an expression mapping an original expression's output back to its arguments with respect to the current state of the internal representation (including the arguments).

The consistency is the responsibility of the business logic, but not parsers or builders.

Then we need this feature be disableable. I guess in both runtime and compile-time separately: in compile time to cut costs on building reverse expressions and to build smaller code, in runtime to use the code with reverse expressions in more complex cases like fuzzing.

It is better to avoid it in KS as it will delay the serialisation feature, then will be hard for users and finally will be continuously fixed to meet new requirements.

Have you read the thread? Our roadmap to the serialization is to implement and debug manual reverse expressions first. When we have it working it would be immediately useful - anyone would be able to start using it, but it may be troublesome: there may be lots of variables in formats and it's very labour-consuming to construct and maintain them for every field and instance. The next step, the most tricky one, is to implement autoderivation of them so we wouldn't have to write and maintain them explicitly in most cases.

But I hope that the serialisation will not depend on the implementation of analytic solution solvers as they are really much bigger piece of software that KS.

z3 is not very big. And I hope that all the heavy lifting can be done in compilation phase.

Nevertheless the assumption is not necessary as long as you do not use it in any formal proof or in the implementation.

Yes, we have not stated that parsers, serializers and ksy itself exist ;) But if we need them exist for every valid ksy we have to assume that the language they are transpiled to is Turing-complete.

KOLANICH avatar May 02 '18 20:05 KOLANICH