scryer-prolog icon indicating copy to clipboard operation
scryer-prolog copied to clipboard

Term and LeafAnswer serde serialization

Open bakaq opened this issue 1 year ago • 8 comments

As a lot of things moved around, I thought making a new PR would be easier than rebasing #2493. This also gives us a clean slate to discuss. The old PR was getting a bit too big, and Github is notoriously bad for big threads of discussions in issues/PRs.

This basically gets to the same point that #2493 was, but with a change to rationals following https://github.com/mthom/scryer-prolog/pull/2505#discussion_r1733427910 (basically now they are serialized as {"rational": {"numerator": ..., "denominator": ...}}), and also without migrating the integration tests. I believe this will still change a bunch so I defer migrating the integration tests (which depend on serialization) to after this gets merged. This also doesn't have the deserialization from #2505.

@jjtolton, if you really want to continue the JSON API route in #2465 you should rebase onto this. I feel maybe just doing C APIs for dealing with Term (and with that wrapping libraries can do JSON themselves in whatever way they want) will be a faster path to merging than waiting to reach consensus on the serialization format here.

Most relevant points in the discussion until now, to recap:

  • Serialization formats list: https://github.com/mthom/scryer-prolog/pull/2493#issuecomment-2310693900
  • It seems we mostly agreed to using JSON types where it makes sense, and internal tagging for everything else: https://github.com/mthom/scryer-prolog/pull/2493#issuecomment-2310736951. This is not how it's implemented now in this PR, but this would be quite easy to migrate to.
  • Because JSON is kinda weird with numbers (often implementations just use f64 for all numbers), maybe it makes sense to always encode numbers as strings, tough it seems kinda unfortunate to force users to parse numbers even for small integers: https://github.com/mthom/scryer-prolog/pull/2493#issuecomment-2311790999 https://github.com/mthom/scryer-prolog/pull/2493#issuecomment-2313115358

Also, tip when writing JSON on Github: use json5 in codeblocks. It allows things like comments without freaking out. Compare:

// json
{ "a": 1 }
// json5
{ "a": 1 }

bakaq avatar Dec 14 '24 07:12 bakaq

My 2c having hacked out a similar thing for trealla-js[^1][^2]: I ended up with pretty much the same structure as https://github.com/mthom/scryer-prolog/pull/2493#issuecomment-2310736951, which I'm pretty happy with. For integers, I serialize directly to number within the JS safe integer range (±9007199254740991), otherwise as object-wrapped strings which become bigint on the JS side. I did a similar thing with trealla-go and int64 vs *big.Int. In general it's kind of annoying to work with bigints in those languages, so making the more common small integers a regular number type is desirable.

[^1]: encoding: https://github.com/guregu/trealla/blob/main/library/wasm.pl [^2]: decoding: https://github.com/guregu/trealla-js/blob/9c9cc972c795895fde93bf02783cb73935e3bcdc/src/term.ts#L220

guregu avatar Feb 14 '25 10:02 guregu

trealla fam! ❤️ I didn't even know trealla-js existed, so cool! Love the TS bindings. Great work and thanks for the information! 👀

jjtolton avatar Feb 14 '25 10:02 jjtolton

Thanks @jjtolton! One of my goals is to eventually provide a Typescript library where the backend implementation can be swapped between Trealla/Scryer, and I'd be happy to change my JSON internals to match what Scryer decides to make things easier to do that in the future. Just wanted to chime in that the path Scryer is taking looks good to me.

guregu avatar Feb 14 '25 11:02 guregu

That's great! I'm working on a similar thing with libscryer-clj. @bakaq is doing some phenomenal work with WASM right now, he even "beat me to the punch"! After that I'd like to add ClojureScript support for libscryer-clj.

But ultimately I'd like to support Trealla as well!

I'm working on embedding Prolog into video games, and if I ever make a console port, I think I will need to use Trealla!

I hope we get to chat more about it.

jjtolton avatar Feb 14 '25 11:02 jjtolton

[...] otherwise as object-wrapped strings which become bigint on the JS side.

In #2825 I just made everything into BigInt, but that is just a temporary shortcut. The problem with this on this issue here is that there is no big int type in JSON (and other serialization encodings). In languages where there are big ints it seems to be trivial to convert between a string and them, so I still think big integers (and therefore also rationals) should be serialized as tagged strings which can then be converted into a big int by the consumer.

{ "type": "integer", "value": 100 }
{ "type": "integer", "value": "100000000000000000000000000000" }

bakaq avatar Feb 14 '25 12:02 bakaq

[...] otherwise as object-wrapped strings which become bigint on the JS side.

In #2825 I just made everything into BigInt, but that is just a temporary shortcut. The problem with this on this issue here is that there is no big int type in JSON (and other serialization encodings). In languages where there are big ints it seems to be trivial to convert between a string and them, so I still think big integers (and therefore also rationals) should be serialized as tagged strings which can then be converted into a big int by the consumer.

{ "type": "integer", "value": 100 }
{ "type": "integer", "value": "100000000000000000000000000000" }

I agree. I’ve seen such an encoding scheme used in JSON-based document databases (DynamoDB) for the same reason. I didn’t wrap small numbers or floats in an object, which makes parsing in JS slightly easier, but makes life harder for languages with stricter type systems. When in doubt wrapping seems like the best solution; it’s easy for client libraries to convert types especially when they are explicitly tagged. Looks good to me!

guregu avatar Feb 14 '25 12:02 guregu

Well, serializing small ints as just 100 instead of { "type": "integer", "value": 100 } seems worth the trouble actually.

bakaq avatar Feb 14 '25 12:02 bakaq

Well, serializing small ints as just 100 instead of { "type": "integer", "value": 100 } seems worth the trouble actually.

I think it’s nice, the only potential issue is some ambiguity with floats if they also get serialized as non-wrapped numbers. JS treats them the same of course so it’s no problem there. Tagging them can provide a hint to typed languages to avoid having to do heuristics (“does it have a dot?”). This is basically ambiguity built into JSON so I don’t think there’s a right or wrong answer.

guregu avatar Feb 14 '25 13:02 guregu