Move away from nul-terminated strings?

Open • RoyalIcing opened this issue 2 years ago • 1 comment

Orb conveniently converts string literals like "abc" into an integer “pointer”, and automatically adds a (data) entry that initializes memory at that pointer offset.

These strings are modelled after C’s strings. They are nul-terminated, which means you must measure their length at runtime by looping over every character until you hit \0.

It also means a string can never contain a zero byte, which some protocols need to include in their payloads.
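To make both problems concrete, here is a rough sketch in Python rather than Orb. The `memory` bytearray stands in for WebAssembly linear memory, and `nul_terminated_length` is a made-up helper, not an Orb function.

```python
# Simulated linear memory. Assume a (data) entry placed "abc\0" at offset 4.
memory = bytearray(16)
memory[4:8] = b"abc\x00"


def nul_terminated_length(memory: bytearray, offset: int) -> int:
    """Scan forward from offset until the first zero byte. Cost is O(length)."""
    length = 0
    while memory[offset + length] != 0:
        length += 1
    return length


print(nul_terminated_length(memory, 4))

# A payload with an embedded zero byte is cut short at that byte:
# the scan cannot tell a terminator apart from data.
memory[8:13] = b"a\x00bc\x00"
print(nul_terminated_length(memory, 8))
```

The first call walks three bytes before it finds the terminator. The second stops after one byte, so the `bc` that follows the embedded zero is unreachable through this representation.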

I think the general consensus (citations needed) is that C’s nul terminators were a mistake, and that it is better to store an explicit length somehow. This is faster as you don’t have to iterate over the string to know its length, and likely safer as you can’t change a string’s length by merely flipping a byte to/from 0x0. It also lets you extract many “slices” from the string by advancing the start offset and shortening the length.
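A minimal sketch of the slicing idea, again in Python with a bytearray standing in for linear memory. `StrRange`, `slice_range` and `read_bytes` are hypothetical names used only for illustration.

```python
from typing import NamedTuple


class StrRange(NamedTuple):
    """Hypothetical explicit-length string: where it starts and how long it is."""
    offset: int
    length: int


def slice_range(s: StrRange, start: int, length: int) -> StrRange:
    """Return a sub-range without copying or touching memory."""
    if start < 0 or length < 0 or start + length > s.length:
        raise ValueError("slice out of bounds")
    return StrRange(s.offset + start, length)


def read_bytes(memory: bytearray, s: StrRange) -> bytes:
    """Copy the bytes the range refers to out of memory."""
    return bytes(memory[s.offset : s.offset + s.length])


memory = bytearray(32)
payload = b"GET /\x00index"  # contains a zero byte on purpose
memory[8 : 8 + len(payload)] = payload

whole = StrRange(8, len(payload))  # length is known up front, no scan
path = slice_range(whole, 4, 7)    # advance the start, shorten the length
print(read_bytes(memory, path))
```

Slicing is just arithmetic on two integers, and the embedded zero byte survives because nothing treats it as a terminator.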

Proposed options:

  1. Two i32s: memory-offset and length. Downside is that passing it to a function now requires two arguments, which is much harder for our macros to handle. You’d have to pass around a string in two parts, always remembering to keep the two variables together, and requiring some naming convention like _str and _len suffixes.
  2. A single i64, with the first 32 bits holding the memory offset and the second 32 bits holding the length. This is much easier to pass to functions as it’s a single value. It could be represented by a type like Memory.Range. The downside is you’d need some lightweight inlined macros to extract the offset and length. And it might be a “weird” Orb-only convention. I’d prefer a solution that is elegant and obvious.
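For option 2, the extraction macros would boil down to a mask and a shift. Here is a sketch in Python of what they might compute. The bit layout (offset in the low 32 bits, length in the high 32 bits) is an assumption, since “first” and “second” could be read either way, and the function names are invented.

```python
MASK32 = 0xFFFF_FFFF


def pack_range(offset: int, length: int) -> int:
    """Pack two unsigned 32-bit values into one 64-bit value.

    Assumed layout: offset in the low 32 bits, length in the high 32 bits.
    """
    if not (0 <= offset <= MASK32 and 0 <= length <= MASK32):
        raise ValueError("offset and length must fit in 32 bits")
    return (length << 32) | offset


def range_offset(packed: int) -> int:
    """Extract the memory offset (low 32 bits)."""
    return packed & MASK32


def range_length(packed: int) -> int:
    """Extract the length (high 32 bits)."""
    return (packed >> 32) & MASK32


packed = pack_range(0x400, 3)
print(hex(packed), range_offset(packed), range_length(packed))
```

In WebAssembly terms these would correspond to an `i64.and` and an `i64.shr_u` followed by `i32.wrap_i64`, which is cheap but does need to happen at every use site.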

Further things to consider

  • Ideally the built-in strings are as close as possible to the string type in the WebAssembly Component Model. I don’t fully understand how this is represented yet.
    • https://component-model.bytecodealliance.org/design/wit.html#primitive-typeseb
    • https://github.com/WebAssembly/component-model/blob/main/design/mvp/CanonicalABI.md#loading
  • I want Orb to work in today’s systems, so this very likely means only supporting what is in WebAssembly 1.1 with basic integer types.

RoyalIcing • Oct 22 '23 13:10

This will also mean removing the Orb.I32.String module with its functions, which is great as I’d prefer to keep the amount of included boilerplate code to a minimum.

RoyalIcing • Jan 29 '24 21:01

Orb.I32.String has been removed.

RoyalIcing • Jun 13 '24 12:06

Orb.Str has been added, which is a Custom Type for (i32 i32). The first i32 is the string’s memory address, and the second is its length in bytes.

RoyalIcing • Jul 12 '24 08:07