Define conversions across types

Open andreubotella opened this issue 5 years ago • 6 comments

As per https://github.com/whatwg/encoding/pull/215#discussion_r457324821, we might want to enable Infra types to define conversions from and to other types.

andreubotella avatar Jul 20 '20 17:07 andreubotella

So what do we need here?

  • Byte sequence as list.
  • String as list. Code unit and code point, presumably? Encoding needs code point, but I suspect in other places we would want to do code units, if anything.

And the reverse?
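As a rough illustration of the conversions being asked about, here is a minimal Python sketch. All function names are hypothetical and not part of Infra. It assumes a byte sequence maps to a list of integers in the range 0 to 255 and a string maps either to a list of 16-bit code units or to a list of code points. Python's own `str` is used only as a convenient carrier.

```python
# Hypothetical helpers; none of these names come from the Infra Standard.

def byte_sequence_to_list(data: bytes) -> list[int]:
    """Byte sequence -> list of bytes (integers 0..255)."""
    return list(data)


def list_to_byte_sequence(items: list[int]) -> bytes:
    """List of bytes -> byte sequence. Raises ValueError for values outside 0..255."""
    return bytes(items)


def string_to_code_units(s: str) -> list[int]:
    """String -> list of UTF-16 code units (lone surrogates are passed through)."""
    raw = s.encode("utf-16-le", "surrogatepass")
    return [int.from_bytes(raw[i:i + 2], "little") for i in range(0, len(raw), 2)]


def code_units_to_string(units: list[int]) -> str:
    """List of UTF-16 code units -> string (the reverse direction)."""
    raw = b"".join(unit.to_bytes(2, "little") for unit in units)
    return raw.decode("utf-16-le", "surrogatepass")


def string_to_code_points(s: str) -> list[int]:
    """String -> list of code points."""
    return [ord(ch) for ch in s]
```

For a string containing a character outside the Basic Multilingual Plane, the two string views differ in length, which is why the choice between code units and code points has to be stated somewhere.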

Also, should we make it implicit, so you can write <a for=list>For each</a> <var>byte</var> of <var>bytes</var>, or do we want <var>bytes</var> to be explicitly converted to a list first?

Or even further, do we want to say that byte sequences and strings are fundamentally lists? (I guess that doesn't work for strings due to the code unit/code point stuff.)

annevk avatar Jul 22 '20 11:07 annevk

I'm looking at the various usages of the Encoding hooks across several standards, and they seem to be called almost every time with a byte sequence (respectively, with a string), with the return value being used as a string (resp. byte sequence). Note that this already relied on implicit conversions before whatwg/encoding#215.

I suppose it might be fine to make a conversion implicit if it's on an algorithm boundary with well-defined types. For example, if "decode" is called with a byte sequence, it's clear that it has to be converted into an I/O queue of bytes. Likewise, if inside the steps for "decode", a string was returned, it'd be clear that it'd have to be converted into an I/O queue of scalar values inside the decode operation. But from outside the decode algorithm, the output type of the conversion is not necessarily clear, and since the range of possible types might be open-ended, the conversion would have to be explicit:

Let string be the result of UTF-8 decoding byteSeq, converted to a string.
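To make the boundary argument concrete, here is a small Python sketch. `IOQueue`, `END_OF_QUEUE`, `utf8_decode`, and `queue_to_string` are hypothetical names chosen for illustration, and Python's built-in UTF-8 codec stands in for the real decoder (it does not reproduce every detail of the Encoding Standard, such as BOM handling). On entry the input type is known, so the conversion can be implicit. On exit the caller has to say which type it wants.

```python
END_OF_QUEUE = object()  # sentinel standing in for end-of-queue


class IOQueue:
    """Toy I/O queue: a list of items terminated by END_OF_QUEUE."""

    def __init__(self, items=()):
        self._items = list(items) + [END_OF_QUEUE]

    def read(self):
        # Reading at the end keeps returning END_OF_QUEUE without removing it.
        if self._items[0] is END_OF_QUEUE:
            return END_OF_QUEUE
        return self._items.pop(0)


def utf8_decode(source):
    """Accepts a byte sequence or an I/O queue of bytes; returns an I/O queue of scalar values."""
    if isinstance(source, (bytes, bytearray)):
        # Implicit conversion at the algorithm boundary: the target type is unambiguous.
        source = IOQueue(source)
    raw = bytearray()
    while (item := source.read()) is not END_OF_QUEUE:
        raw.append(item)
    text = raw.decode("utf-8", errors="replace")
    return IOQueue(ord(ch) for ch in text)


def queue_to_string(queue):
    """Explicit conversion: I/O queue of scalar values -> string."""
    chars = []
    while (item := queue.read()) is not END_OF_QUEUE:
        chars.append(chr(item))
    return "".join(chars)


# The caller names the output type explicitly, mirroring
# "the result of UTF-8 decoding byteSeq, converted to a string".
string = queue_to_string(utf8_decode(b"h\xc3\xa9"))
```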

andreubotella avatar Sep 14 '20 09:09 andreubotella

That, or we define an I/O queue of scalar values that contains end-of-queue as being interchangeable with a scalar value string. That might also address the "for each" problem, although I guess you'd not want end-of-queue to show up there. Or we define a string-returning version of the frequently invoked decoding algorithms.

annevk avatar Sep 14 '20 09:09 annevk

I think we might want to define that types which are a wrapper over some other type should by default have conversions to/from that wrapped type, but we might want to define additional conversions and/or override the default ones.

For example, let's say that string was defined as a list of code units (which it probably should be). Then there'd be a conversion string → list of code units and a conversion list of code units → string by default. But we could additionally define a conversion string ↔ list of code points, and we could in turn use that conversion to define code point length, scalar value string, collect a sequence of code points...

Now, for some types which add additional semantics to their wrapped types, such as set, we could define an explicit algorithmic conversion list → set which maintains the invariants. And we could use that same thing to handle end-of-queue on I/O queues.
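The idea could be sketched in Python roughly as follows. This is an illustration under assumptions, not a proposal for spec text: `InfraString` and `list_to_ordered_set` are hypothetical names, the string is modeled as a wrapper over a sequence of 16-bit code units, and the set is modeled as a list without duplicates.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InfraString:
    """Hypothetical wrapper type: a string defined as a sequence of code units."""

    code_units: tuple[int, ...]

    # Default conversions to and from the wrapped type.
    @classmethod
    def from_code_units(cls, units):
        return cls(tuple(units))

    def to_code_units(self):
        return list(self.code_units)

    # Additional, explicitly defined conversion: string -> list of code points.
    def to_code_points(self):
        units = self.code_units
        points = []
        i = 0
        while i < len(units):
            unit = units[i]
            is_lead = 0xD800 <= unit <= 0xDBFF
            has_trail = i + 1 < len(units) and 0xDC00 <= units[i + 1] <= 0xDFFF
            if is_lead and has_trail:
                # Combine a surrogate pair into one code point.
                points.append(0x10000 + ((unit - 0xD800) << 10) + (units[i + 1] - 0xDC00))
                i += 2
            else:
                # BMP code unit or lone surrogate: passed through unchanged.
                points.append(unit)
                i += 1
        return points

    # Derived definition built on the additional conversion.
    def code_point_length(self):
        return len(self.to_code_points())


def list_to_ordered_set(items):
    """Algorithmic conversion list -> set that maintains the no-duplicates invariant."""
    result = []
    for item in items:
        if item not in result:
            result.append(item)
    return result
```

The default conversions are trivial because they only unwrap or rewrap the underlying sequence, while the code point conversion and the list → set conversion need actual steps. The same split would apply to an I/O queue conversion that has to deal with end-of-queue.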

andreubotella avatar Sep 15 '20 09:09 andreubotella

It's pretty weird that you cannot (or can no longer?) apply UTF-8 decode to a byte sequence, but instead have to apply UTF-8 decode to the result of converting the byte sequence into an I/O queue.

domenic avatar Apr 15 '21 17:04 domenic

Not sure if this came up in the context of writing new specification text, but I think we should continue to write text as if that is possible and eventually fix the plumbing.

annevk avatar Apr 16 '21 05:04 annevk