go icon indicating copy to clipboard operation
go copied to clipboard

proposal: encoding: add ScalarMarshaler and ScalarUnmarshaler

Open dsnet opened this issue 3 years ago • 11 comments

There are often types that are fundamentally opaque wrappers around a scalar type. One such example is atomic.Bool and friends. In order to get these types to operate well with serialization libraries, they would need to implement one of the Marshaler or Unmarshaler interfaces (see #54582).

The current interfaces in the "encoding" package fall short:

  • encoding.TextMarshaler and encoding.TextUnmarshaler only treat the type as if it were a string, and so can't natively represent a boolean or number.
  • json.Marshaler and json.Unmarshaler is specific to JSON, which means that only "encoding/json" benefits and not other encoding libraries (in stdlib such as "encoding/xml" or third-party packages).
  • The MarshalText or MarshalJSON always allocates, which is rather heavy-weight for something that is encoding a scalar.

To work around these detriments, I propose the addition of the following interface into the "encoding" package:

type ScalarMarshaler[T bool | int64 | uint64 | float64 | complex128] interface {
    MarshalScalar() (T, error)
}
type ScalarUnmarshaler[T bool | int64 | uint64 | float64 | complex128] interface {
    UnmarshalScalar(T) error
}

The type list specifies exactly 5 types, which matches the superset kind of all the scalar kinds in Go:

  • int8, int16, int32, int, etc. are not included since they can be represented as int64. When unmarshaling, the implementation can return an error if the provided int64 is out of range.
  • We do not declare the constraint as using the underlying kind (i.e., not ~bool | ~int64 | ~uint64 | ~float64 | ~complex128) to keep the possible set of concrete interfaces a finite set that encoding packages can actually check for.
  • string is not included because 1) a string is technically not a "scalar" type in that it can represent an infinite number of values, while scalar pertains to a finite set of possible values (i.e., following some scale), and 2) because we already provide TextMarshaler and TextUnmarshaler, which are more or less the same thing.

Many serialization formats have first-class support booleans and various numeric types, so this interface provides a way to express that fact.

It is a feature (not a bug) that the 5 possible representations of this interface all use the same method name. That is, for a given type, it can only choose to implement scalar representation for one of the scalar kinds. We do not need to worry what happens when a type implements both MarshalScalar() (int64, error) and MarshalScalar() (float64, error) at the same time.

If MarshalJSON and MarshalText and MarshalScalar are all present, it is the discretion of the encoding package to choose the precedence order. I recommend MarshalJSON taking precedence over MarshalText, taking precedence over MarshalScalar.

It is up to an encoding package to decide which of the scalar kinds it wants to support. For example, "encoding/json" would support scalar marshaling for bool, int64, uint64, and float64, but reject complex128 since JSON has no representation of complex numbers. This matches the existing "encoding/json" behavior where it can't serialize complex128.

\cc @johanbrandhorst @mvdan

dsnet avatar Oct 14 '22 17:10 dsnet

A search through all public modules known by the module proxy showed no implementations of the MarshalScalar and UnmarshalScalar methods, so there should be no backwards compatibility concerns.

dsnet avatar Oct 14 '22 20:10 dsnet

Overall the proposal sounds like a good idea to me. The only surprising aspect is that MarshalText would usually take precedence over MarshalScalar. If I have a type which could marshal itself as either an integer or a string, I don't think I could implement both methods correctly, as one would always win.

I guess I could instead implement MarshalJSON, but that is specific to a single encoding format.

mvdan avatar Oct 14 '22 22:10 mvdan

The only surprising aspect is that MarshalText would usually take precedence over MarshalScalar.

I think that's up to the encoding package in question. The proposal seems to just be suggesting that precedence for the encoding/json package specifically.

DeedleFake avatar Oct 14 '22 22:10 DeedleFake

Whichever way the precedence ends up between the two, I think implementing the case I mention above with MarshalText and MarshalScalar would not be possible. Maybe that limitation by design is fine - I am simply pointing it out.

mvdan avatar Oct 14 '22 22:10 mvdan

In my original proposal I argued against the inclusion of sub-64-bit integers since the value can be represented by the larger width kind. However, allowing smaller bit widths may be helpful for formats that encode smaller width integers differently.

Analysis:

  • Almost every text-based encoding in existence permits formatting numbers in decimal, where the exact bit-width is not directly expressed (i.e., you cannot distinguish between a 0 that came from an int8 versus a 0 from an int64).
  • Any binary encodings that use arbitrary precision integers (e.g., varints) are not affected.
  • Any binary encodings that use fixed-width representation are affected.

Thus, there is some utility (for formats like BSON) in supporting the smaller-width numeric kinds (e.g., int8, int16, int32, etc.). However, it is unclear whether the benefit is worth the complexity. Adding various bit widths will expand the set of scalar types by 8 (i.e., int8, int16, int32, uint8, uint16, uint32, float32, complex64). This will greatly increase the complexity of encoding libraries that need to check for these.

If we don't expose the width here, that information can always be exposed in a side-channel mechanism. For example:

type MyStruct struct {
    NumProcessed atomic.Int32 `bson:",int32"`
}

I'm still in favor of excluding the smaller width types, but wanted to call this caveat out.

dsnet avatar Oct 20 '22 21:10 dsnet

I thought more about precedence order between MarshalBinary, MarshalText, and MarshalScalar. I think we should deliberately stay silent on the issue and leave it to each encoding package to decide what is best.

Consider the following example:

// Color is a finite enumeration of supported colors.
type Color int

const (
	Red   Color = 1
	Green Color = 2
	Blue  Color = 3
)

func (c Color) MarshalText() ([]byte, error)
func (c Color) MarshalScalar() (int64, error)
func (c *Color) UnmarshalText(b []byte) error
func (c *Color) UnmarshalScalar(v int64) error

For most text-based serializations (e.g., JSON), you probably want to use MarshalText over MarshalScalar. For most binary-based serializations (e.g., CBOR), you probably want to use MarshalScalar over MarshalText.

When unmarshaling, you could imagine JSON using UnmarshalText if the input is a JSON string, and UnmarshalScalar if the input is a JSON number. This is precisely how the JSON specification for protocol buffers works, where enumerations are marshaled in its textual form, but permit unmarshaling from either the textual form or numeric forms.

dsnet avatar Oct 20 '22 21:10 dsnet

This proposal has been added to the active column of the proposals project and will now be reviewed at the weekly proposal review meetings. — rsc for the proposal review group

rsc avatar Oct 26 '22 18:10 rsc

I'm still in favor of excluding the smaller width types, but wanted to call this caveat out.

I wonder, can we decide to add sub-64-bit integer types later? It's not clear to me whether that would be considered a breaking change given the type parameter.

I think we should deliberately stay silent on the issue and leave it to each encoding package to decide what is best.

That sounds like the right choice to me. As an implementer, I want to offer the encoding methods which makes sense for my type. I don't want to dictate which one should always be used over another. An encoding package can inspect which methods I implement and make a reasonable decision about which one to use, like yous ay.

mvdan avatar Oct 26 '22 20:10 mvdan

I wonder, can we decide to add sub-64-bit integer types later? It's not clear to me whether that would be considered a breaking change given the type parameter.

I'm not sure, but I think so?

\cc @ianlancetaylor

dsnet avatar Oct 27 '22 18:10 dsnet

The only fashion by which constraints containing union elements are contravariant with their constituents is in the operation set. I.e., if adding a type to a union element creates a smaller operation set, then programs using that constraint can break. There are two reasons why this doesn't apply here:

  • The constraint isn't given an exported name, so no other code can actually vary with the union. Programs which want to be generic over all types implementing the proposed interfaces need to list the types again themselves, but adding to our list doesn't affect theirs.
  • Because bool is included, the operation set is already close to minimal. The only operation supported by both bool and any numeric type is comparison, which is supported by every numeric type. So, adding another numeric type can't change the operation set.

If we remove both of these properties, it is possible to write programs which fail to compile when adding another type to the union: https://go.dev/play/p/fbi6-RzzspL, https://go.dev/play/p/CGnAxdndZSA

(It is weird to think of untyped literals as "operations." I twisted myself up a couple times writing this.)

zephyrtronium avatar Oct 27 '22 19:10 zephyrtronium

I agree that in a case like this adding additional terms to the type union in the constraint should be fine.

ianlancetaylor avatar Oct 27 '22 20:10 ianlancetaylor

It sounds like people are generally happy with

package encoding

type ScalarMarshaler[T bool | int64 | uint64 | float64 | complex128] interface {
    MarshalScalar() (T, error)
}
type ScalarUnmarshaler[T bool | int64 | uint64 | float64 | complex128] interface {
    UnmarshalScalar(T) error
}

What do we do after that? Specifically, it seems like we can either add support for them to marshalers (like encoding/gob), or we can add implementations to existing types in the standard library, but we can't do both, because that would likely change the wire format generated by some of these packages. So which do we do?

It sounds like above all we want to add them to the marshalers (like gob and json (and xml?)) and then after that, we can add these methods to types that gob and json (and xml?) currently cannot marshal at all and therefore have no existing wire format. This would include atomic.Uint32 and so on.

Do I have that right?

Also, is encoding/json really going to do 10 extra dynamic interface satisfaction checks on every value it marshals and unmarshals? Will we notice the expense?

rsc avatar Nov 02 '22 17:11 rsc

@rsc Technically speaking adding them to the marshalers would be a backwards incompatible change, as types (not just in the stdlib) currently could have those methods and would then change how they are encoded.

Mayhaps something like json.Decoder.UseNumber? That would be strictly backwards compatible and allow to implement the interfaces for types in the stdlib.

Or we ignore the theoretical breakages in lieu of evidence of practical ones.

Merovius avatar Nov 02 '22 18:11 Merovius

Also, is encoding/json really going to do 10 extra dynamic interface satisfaction checks on every value it marshals and unmarshals? Will we notice the expense?

I think encoding/json can do one check if a method of the name exists and only do the interface-satisfaction check in that case.

Merovius avatar Nov 02 '22 18:11 Merovius

@Merovius, per https://github.com/golang/go/issues/56235#issuecomment-1279423011, there were no detected methods of the proposed name according to all code known by the module proxy.

I think encoding/json can do one check if a method of the name exists and only do the interface-satisfaction check in that case.

That's my plan for implementing this. The marshal path for "json" already has a sync.Map that keys from reflect.Type to encoder functions. There's an open issue to do the equivalent for unmarshal (#16212).

dsnet avatar Nov 02 '22 18:11 dsnet

@dsnet Fair enough (Next time I chide someone for not reading the entire discussion before commenting, feel free to remind me of this 🤦)

Merovius avatar Nov 02 '22 18:11 Merovius

No worries at all. The throughput of comments on golang/go issues and discussions is astonishing and unreasonable to expect every person to be at the same level of state.

dsnet avatar Nov 02 '22 18:11 dsnet

Moving this to likely accept, but it seems like we should wait for Go 1.21 to make sure we land the encoder updates and the new implementations of these methods together.

rsc avatar Nov 09 '22 18:11 rsc

we should wait for Go 1.21 to make sure we land the encoder updates and the new implementations of these methods together.

Agreed.

To be clear, acceptance of this proposal implies concurrent support within relevant "encoding/..." packages?

This would include the following packages:

  • "encoding/json"
  • "encoding/xml"
  • "encoding/gob"

It is unclear whether to add support in "encoding/asn1" since that package currently does not support encoding.BinaryMarshaler or encoding.TextMarshaler even though there are clear use-cases for it. It seems odd to partial support for some encoding interfaces rather than others.

dsnet avatar Nov 09 '22 19:11 dsnet

Given the existing lack of support for encoding interface in "asn1", I propose we leave the package unchanged. The package does have MarshalWithParams and UnmarshalWithParams functions where we can add a special parameter that instructs the package to respect encoding interfaces. That can be a separate proposal in the future.

dsnet avatar Nov 09 '22 19:11 dsnet

Based on the discussion above, this proposal seems like a likely accept. — rsc for the proposal review group

rsc avatar Nov 09 '22 19:11 rsc

No change in consensus, so accepted. 🎉 This issue now tracks the work of implementing the proposal. — rsc for the proposal review group

rsc avatar Nov 16 '22 18:11 rsc

Change https://go.dev/cl/553176 mentions this issue: encoding/json: handle encoding.ScalarMarshaler and encoding.ScalarUnmarshaler

gopherbot avatar Dec 29 '23 22:12 gopherbot

Change https://go.dev/cl/553175 mentions this issue: encoding: add ScalarMarshaler and ScalarUnmarshaler

gopherbot avatar Dec 29 '23 22:12 gopherbot

I find this proposal valuable and didn't see any open CLs mentioned here, so I sent in CL 553175 defining the interfaces and CL 553176 implementing their use in encoding/json. I can do the same for encoding/xml and encoding/gob soon.

zephyrtronium avatar Dec 29 '23 22:12 zephyrtronium

Change https://go.dev/cl/568896 mentions this issue: encoding/xml: handle encoding.ScalarMarshaler and encoding.ScalarUnmarshaler

gopherbot avatar Mar 04 '24 22:03 gopherbot

Change https://go.dev/cl/569276 mentions this issue: encoding/gob: handle encoding.ScalarMarshaler and encoding.ScalarUnmarshaler

gopherbot avatar Mar 05 '24 21:03 gopherbot