google.protobuf.Any to JSON special conversions

Open mvukov opened this issue 2 years ago • 10 comments

According to the official docs for Any the generated JSON should have this extra field @type and the value directly embedded. What I see in the generated code from ts-proto is something like:

export const Any = {
  // ...

  fromJSON(object: any): Any {
    return {
      typeUrl: isSet(object.typeUrl) ? String(object.typeUrl) : "",
      value: isSet(object.value)
        ? bytesFromBase64(object.value)
        : new Uint8Array(),
    };
  },

  toJSON(message: Any): unknown {
    const obj: any = {};
    message.typeUrl !== undefined && (obj.typeUrl = message.typeUrl);
    message.value !== undefined &&
      (obj.value = base64FromBytes(
        message.value !== undefined ? message.value : new Uint8Array()
      ));
    return obj;
  },

  // ...
};

It would be nice if the Any type could be supported in ts-proto like in C++, with e.g. PackFrom/UnpackTo methods.

BTW, I am just starting with ts. So, if there are ways to circumvent this, please point me in the right direction. Thanks in advance.

mvukov avatar Mar 04 '22 16:03 mvukov

Hey @mvukov yeah, you're right we don't have any special handling of Any for JSON.

I think we've got a little bit of the infra in place via the outputTypeRegistry output, which adds both message.$type as well as a map of "type name --> message object", both of which I assume would be necessary to parse incoming Any-containing JSON back into the right messages...
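A minimal sketch of how a "type name --> message object" map could drive parsing of Any-containing JSON. Everything here is an assumption for illustration: the JsonCodec interface, the hand-written registry, the example.BarMessage type and the anyFromJSON helper are hypothetical and are not the actual output of outputTypeRegistry.

```typescript
// Hypothetical codec shape; generated message consts would play this role.
interface JsonCodec<T> {
  fromJSON(object: Record<string, unknown>): T;
}

interface BarMessage {
  $type: "example.BarMessage";
  firstName: string;
}

const BarMessage: JsonCodec<BarMessage> = {
  fromJSON(object) {
    return {
      $type: "example.BarMessage",
      firstName: typeof object.firstName === "string" ? object.firstName : "",
    };
  },
};

// Hand-written stand-in for a generated "type name --> message object" map.
const registry = new Map<string, JsonCodec<unknown>>([
  ["example.BarMessage", BarMessage],
]);

// The fully qualified type name is whatever follows the last "/" of the type URL.
function typeNameFromUrl(typeUrl: string): string {
  return typeUrl.substring(typeUrl.lastIndexOf("/") + 1);
}

// Look up the "@type" of an Any-shaped JSON object and delegate to its fromJSON.
function anyFromJSON(object: Record<string, unknown>): unknown {
  const typeUrl = object["@type"];
  if (typeof typeUrl !== "string") {
    throw new Error('Any JSON is missing "@type"');
  }
  const codec = registry.get(typeNameFromUrl(typeUrl));
  if (codec === undefined) {
    throw new Error(`Type is not in the registry: ${typeUrl}`);
  }
  return codec.fromJSON(object);
}

const parsed = anyFromJSON({
  "@type": "type.googleapis.com/example.BarMessage",
  firstName: "bob",
}) as BarMessage;
```

The lookup only works for types that were compiled into the registry; an unknown "@type" throws instead of being passed through.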

stephenh avatar Mar 06 '22 17:03 stephenh

OK, will take a look at that as well. Thanks for helping out. Anyways, it would be nice to have this working out of the box eventually :).

mvukov avatar Mar 07 '22 08:03 mvukov

If I understand correctly, this is what we want to support?

message MyMessage {
   google.protobuf.Any payload = 1;
}

const jsonData = {
   "payload": {
      "@type": "http://.....",
      "foo": "bar"
   }
};
const myMessage = MyMessage.fromJSON(jsonData);
MyMessage.toJSON(myMessage) // equals jsonData

But what's not clear to me is: what should myMessage.payload contain?

  1. Uint8Array ?
    • Not sure how we could convert JSON to bytes, as the logic for doing that is compile-time generated, and we didn't compile that type yet. Unless, we can use the type-registry for that, and only allow pre-compiled types? I'm not familiar with this area of Protobuf.
  2. Json Data? {"@type": "...", "foo": "bar"}
    • This would be easy to implement for JSON ⇄ JSON scenarios, but then MyMessage.encode would again need to know how to convert the json to bytes, same problem as 1. Unless we decide not to support this (for now).

Are there any other requirements?

boukeversteegh avatar Mar 07 '22 10:03 boukeversteegh

@boukeversteegh ah okay, so I was looking at the protobuf Any examples in Java:

 *     Foo foo = ...;
 *     Any any = Any.pack(foo);
 *     ...
 *     if (any.is(Foo.class)) {
 *       foo = any.unpack(Foo.class);
 *     }

And I think I get it; originally I was assuming that, when using Any, a FooMessage that has a payload: Any would immediately "know" what that payload was after deserialization, like in an OO way you could do:

const foo = FooMessage.decode(bytes)
if (foo.payload instanceof BarMessage) {
  ...
}

Which, right, would necessitate FooMessage.decode knowing how to dynamically access BarMessage.decode, and hence for us would only work with the type registry in the output.

That said, looking at the Java example, it seems that Any in the protobuf ecosystem isn't actually that sophisticated, b/c the user needs to "bring your own polymorphism" (BYOP :-)) with these hand-coded any.is checks; for us I think it'd look like:

  1. FooMessage.payload actually is/stays an Any that is { typeUrl: string, value: Uint8Array } (basically as we would generate Any today)

    • I.e. After doing FooMessage.decode(bytes), the foo.payload.typeUrl is a string and foo.payload.value is a Uint8Array
  2. The user would do BYOP:

if (Any.is(foo.payload, BarMessage)) {
  const bar = Any.unpack(foo.payload, BarMessage)
  // really ^ is the same as:
  const sameBar = BarMessage.decode(foo.payload.value);
  // ...so maybe we don't need an Any.unpack
}

(Note that the Java example uses instance methods on any, like any.unpack / any.pack; I'm using static methods on Any b/c ts-proto's messages today are just data / don't have any methods... although, we could treat Any as a value/wrapper type and actually turn it into an instance with methods on it, similar to a Date or what not...)

  3. We would output a typeUrl in each message's const (without requiring a full-blown type registry map):

export const BarMessage = {
  typeUrl: "http://whatever this is",
};

Such that Any.is(any: Any, messageType: { typeUrl: string }) would essentially return any.typeUrl === messageType.typeUrl

Given we do ^, I think that would make bytes-based FooMessage.encode / FooMessage.decode work okay w/o a type registry, and just the addition of a typeUrl const in the output.
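The bytes-based BYOP flow above can be sketched as follows. This is only an illustration under stated assumptions: the MessageType interface, the AnyHelpers object (standing in for static Any.pack / Any.is / Any.unpack), the type URL value and the toy one-byte-per-character codec are all hypothetical; ts-proto does not emit any of them today.

```typescript
// Hypothetical shapes for illustration only.
interface Any {
  typeUrl: string;
  value: Uint8Array;
}

interface MessageType<T> {
  typeUrl: string;
  encode(message: T): Uint8Array;
  decode(bytes: Uint8Array): T;
}

interface BarMessage {
  firstName: string;
}

// Toy codec standing in for the generated protobuf encode/decode:
// it stores firstName as one byte per character (ASCII only).
const BarMessage: MessageType<BarMessage> = {
  typeUrl: "type.googleapis.com/example.BarMessage",
  encode(message) {
    const bytes = new Uint8Array(message.firstName.length);
    for (let i = 0; i < bytes.length; i++) {
      bytes[i] = message.firstName.charCodeAt(i);
    }
    return bytes;
  },
  decode(bytes) {
    return { firstName: String.fromCharCode(...Array.from(bytes)) };
  },
};

const AnyHelpers = {
  // Wrap a message as { typeUrl, value } without needing a registry.
  pack<T>(message: T, type: MessageType<T>): Any {
    return { typeUrl: type.typeUrl, value: type.encode(message) };
  },
  // The whole check is a string comparison on the type URL.
  is(any: Any, type: { typeUrl: string }): boolean {
    return any.typeUrl === type.typeUrl;
  },
  // Decode the payload, refusing to decode a mismatched type.
  unpack<T>(any: Any, type: MessageType<T>): T {
    if (any.typeUrl !== type.typeUrl) {
      throw new Error(`Expected ${type.typeUrl} but got ${any.typeUrl}`);
    }
    return type.decode(any.value);
  },
};

const payload: Any = AnyHelpers.pack({ firstName: "bob" }, BarMessage);
const bar = AnyHelpers.is(payload, BarMessage)
  ? AnyHelpers.unpack(payload, BarMessage)
  : undefined;
```

Because is only needs a typeUrl string, nothing here requires a global registry; the caller names the candidate type explicitly.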

...that said, what I don't understand yet is how JSON based deserialization would work, as you'd already mentioned; i.e. using the same "the user hand-codes their polymorphism" / BYOP:

const jsonData = {
  payload: { typeUrl: "...", firstName: "bob" },
};
const foo = FooMessage.fromJSON(jsonData);
// this we can do b/c payload.typeUrl exists...
if (Any.is(foo.payload, BarMessage)) {
  // this we cannot do, b/c value isn't a byte[], it's a bunch of key/value pairs...
  const bar = BarMessage.decode(foo.payload.value);
}

I wonder how the Java/C++ bindings solve this, i.e. where do they put these json key/value pairs between the time the FooMessage.fromJSON is called, and the BarMessage.fromJSON is called?

I suppose we could have fromJSON create an Any that was a JSON-special version, in that the payload.typeUrl is still the string, but payload.value is not really a byte[], it's the actual JSON object literal, which could only be used if you did BarMessage.fromJSON(foo.payload).

(Which would kind of make sense, just like BYOP means you have to hand-code "if .is(BarMessage)", you also have to hand-code "I 'just know' this came from JSON so use fromJSON".)

Such that if we had an Any interface, it would look like:

interface Any {
  typeUrl: string;
  value: Uint8Array | object;
}

I.e. value is a Uint8Array if your Any came from a FooMessage.decode / bytes world, but it'd be an object if it came from a FooMessage.fromJSON / json world, and you'd have to "just know" to use either BarMessage.decode or BarMessage.fromJSON as appropriate.

Granted, it does make hopping between formats kind of odd / impossible, i.e. if you do:

const foo = FooMessage.fromJSON(jsonData);
// how does it know the right thing to do?
const bytes = FooMessage.encode(foo);

// you'd probably first have to do
const foo = FooMessage.fromJSON(jsonData);
// figure out what foo.payload is and then...
foo.payload = BarMessage.fromJSON(foo.payload);
// now we can encode
const bytes = FooMessage.encode(foo);

Given ts-proto's goal is "idiomatic JS/TS", I wonder if maybe for Any support we should just assume a type registry, so that we can do the "OO" approach where FooMessage.fromJSON(jsonData) immediately knows how to find BarMessage.fromJSON, and so foo.payload is literally a BarMessage.

I think that is what I'd personally want, to have the most pleasant ergonomics when working with Any data...

It does assume that, at compile time, we must know all types that may go through payload, i.e. we would not be able to support like a "router" scenario where a pre-built daemon accepts messages with unknown-at-time-of-build Anys and is still able to serde them, i.e. while just doing "dumb proxying" of the messages...

I suppose we could combine both approaches, and foo.payload would be one of three values:

  1. If you used a type registry and typeUrl was in it, foo.payload would immediately be the BarMessage type. You can use FooMessage.encode(foo) and FooMessage.toJSON(foo) w/o any issues.

  2. If typeUrl was not in the type registry (or type registry was disabled), and you used FooMessage.decode, then payload would be a Uint8Array, and you'd have to manually convert it to the right type (BYOP). Trying to re-serialize foo as JSON w/o doing that would fail at runtime (but you could re-serialize as bytes and we'd drop it on the wire as-is).

  3. If typeUrl was not in the type registry (or type registry was disabled), and you used FooMessage.fromJSON, then payload would be an object, and you'd have to manually convert it to the right type (BYOP). Trying to re-serialize foo as bytes w/o doing that would fail at runtime (but you could re-serialize as JSON and we'd drop it on the wire as-is).
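The three payload states above can be sketched as a union type with a runtime check on JSON serialization. This is a rough illustration only; the Payload union, the wrapper shapes for the unresolved cases, the example.BarMessage type and the payloadToJSON helper are hypothetical names, not anything ts-proto generates.

```typescript
// State 1: the registry knew the type, so the payload is the real message.
interface BarMessage {
  $type: "example.BarMessage";
  firstName: string;
}

// State 2: unknown type that arrived as bytes (via decode).
type UnresolvedBytes = { typeUrl: string; value: Uint8Array };

// State 3: unknown type that arrived as JSON (via fromJSON).
type UnresolvedJson = { typeUrl: string; value: Record<string, unknown> };

type Payload = BarMessage | UnresolvedBytes | UnresolvedJson;

function describePayload(payload: Payload): "resolved" | "bytes" | "json" {
  if ("$type" in payload) {
    return "resolved";
  }
  return payload.value instanceof Uint8Array ? "bytes" : "json";
}

// JSON serialization works for states 1 and 3, and fails at runtime for state 2.
function payloadToJSON(payload: Payload): Record<string, unknown> {
  if ("$type" in payload) {
    return {
      "@type": `type.googleapis.com/${payload.$type}`,
      firstName: payload.firstName,
    };
  }
  if (payload.value instanceof Uint8Array) {
    throw new Error(
      `Cannot write ${payload.typeUrl} as JSON without decoding it first`
    );
  }
  // Pass the unknown JSON through as-is.
  return { "@type": payload.typeUrl, ...payload.value };
}

const passthrough = payloadToJSON({
  typeUrl: "type.googleapis.com/example.Unknown",
  value: { foo: "bar" },
});
```

A bytes serializer would mirror this: states 1 and 2 succeed, and state 3 throws until the caller converts the payload by hand.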

I dunno...I think ^ makes sense, but it sounds like a lot of work. :-D @boukeversteegh wdyt? I think where I ended up is probably the "most fancy aka expensive" but also "most ergonomic" solution...

@mvukov let us know if we're making this more complicated than it should be :-)

stephenh avatar Mar 13 '22 18:03 stephenh

Great analysis @stephenh! The odd-looking sequence, where things are only partially parsed and the user fills in the blanks manually, is something I can't imagine users being happy with.

Of course it's better than not being able to work with Any at all though. However, explaining it in the docs will also be difficult.

I would personally expect FooMessage.payload to contain an Any object, with a separate instruction to decode it (for performance reasons). But I realize this is impossible, because sometimes Any is provided as bytes (decode), and sometimes as JSON (fromJSON), so in one of those cases a conversion needs to happen in order to normalize it.

Unless, we would represent Any with two fields, and make it 'lazy' in both scenarios.

class Any {
  private _jsonEncoded: any;
  private _bytes: Uint8Array;
  private _type: string;

  toJSON(): unknown {
    // return the JSON as-is if we have it,
    // OR decode the bytes and then call <type>.toJSON()
    return this._jsonEncoded;
  }
}

I guess it's fine to make Any a bit smarter than the other messages (i.e. break the 'interface only' principle).
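A slightly fuller sketch of the lazy idea, assuming the caller supplies the codec for the embedded type instead of a registry lookup. The LazyAny class, the Codec interface, the toJSONWith method name and the toy barCodec are all hypothetical and only illustrate the "keep whichever form arrived, convert on demand" approach.

```typescript
// Hypothetical codec shape; generated message consts would play this role.
interface Codec<T> {
  decode(bytes: Uint8Array): T;
  toJSON(message: T): Record<string, unknown>;
}

class LazyAny {
  private json: Record<string, unknown> | undefined;
  private bytes: Uint8Array | undefined;
  private readonly typeUrl: string;

  private constructor(typeUrl: string) {
    this.typeUrl = typeUrl;
  }

  // Bytes world: keep the raw bytes, do not decode yet.
  static fromBytes(typeUrl: string, bytes: Uint8Array): LazyAny {
    const any = new LazyAny(typeUrl);
    any.bytes = bytes;
    return any;
  }

  // JSON world: split "@type" from the embedded fields, do not convert yet.
  static fromJSON(object: Record<string, unknown>): LazyAny {
    const { "@type": typeUrl, ...fields } = object;
    if (typeof typeUrl !== "string") {
      throw new Error('Any JSON is missing "@type"');
    }
    const any = new LazyAny(typeUrl);
    any.json = fields;
    return any;
  }

  // Named toJSONWith so JSON.stringify does not call it implicitly.
  toJSONWith<T>(codec: Codec<T>): Record<string, unknown> {
    if (this.json === undefined) {
      if (this.bytes === undefined) {
        throw new Error("Any has neither JSON nor bytes");
      }
      // Decode once and cache the JSON form.
      this.json = codec.toJSON(codec.decode(this.bytes));
    }
    return { "@type": this.typeUrl, ...this.json };
  }
}

// Toy codec: one byte per character of firstName (ASCII only).
const barCodec: Codec<{ firstName: string }> = {
  decode: (bytes) => ({ firstName: String.fromCharCode(...Array.from(bytes)) }),
  toJSON: (message) => ({ firstName: message.firstName }),
};

const fromWire = LazyAny.fromBytes(
  "type.googleapis.com/example.BarMessage",
  new Uint8Array([98, 111, 98])
);
const wireAsJson = fromWire.toJSONWith(barCodec);
```

The JSON-to-JSON path never touches the codec, which is the performance argument for keeping both fields.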


Is it actually possible to implement Any.is using <message>.$type? Because I think that will just store the typename, and not a URL, right? Or are those strings the equivalent?


I agree with your last point that we need feedback on what is really needed. It sounds indeed like a lot of work and especially without any experience of working with Any (for me at least), I wouldn't know how to evaluate the design choices.


PS: I have also never worked with the type registry, but if it makes things simpler (narrower scope) to require it for Any fields to work at all, I would say let's do it. We can always expand.

boukeversteegh avatar Mar 13 '22 19:03 boukeversteegh

Any solutions?

shachardevops avatar Apr 12 '22 13:04 shachardevops

As someone who uses a similar pattern in ts-proto, I've forgone "Any", and use "Value" + type registry hacks instead.

The primary reason is that Any feels only semi-supported in the binary encodings of proto3, at least according to the documentation. Value has clear semantics and encodings, and is a little less magic. https://developers.google.com/protocol-buffers/docs/proto3#any -- "Currently the runtime libraries for working with Any types are under development."

I suppose sooner or later, we'd port to Any if the support was good!

fizx avatar Apr 12 '22 23:04 fizx

I was in urgent need of Any parsing on the JS side, so the only workaround for me was to edit the generated files manually: I replaced : Any with : any in the interfaces and in the from/to JSON functions. That way it can at least work, with some tricks, and in some use cases even without them. Can you please add this to the well-known types, with an option like --ts_proto_opt=any=any or any=native, so the generator won't emit Any but the any type? TypeScript has native support for this scenario (while another solution is being thought out). Thanks!

xakepp35 avatar Apr 13 '22 07:04 xakepp35

Please consider "@type" over typeUrl to remain with the canonical definition of Any.

mauimauer avatar Sep 13 '22 23:09 mauimauer

Any update on this?

bradleybeighton avatar May 02 '23 17:05 bradleybeighton