
[#405] Convert object to/from JSON

Open jabolina opened this issue 11 months ago • 7 comments

The field IDs for Instant and Date were ignored during the conversion, so the generated JSON was an empty string. The fix turned out to be a bit convoluted. I'll explain what I have, in case there's a better solution.

  • An Instant is converted by the WrappedMessage to byte[] by writing one int64 field for seconds and int32 for nanos.
  • The Proto to JSON conversion needs to read two fields now. The message-wrapping.proto file describes the seconds part as a oneof, which writes the key as "_value". We still need to write two fields, so we end up with {"_value": 0, "wrappedInstantNanos": 0} without any type information.
  • We could use the type as a WrappedMessage, but the conversion from JSON to Instant would fail as it creates the Instant inside the WrappedMessage object. So we need an explicit type for the conversion to create the correct object.
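To make the two-field shape concrete, here is a minimal sketch using only java.time. It is not the code in this PR: the class and method names are hypothetical, and the JSON is built by hand just to show that the untyped output carries the seconds and nanos but nothing that says "this is an Instant".

```java
import java.time.Instant;

// Hypothetical illustration, not part of ProtoStream.
class InstantFields {

    // Renders the two fields the way the untyped conversion would:
    // seconds under "_value" (the oneof key) and nanos under its own name.
    static String untypedJson(Instant instant) {
        return "{\"_value\": " + instant.getEpochSecond()
                + ", \"wrappedInstantNanos\": " + instant.getNano() + "}";
    }

    // Rebuilding the Instant needs both fields and the knowledge that
    // the target type is Instant, which the untyped JSON does not carry.
    static Instant fromFields(long seconds, int nanos) {
        return Instant.ofEpochSecond(seconds, nanos);
    }

    public static void main(String[] args) {
        Instant sample = Instant.ofEpochSecond(1_700_000_000L, 123_000_000);
        System.out.println(untypedJson(sample));
        System.out.println(fromFields(sample.getEpochSecond(), sample.getNano()).equals(sample));
    }
}
```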

Since we need this conversion from JSON to Instant to work, I've created an adapter in the types module. During the Proto to JSON conversion, we require this adapter to be registered so we can write the field names and the type correctly.

We are still left with converting JSON to Proto byte[]. We need to verify in the WrappedMessage class whether this object corresponds to a known type. Otherwise, the generated byte[] will be entirely different from the one produced by Instant to Proto byte[] serialization.

The Date object suffers from the same problem. The difference is that:

  • We serialize Date as an int64 from the Date#getTime method.
  • This would produce JSON such as {"_type": "int64", "_value": 12345}, which loses any indication that this value is a Date object.
  • The conversion from JSON to Date will fail as it returns a primitive long number instead of the Date object.
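A small sketch of why the explicit type matters, assuming a reader that dispatches only on the "_type" string. The names below are hypothetical, and "java.util.Date" is used as a stand-in type name rather than the name the adapter actually registers.

```java
import java.util.Date;

// Hypothetical illustration, not part of ProtoStream.
class DateRoundTrip {

    // Date serialized as a plain int64: the millisecond value survives,
    // the fact that it was a Date does not.
    static String toUntypedJson(Date date) {
        return "{\"_type\": \"int64\", \"_value\": " + date.getTime() + "}";
    }

    // A reader that only sees the declared type can do no better than
    // return a Long for int64. It needs an explicit type to build a Date.
    static Object readValue(String declaredType, long value) {
        if ("int64".equals(declaredType)) {
            return value; // boxed Long
        }
        if ("java.util.Date".equals(declaredType)) { // stand-in type name
            return new Date(value);
        }
        throw new IllegalArgumentException("Unknown type: " + declaredType);
    }

    public static void main(String[] args) {
        Date date = new Date(12345L);
        System.out.println(toUntypedJson(date));
        System.out.println(readValue("int64", date.getTime()).getClass().getSimpleName());
        System.out.println(readValue("java.util.Date", date.getTime()).equals(date));
    }
}
```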

Closes #405.

jabolina, Jan 27 '25 19:01

Directly wrapping objects with the WrappedMessage leads to:

// new WrappedMessage(new WrappedMessage(UUID.randomUUID()))
{
   "_type":"org.infinispan.protostream.WrappedMessage",
   "_value":{
      "_type":"org.infinispan.protostream.WrappedMessage",
      "_value":{
         "_type":"org.infinispan.protostream.commons.UUID",
         "mostSigBitsFixed":15461222357418722979,
         "leastSigBitsFixed":13070220576097665502
      }
   }
}

Using with collections:

// List.of(new WrappedMessage(UUID.randomUUID()), new WrappedMessage(UUID.randomUUID()))
{
   "_type":"collections.ListOfAdapter",
   "elements":[
      {
         "_type":"org.infinispan.protostream.WrappedMessage",
         "_value":{
            "_type":"org.infinispan.protostream.commons.UUID",
            "mostSigBitsFixed":1624897274251135499,
            "leastSigBitsFixed":11937877021784957987
         }
      },
      {
         "_type":"org.infinispan.protostream.WrappedMessage",
         "_value":{
            "_type":"org.infinispan.protostream.commons.UUID",
            "mostSigBitsFixed":12908692626181016473,
            "leastSigBitsFixed":9445129632214002810
         }
      }
   ]
}

I've also removed the use of _value by default in fields defined in the message-wrapping.proto:

// new WrappedMessage(1.23f)
{
   "_type":"org.infinispan.protostream.WrappedMessage",
   "wrappedFloat":1.23
}

jabolina, Feb 14 '25 21:02

I'll revert this to a draft. There are some corner cases with more deeply nested objects, collections, and adapters that don't create the exact same byte array for the JSON -> Proto conversion.

And by looking at the test failures, the _value field is expected in several places :/

jabolina, Feb 14 '25 21:02

Some containers were completely ignored by the JSON serialization (ArrayList, HashSet). I'll work to add those as well.

jabolina, Mar 18 '25 22:03

Some time later (more than I would like), I've refactored the old code. Instead of having the steps (parse, write, prettify, etc.) spread all around, I've created specialized classes for each type we would expect in the byte stream: primitives, nested objects, repeated fields, maps, and containers.

This uses the specialized classes with internal delegates, so each class can focus on doing its part and passing the request along. For example, a list of UUIDs: there is the root parser, which passes the request to an object writer for the adapter, which delegates to an array writer, which delegates each UUID to an object writer.

The writers pass the context along so we can track where we are in the JSON, which leads to the other change: instead of passing a StringBuilder around and writing to it directly, we tokenize the byte stream by pushing tokens from the JSON specification into a list. After reading the stream, we write the tokens in order and create the final JSON.
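To illustrate the delegation and the token list, here is a minimal sketch for the list-of-UUIDs case. It is not the code in this PR: the Context and Writer types, the method names, and the simplified field names are all hypothetical, and the input is a Java list rather than a Proto byte stream.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.UUID;

// Hypothetical illustration, not part of ProtoStream.
class TokenWriters {

    // Shared context passed down the chain: collects JSON tokens in order.
    static final class Context {
        final List<String> tokens = new ArrayList<>();
        void push(String token) { tokens.add(token); }
        String render() { return String.join("", tokens); }
    }

    interface Writer<T> {
        void write(T value, Context ctx);
    }

    // Object writer for a single UUID (field names simplified).
    static final Writer<UUID> UUID_WRITER = (uuid, ctx) -> {
        ctx.push("{");
        ctx.push("\"mostSigBits\":");
        ctx.push(Long.toString(uuid.getMostSignificantBits()));
        ctx.push(",");
        ctx.push("\"leastSigBits\":");
        ctx.push(Long.toString(uuid.getLeastSignificantBits()));
        ctx.push("}");
    };

    // Array writer: delegates each element to the element writer.
    static <T> Writer<List<T>> arrayWriter(Writer<T> element) {
        return (values, ctx) -> {
            ctx.push("[");
            for (int i = 0; i < values.size(); i++) {
                if (i > 0) ctx.push(",");
                element.write(values.get(i), ctx);
            }
            ctx.push("]");
        };
    }

    // Object writer for the list adapter: delegates to the array writer.
    static <T> Writer<List<T>> listAdapterWriter(Writer<T> element) {
        Writer<List<T>> array = arrayWriter(element);
        return (values, ctx) -> {
            ctx.push("{");
            ctx.push("\"elements\":");
            array.write(values, ctx);
            ctx.push("}");
        };
    }

    // Root: creates the context, starts the chain, then joins the tokens.
    static String toJson(List<UUID> uuids) {
        Context ctx = new Context();
        listAdapterWriter(UUID_WRITER).write(uuids, ctx);
        return ctx.render();
    }

    public static void main(String[] args) {
        System.out.println(toJson(List.of(new UUID(1L, 2L), new UUID(3L, 4L))));
    }
}
```

Keeping the tokens in a list until the end means formatting decisions such as pretty-printing can be made in one place when the list is rendered, instead of inside every writer.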

The parsing of JSON to byte array remains more or less the same as before, plus fixes to handle more types. This needed changes to start over from the beginning when parsing a document, and we need to pass along which field number each value belongs to. The downside here is that we go to the bottom of the JSON and build up, writing into a nested buffer. This causes some extra allocations in some places when we need to wrap a byte array with WRAPPED_MESSAGE and the field.
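The extra allocations come from the Protobuf wire format itself: a nested message is length-delimited, so the inner bytes must be complete before the outer tag and length can be written. A minimal sketch of that bottom-up wrapping follows. The class, the method names, and the field numbers are hypothetical and are not the ones ProtoStream uses for WRAPPED_MESSAGE; only the tag and varint layout follow the Protobuf encoding rules.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical illustration, not part of ProtoStream.
class NestedBuffers {

    // Base-128 varint: 7 bits per byte, high bit set on all but the last byte.
    static void writeVarint(ByteArrayOutputStream out, long value) {
        while ((value & ~0x7FL) != 0) {
            out.write((int) ((value & 0x7F) | 0x80));
            value >>>= 7;
        }
        out.write((int) value);
    }

    // Encodes one varint field: tag (field number, wire type 0), then the value.
    static byte[] varintField(int fieldNumber, long value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeVarint(out, ((long) fieldNumber << 3) | 0);
        writeVarint(out, value);
        return out.toByteArray();
    }

    // Wraps an already-encoded payload as a length-delimited field (wire type 2).
    // The payload must be fully built first, hence the extra buffer per level.
    static byte[] wrap(int fieldNumber, byte[] payload) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeVarint(out, ((long) fieldNumber << 3) | 2);
        writeVarint(out, payload.length);
        out.write(payload, 0, payload.length);
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] inner = varintField(1, 150);  // innermost value first
        byte[] outer = wrap(1, inner);       // then the enclosing message
        StringBuilder hex = new StringBuilder();
        for (byte b : outer) {
            hex.append(String.format("%02x", b & 0xFF));
        }
        System.out.println(hex);
    }
}
```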

The changes here are pretty much backwards compatible with what we had before. A JSON written in a previous version should be decoded just fine. Observe that we only changed the expected JSON in a single test: a test that uses an ArrayList container. Everything else is the same.

jabolina, Apr 01 '25 20:04

@jabolina would this allow pushing JSON from the client and converting it to/from ProtoStream? In Hot Rod, I mean.

karesti, Apr 07 '25 15:04

@karesti, correct. A sample application doing that with the Hot Rod client:

public class EncodingRemoteCache {

   private static final String CACHE_NAME = "test-cache";

   private static void registerMagazineSchemaInTheServer(RemoteCacheManager cacheManager, GeneratedSchema schema) throws IOException {
      // Retrieve metadata cache
      RemoteCache<String, String> metadataCache =
            cacheManager.getCache(PROTOBUF_METADATA_CACHE_NAME);

      // Define the new schema on the server too
      metadataCache.put(schema.getProtoFileName(), schema.getProtoFile());
   }

   public static ConfigurationBuilder connectionConfig() {
      ConfigurationBuilder builder = new ConfigurationBuilder();
      builder.addServer()
            .host("127.0.0.1").port(11222)
            .security()
            .authentication()
            .username("admin")
            .password("password");

      String configuration = """
            localCache:
              name: "test-cache"
              statistics: "true"
              encoding:
                mediaType: "application/x-protostream"
            """;
      builder.remoteCache(CACHE_NAME).configuration(configuration);
      return builder;
   }

   public static void main(String[] args) throws IOException {
      GeneratedSchema userSchema = new UserModelSerializationContextInitializerImpl();
      ProtoStreamMarshaller marshaller = new ProtoStreamMarshaller();
      marshaller.register(userSchema);

      ConfigurationBuilder cb = connectionConfig();
      cb.marshaller(marshaller);

      RemoteCacheManager rcm = new RemoteCacheManager(cb.build());
      registerMagazineSchemaInTheServer(rcm, userSchema);
      RemoteCache<String, UserModel> userCache = rcm.getCache(CACHE_NAME);
      RemoteCache<String, String> jsonCache = rcm.getCache(CACHE_NAME)
            .withDataFormat(DataFormat.builder()
                  .valueType(MediaType.APPLICATION_JSON)
                  .valueMarshaller(new UTF8StringMarshaller())
                  .build());

      UserModel user = new UserModel("jabolina", "jose", 28, Instant.EPOCH);
      userCache.put("jose", user);
      System.out.println("GET PROTO: " + userCache.get("jose"));
      System.out.println("GET JSON: " + jsonCache.get("jose"));
      rcm.stop();
   }

   @Proto
   public record UserModel(String username, String name, int age, Instant creation) {

      @ProtoSchema(
            syntax = ProtoSyntax.PROTO3,
            dependsOn = {
                  CommonContainerTypes.class,
            },
            includeClasses = UserModel.class,
            schemaFileName = "user.proto",
            schemaFilePath = "proto/",
            schemaPackageName = "user"
      )
      public interface UserModelSerializationContextInitializer extends GeneratedSchema {}
   }
}

The output printed to System.out is:

GET PROTO: UserModel[username=jabolina, name=jose, age=28, creation=1970-01-01T00:00:00Z]
GET JSON: {"_type":"user.UserModel","username":"jabolina","name":"jose","age":28,"creation":0}

I believe the marshaller here is only necessary for the put operation. The server is the one doing the work of transforming the model to JSON. In theory, the Hot Rod client doesn't need to know about the Proto descriptor.

Also, the changes in the PR already exist in previous versions of ProtoStream. The current PR is only refactoring that part.

jabolina, Apr 07 '25 22:04

@jabolina I thought it was something else, like actually posting JSON (a JSONObject, for example) and getting it converted!

karesti, Apr 08 '25 08:04

Thanks @jabolina

ryanemerson, Apr 14 '25 16:04