kryo
kryo copied to clipboard
VersionedRecordSerializer?
Is your feature request related to a problem? Please describe. I'm trying to serialize a Record that evolves - new components are being added to it.
Describe the solution you'd like
Would it make sense to create a VersionedRecordSerializer
akin to VersionedFieldSerializer
with the same functionality, the @Since
annotation?
Describe alternatives you've considered An obvious workaround is to use a traditional class instead of a record.
@JanecekPetr: The RecordSerializer
was a contribution from Oracle. When I accepted it, I didn't really think about aligning it with FieldSerializer
and its compatible variants, but it definitely makes sense to be able to evolve records in a similar way to classes.
I don't have to time work on this at the moment but I'd be happy to accept a PR.
Okay, I actually started looking into this.
- The eclipse project setup is outdated - JUnit 5 was introduced, but the setup doesn't know about it yet. I'm not yet sure I want to provide a fix for that too, I'll see.
- The current
RecordSerializer
is sloooooow as it does a lot of reflection all the time and no component caching. I started with a caching version (no ASM yet) and the speedup of writing is 20-25× (no read/roundtrip benchmarks yet, I'll do those next week):
This is annoying as it's largely unfixable without breaking changes as there's now onlyBenchmark Mode Cnt Score Error Units RecordSerializerBenchmark.cachingRecordSerializer avgt 5 85.466 ± 5.142 ns/op RecordSerializerBenchmark.recordSerializer avgt 5 2035.110 ± 42.174 ns/op
new RecordSerializer()
constructor and nonew RecordSerializer(Class<T> type)
constructor. What should I do? The newVersionRecordSerializer
will for sure cache things and I'd be happy to fix theRecordSerializer
, too, but how? What do you think is the best approach to the design now? - There is almost no configuration of the
RecordSerializer
, there's onlyfixedFieldTypes
. I haven't yet looked into what other configuration might be necessary / useful as I do not yet fully understand all of them. Would you mind taking a look and helping me with the choice? If there is more config that'd be useful, I'll add that too. That, at least, can easily be done retroactively.
I'm happy to open new tickets and continue each thread elsewhere if needed. There's apparently a lot incoming :).
(I'm also not a fan of serializing records alphabetically as it complicates the code unnecessarily, but that ship has sailed for sure. At least it's consistent with FieldSerializer
in that you can reorder the components and can rename then as long as the alphabetical order doesn't change. Okay I guess.)
This is annoying as it's largely unfixable without breaking changes as there's now only
new RecordSerializer()
constructor and nonew RecordSerializer(Class<T> type)
constructor.
Actually we could still internally just do an internal ClassValue
lookup and store the cached component accessors and metadata there. Hm. Inconsistent with FieldSerializer, again, but would work without changing the API.
Either way, some design work needs to be done before I actually start writing some more complex code.
@JanecekPetr: Thanks a lot for looking into this! I'll think about these issues and will get back to you asap.
@JanecekPetr: Thanks again for your valuable input!
The eclipse project setup is outdated - JUnit 5 was introduced, but the setup doesn't know about it yet. I'm not yet sure I want to provide a fix for that too, I'll see.
Right, I'm using IntelliJ and haven't checked the Eclipse config. If you can, please provide a fix.
The current RecordSerializer is sloooooow as it does a lot of reflection all the time and no component caching. I started with a caching version (no ASM yet) and the speedup of writing is 20-25× (no read/roundtrip benchmarks yet, I'll do those next week) This is annoying as it's largely unfixable without breaking changes as there's now only new RecordSerializer() constructor and no new RecordSerializer(Class<T> type) constructor. What should I do? The new VersionRecordSerializer will for sure cache things and I'd be happy to fix the RecordSerializer, too, but how? What do you think is the best approach to the design now?
We will definitely find a solution for this. Your benchmark results make a good case, even for a breaking change. What I would suggest is to add the the new constructor and cache the results there. Then we deprecate the default constructor and cache the components on first access if they haven't been cached from the new constructor. WDYT?
There is almost no configuration of the RecordSerializer, there's only fixedFieldTypes. I haven't yet looked into what other configuration might be necessary / useful as I do not yet fully understand all of them. Would you mind taking a look and helping me with the choice? If there is more config that'd be useful, I'll add that too. That, at least, can easily be done retroactively.
I'd suggest we start with the simplest configuration possible and do not add any new options for now. As you said, this can easily be done retroactively, should anybody ask for it.
Having said that, my biggest concern with the whole idea of a VersionedRecordSerializer
is that before long, people will ask for a CompatibleRecordSerializer
as well. Then we have a base class RecordSerializer
and two sub-classes that hook into serialization and modify what fields are read. There will be a large amount of duplication between these serializers and the existing variants of FieldSerializer
. Furthermore, users will have to configure both a VersionedFieldSerializer
and a VersionedRecordSerializer
if they want to evolve their data structures.
Ideally, we could abstract FieldSerializer
in such a way, that it can deal with both records and normal classes. The main difficulty with this approach is that FieldSerializer
currently instantiates an empty class and then reads one field after another and sets the field's value. For records, it would have to read all fields into an array, and then finally invoke the canonical constructor. This solution would support all current serializer options, including all annotations for customizing serializers on a per-field basis.
So to sum up: We can move ahead with improving the record serializer by adding caching and extending it to support versioning, but the ideal solution would be to let field serializer and its variant handle records. I'm just not sure it can actually be implemented.
I agree overall, the fact that FieldSerializer can't pick a canonical constructor if there is one is a shame that could be optionally alleviated (at least since Java 8 where there sometimes are parameter names accessible to reflection). But that's another story and another ticket.
For now the RecordSerializer exists and what you propose makes sense. I'll look into this and should have a POC this or next week.
I didn't forget about this, I have a preliminary almost-working POC with tests and some benchmarks, but it requires more work and a cleanup pass.
I haven't had very much free time lately outside work because of the current situation war in Europe as we're trying to help people fleeing Ukraine as well as the ones staying there. I'll get back to this ... I don't know. As soon as possible.
@JanecekPetr: Did you make any progress on this issue?
I started preliminary work on refactoring the FieldSerializer
to support records, but I don't have enough free time to finish this at the moment. Your suggested performance improvements for the existing serializer would be a great intermediate step.
Oh my, indeed I have! I'm currently away from the world, but I'll post the code with my progress during the weekend so we can finish it off.
@JanecekPetr: Ping ;)
I'd like to do another Kryo 5.x release in the coming weeks. Any chance you can create a PR with your caching improvements?
@JanecekPetr: I created a PR with caching for the RecordSerializer
here: #927
Is this the same approach you took? I'm also seeing a 20x speed-up.