redpanda
redpanda copied to clipboard
serde: require explicit compat_version
Cover letter
This addresses an underlying cause of the bug in https://github.com/redpanda-data/redpanda/pull/7312
Previously, serde defaulted to assuming an envelope was only compatible with its own latest version, which caused bugs when the encoding of a structure was updated, and the developer forgot to add a compat_version indicating that the struct should still be able to decode the previous version.
Backport Required
- [x] not a bug fix
- [ ] issue does not exist in previous branches
- [ ] papercut/not impactful enough to backport
- [ ] v22.3.x
- [ ] v22.2.x
- [ ] v22.1.x
UX changes
None
Release notes
- none
Looks good to me
Rebased on the fixes. When they merge, this gets rebased again, then need to eyeball the locations we fixed to make sure this PR doesn't accidentally revert any fixes.
do i understand correctly that the issue here is that compat_version defaults to version? if so, it would be sufficient to change the default behavior? i'm not arguing against this patch which forces it to be explicit, just checking my understanding.
True that we could default to zero (i.e. by default struct can read any older version) -- I went with making it explicit because otherwise developers just won't have the concept of a compat version in their head until something goes wrong.
True that we could default to zero (i.e. by default struct can read any older version) -- I went with making it explicit because otherwise developers just won't have the concept of a compat version in their head until something goes wrong.
+1
this is a major oversight. where did we have a unit testing gap @felixguendling? (presumably we would have caught this in our corpus-based testing).
@graphcareful does this mean that there is a gap in testing for the rpc evolution test you wrote a while back?
This is good to go
this is a major oversight. where did we have a unit testing gap @felixguendling? (presumably we would have caught this in our corpus-based testing).
I think this was nothing we would find via unit testing. It was a bad library design decision to enable the user to default the compat_version
. This was not according to the principle "make it easy to use correctly and hard to use incorrectly". Corpus-based testing should've caught this.
bad library design decision to enable the user to default the compat_version.
perhaps, but i think it was more that the default wasn't 0 that was the problem. so kinda curious about that. in any case, seems we are back in business now :)