[FLINK-35780] Support state migration between disabling and enabling State TTL for RocksDBState
[FLINK-35780][state] Support state migration between disabling and enabling State TTL in RocksDBState
What is the purpose of the change
Support state migration between disabling and enabling State TTL for RocksDBState.
Brief change log
- Introduce TtlAwareSerializer and TtlAwareSerializerSnapshot
- Introduce COMPATIBLE_AFTER_TTL_MIGRATION in TypeSerializerSchemaCompatibility
- Wrap all stateSerializer instances with TtlAwareSerializer when creating states
- Wrap all recovered TypeSerializerSnapshot instances with TtlAwareSerializerSnapshot
- Resolve the TTL migration compatibility check in TtlAwareSerializerSnapshot
- Support TTL state migration in TtlAwareSerializer
- Re-create the RocksDB Column Family when migrating state values between disabling and enabling State TTL
- Change migrateSerializedValue in AbstractRocksDBState, RocksDBListState and RocksDBMapState
- Support updating ColumnFamilyHandle in RocksDBKeyedStateBackend.StateUpdateFactory
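To make the idea behind the change log easier to follow, here is a minimal, self-contained sketch of the two steps it describes: deciding that a restore only differs in whether TTL is enabled, and then rewriting a stored value accordingly. It is a toy model, not the code in this PR. The class TtlMigrationSketch, its Snapshot and Compatibility types, and the resolve and migrate helpers are all hypothetical names. It also assumes that a TTL-wrapped value is the user value prefixed by an 8-byte last-access timestamp; the real layout is whatever Flink's TTL serializer writes. The re-creation of the RocksDB Column Family is not modeled.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

/** Toy model only; none of these types are Flink classes. */
public class TtlMigrationSketch {

    /** Hypothetical stand-in for the compatibility results named in the change log. */
    enum Compatibility { COMPATIBLE_AS_IS, COMPATIBLE_AFTER_TTL_MIGRATION, INCOMPATIBLE }

    /** Hypothetical snapshot: remembers the user value type and whether TTL was enabled. */
    static final class Snapshot {
        final String valueType;
        final boolean ttlEnabled;

        Snapshot(String valueType, boolean ttlEnabled) {
            this.valueType = valueType;
            this.ttlEnabled = ttlEnabled;
        }
    }

    /** Compare the user value type first, then check whether only the TTL flag changed. */
    static Compatibility resolve(Snapshot previous, Snapshot current) {
        if (!previous.valueType.equals(current.valueType)) {
            return Compatibility.INCOMPATIBLE;
        }
        return previous.ttlEnabled == current.ttlEnabled
                ? Compatibility.COMPATIBLE_AS_IS
                : Compatibility.COMPATIBLE_AFTER_TTL_MIGRATION;
    }

    /**
     * Rewrite one stored value. Assumed layout (illustration only):
     * [8-byte timestamp][user value bytes] when TTL is enabled, user value bytes otherwise.
     */
    static byte[] migrate(byte[] stored, Snapshot previous, Snapshot current, long now) {
        if (previous.ttlEnabled == current.ttlEnabled) {
            return stored; // nothing to do
        }
        if (current.ttlEnabled) {
            // Disabled -> enabled: prepend a fresh timestamp.
            return ByteBuffer.allocate(Long.BYTES + stored.length)
                    .putLong(now)
                    .put(stored)
                    .array();
        }
        // Enabled -> disabled: drop the timestamp prefix.
        return Arrays.copyOfRange(stored, Long.BYTES, stored.length);
    }

    public static void main(String[] args) {
        Snapshot plain = new Snapshot("Long", false);
        Snapshot withTtl = new Snapshot("Long", true);
        byte[] raw = {1, 2, 3};

        System.out.println(resolve(plain, withTtl));
        byte[] wrapped = migrate(raw, plain, withTtl, System.currentTimeMillis());
        byte[] unwrapped = migrate(wrapped, withTtl, plain, 0L);
        System.out.println(wrapped.length + " bytes with TTL, " + unwrapped.length + " without");
    }
}
```

In this sketch the TTL flag travels with the snapshot, so the decision is made in one place and the value rewrite is a pure function of the old and new snapshots.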
Verifying this change
This change added tests and can be verified as follows:
- Added unit test EmbeddedRocksDBStateBackendMigrationTest.testStateMigrationAfterChangingTTLFromDisablingToEnabling
- Added unit test EmbeddedRocksDBStateBackendMigrationTest.testStateMigrationAfterChangingTTLFromEnablingToDisabling
- Added unit test RocksDBTtlStateTestBase.testRestoreTtlAndRegisterNonTtlStateCompatFailure
- Manually verified the change by first disabling state TTL, then enabling it, then disabling it again, and checking the checkpoint size.
Does this pull request potentially affect one of the following parts:
- Dependencies (does it add or upgrade a dependency): no
- The public API, i.e., is any changed class annotated with @Public(Evolving): no
- The serializers: yes
- The runtime per-record code paths (performance sensitive): no
- Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: yes
- The S3 file system connector: no
Documentation
- Does this pull request introduce a new feature? yes
CI report:
- 14340e0654005cf00452740dc929b3deb3f4bd1b Azure: SUCCESS
Thanks for the PR! One question: why not override TtlSerializerSnapshot#resolveSchemaCompatibility instead of introducing many isMigrateFromDisablingToEnabling conditional branches everywhere?
@Zakelly Thanks for the reply. Even if I override TtlSerializerSnapshot#resolveSchemaCompatibility here, that method can only resolve schema compatibility against a TypeSerializerSnapshot<TtlValue<T>>, not a TypeSerializerSnapshot<T>. So it does not help resolve TTL migration compatibility.
@Zakelly Currently, the serializer's compatibility check assumes that the new serializer and the previous serializer have the same data type. That does not hold for a compatibility check between TtlValue<V> and V. This might be a separate issue that can be optimized.
This PR may involve a lot of changes to the current codebase, so we can first reach an agreement on the general direction.
@flinkbot run azure
Merged f2568dee63138899cb80982a9659ab25f0d38c2c into master