[FLINK-35780] Support state migration between disabling and enabling State TTL for RocksDBState

Open xiangyuf opened this issue 1 year ago • 5 comments

[FLINK-35780][state] Support state migration between disabling and enabling State TTL in RocksDBState

What is the purpose of the change

Support state migration between disabling and enabling State TTL for RocksDBState.

Brief change log

  • Introduce TtlAwareSerializer and TtlAwareSerializerSnapshot
  • Introduce COMPATIBLE_AFTER_TTL_MIGRATION in TypeSerializerSchemaCompatibility
  • Wrap all stateSerializer when creating states with TtlAwareSerializer
  • Wrap all recovered TypeSerializerSnapshot with TtlAwareSerializerSnapshot
  • Resolve TTL migration compatibility check in TtlAwareSerializerSnapshot
  • Support TTL state migration in TtlAwareSerializer
  • Re-create the RocksDB Column Family when migrating state values between disabling and enabling State TTL
  • Change migrateSerializedValue in AbstractRocksDBState, RocksDBListState and RocksDBMapState
  • Support updating the ColumnFamilyHandle in RocksDBKeyedStateBackend.StateUpdateFactory
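
To illustrate the value migration step, here is a minimal sketch in plain Java. It is not the code in this PR. It assumes that a TTL-wrapped value is serialized as an 8-byte last-access timestamp followed by the user value bytes; `TtlMigrationSketch`, `Direction`, and `migrate` are hypothetical names used only for illustration.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

/** Illustrative only: rewrites one serialized state value when State TTL is toggled. */
public class TtlMigrationSketch {

    /** Hypothetical enum describing the migration direction (not a Flink type). */
    enum Direction { DISABLED_TO_ENABLED, ENABLED_TO_DISABLED }

    /**
     * Assumed layout of a TTL value: [8-byte timestamp][user value bytes].
     * The timestamp argument is only used when TTL is being enabled.
     */
    static byte[] migrate(byte[] serialized, Direction direction, long nowMillis) {
        switch (direction) {
            case DISABLED_TO_ENABLED:
                // Prepend a timestamp so the old value becomes a valid TTL value.
                return ByteBuffer.allocate(Long.BYTES + serialized.length)
                        .putLong(nowMillis)
                        .put(serialized)
                        .array();
            case ENABLED_TO_DISABLED:
                if (serialized.length < Long.BYTES) {
                    throw new IllegalArgumentException("value too short to hold a timestamp");
                }
                // Drop the timestamp prefix and keep only the user value bytes.
                return Arrays.copyOfRange(serialized, Long.BYTES, serialized.length);
            default:
                throw new IllegalArgumentException("unknown direction: " + direction);
        }
    }
}
```

In this sketch, values migrated from non-TTL state all receive the same timestamp (the migration time), so they start a fresh TTL period. Whether the real implementation makes the same choice is not stated in this description.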

Verifying this change

This change added tests and can be verified as follows:

  • Added Unit Test in EmbeddedRocksDBStateBackendMigrationTest.testStateMigrationAfterChangingTTLFromDisablingToEnabling
  • Added Unit Test in EmbeddedRocksDBStateBackendMigrationTest.testStateMigrationAfterChangingTTLFromEnablingToDisabling
  • Added Unit Test in RocksDBTtlStateTestBase.testRestoreTtlAndRegisterNonTtlStateCompatFailure
  • Manually verified the change by first disabling State TTL, then enabling it, then disabling it again, and checking the checkpoint size after each step.

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): no
  • The public API, i.e., is any changed class annotated with @Public(Evolving): no
  • The serializers: yes
  • The runtime per-record code paths (performance sensitive): no
  • Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: yes
  • The S3 file system connector: no

Documentation

  • Does this pull request introduce a new feature? yes

xiangyuf avatar Jul 07 '24 16:07 xiangyuf

CI report:

  • 14340e0654005cf00452740dc929b3deb3f4bd1b Azure: SUCCESS
Bot commands
The @flinkbot bot supports the following commands:
  • @flinkbot run azure: re-run the last Azure build

flinkbot avatar Jul 07 '24 17:07 flinkbot

@flinkbot run azure

xiangyuf avatar Sep 12 '24 06:09 xiangyuf

Thanks for the PR! One question: why not override TtlSerializerSnapshot#resolveSchemaCompatibility instead of introducing many isMigrateFromDisablingToEnabling conditional branches everywhere?

@Zakelly Thanks for the reply. Even if I override TtlSerializerSnapshot#resolveSchemaCompatibility here, this method can only resolve schema compatibility against a TypeSerializerSnapshot<TtlValue<T>>, not a TypeSerializerSnapshot<T>. So it does not help to resolve TTL migration compatibility.

xiangyuf avatar Sep 21 '24 14:09 xiangyuf

@Zakelly Currently, the serializer compatibility check assumes that the new serializer and the previous serializer have the same data type. This does not hold for a compatibility check between TtlValue<V> and V. This might be a separate issue that can be optimized.

xiangyuf avatar Sep 22 '24 12:09 xiangyuf
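
The type mismatch discussed in the two comments above can be pictured with a small plain-Java model. This is a hedged sketch, not Flink code: `TtlAwareCompatSketch`, `Snapshot`, and `resolve` are hypothetical names, the enum only mirrors the idea of COMPATIBLE_AFTER_TTL_MIGRATION from the change log, and comparing a type name stands in for the real check on the wrapped user-value serializer. The point is that a wrapper which records the TTL flag next to the user value type can compare T with T, which a snapshot typed to TtlValue<T> cannot do.

```java
/** Illustrative only: a TTL-aware compatibility check over a simplified snapshot model. */
public class TtlAwareCompatSketch {

    /** Hypothetical result type; only loosely modeled on the compatibility results in Flink. */
    enum Compatibility { COMPATIBLE_AS_IS, COMPATIBLE_AFTER_TTL_MIGRATION, INCOMPATIBLE }

    /** Hypothetical snapshot: the user value type plus whether TTL wrapping was enabled. */
    static final class Snapshot {
        final String userValueType;
        final boolean ttlEnabled;

        Snapshot(String userValueType, boolean ttlEnabled) {
            this.userValueType = userValueType;
            this.ttlEnabled = ttlEnabled;
        }
    }

    static Compatibility resolve(Snapshot previous, Snapshot current) {
        // Compare the unwrapped user value types, so TtlValue<V> and V can be matched.
        if (!previous.userValueType.equals(current.userValueType)) {
            return Compatibility.INCOMPATIBLE;
        }
        // Same user type: the only remaining difference is whether TTL was toggled.
        return previous.ttlEnabled == current.ttlEnabled
                ? Compatibility.COMPATIBLE_AS_IS
                : Compatibility.COMPATIBLE_AFTER_TTL_MIGRATION;
    }
}
```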

This PR may involve a lot of changes to the current codebase. We can first reach an agreement on the general direction.

xiangyuf avatar Sep 22 '24 14:09 xiangyuf

@flinkbot run azure

xiangyuf avatar Nov 04 '24 16:11 xiangyuf

Merged f2568dee63138899cb80982a9659ab25f0d38c2c into master

Zakelly avatar Feb 10 '25 10:02 Zakelly