redpanda icon indicating copy to clipboard operation
redpanda copied to clipboard

Require serde_fields for all serde structs

Open oleiman opened this issue 1 year ago • 25 comments

In effect, this PR modifies serde to require using serde_fields for all structs EXCEPT those implementing both serde_read and serde_write. In these cases envelope_to_tuple won't generally participate in overload resolution

Also adds serde_fields() to several existing structs, including those generated automatically for fuzz-testing.

Why do this? This PR provides a motivating example. Where possible, we try to avoid compat-breaking serde changes by appending new fields to the end of the binary format. The implicit coupling between aggregate field ordering and binary format makes it easy to introduce subtle serialization bugs or unnecessary compat breaks.

Backports Required

  • [x] none - not a bug fix
  • [ ] none - this is a backport
  • [ ] none - issue does not exist in previous branches
  • [ ] none - papercut/not impactful enough to backport
  • [ ] v24.2.x
  • [ ] v24.1.x
  • [ ] v23.3.x

Release Notes

  • none

oleiman avatar Aug 07 '24 18:08 oleiman

/ci-repeat 1 release

oleiman avatar Aug 07 '24 18:08 oleiman

/ci-repeat 1

oleiman avatar Aug 07 '24 23:08 oleiman

new failures in https://buildkite.com/redpanda/redpanda/builds/52635#01912f5f-6009-4b03-aec1-ed4996c0a79a:

"rptest.tests.recovery_mode_test.RecoveryModeTest.test_rolling_restart"
"rptest.tests.shadow_indexing_compacted_topic_test.ShadowIndexingCompactedTopicTest.test_upload.cloud_storage_type=CloudStorageType.ABS"
"rptest.tests.tiered_storage_model_test.TieredStorageTest.test_tiered_storage.cloud_storage_type_and_url_style=.CloudStorageType.S3.1.path.test_case=.TS_Read==True.AdjacentSegmentMergerReupload==True"
"rptest.tests.topic_recovery_test.TopicRecoveryTest.test_many_partitions.cloud_storage_type=CloudStorageType.ABS.check_mode=check_manifest_and_segment_metadata"
"rptest.tests.topic_recovery_test.TopicRecoveryTest.test_many_partitions.cloud_storage_type=CloudStorageType.S3.check_mode=check_manifest_existence"

new failures in https://buildkite.com/redpanda/redpanda/builds/52635#01912f5f-6004-42f7-9508-1976d10b810b:

"rptest.tests.shadow_indexing_compacted_topic_test.ShadowIndexingCompactedTopicTest.test_upload.cloud_storage_type=CloudStorageType.S3"
"rptest.tests.topic_recovery_test.TopicRecoveryTest.test_many_partitions.cloud_storage_type=CloudStorageType.ABS.check_mode=check_manifest_existence"
"rptest.tests.topic_recovery_test.TopicRecoveryTest.test_many_partitions.cloud_storage_type=CloudStorageType.S3.check_mode=no_check"
"rptest.tests.adjacent_segment_merging_test.AdjacentSegmentMergingTest.test_reupload_of_local_segments.acks=-1.cloud_storage_type=CloudStorageType.ABS"

new failures in https://buildkite.com/redpanda/redpanda/builds/52635#01912f5f-6006-4028-9934-40514b596708:

"rptest.tests.cloud_storage_usage_test.CloudStorageUsageTest.test_cloud_storage_usage_reporting"
"rptest.tests.follower_fetching_test.FollowerFetchingTest.test_follower_fetching_with_maintenance_mode"
"rptest.tests.e2e_topic_recovery_test.EndToEndTopicRecovery.test_restore_with_config_batches.num_messages=2.cloud_storage_type=CloudStorageType.ABS"
"rptest.tests.tiered_storage_model_test.TieredStorageTest.test_tiered_storage.cloud_storage_type_and_url_style=.CloudStorageType.ABS.2.virtual_host.test_case=.TS_Read==True.AdjacentSegmentMergerReupload==True"
"rptest.tests.tiered_storage_model_test.TieredStorageTest.test_tiered_storage.cloud_storage_type_and_url_style=.CloudStorageType.S3.1.virtual_host.test_case=.TS_Read==True.AdjacentSegmentMergerReupload==True"
"rptest.tests.topic_recovery_test.TopicRecoveryTest.test_many_partitions.cloud_storage_type=CloudStorageType.ABS.check_mode=no_check"
"rptest.tests.topic_recovery_test.TopicRecoveryTest.test_prevent_recovery.cloud_storage_type=CloudStorageType.ABS"
"rptest.tests.topic_recovery_test.TopicRecoveryTest.test_time_based_retention.cloud_storage_type=CloudStorageType.ABS"
"rptest.tests.adjacent_segment_merging_test.AdjacentSegmentMergingTest.test_reupload_of_local_segments.acks=-1.cloud_storage_type=CloudStorageType.S3"

new failures in https://buildkite.com/redpanda/redpanda/builds/52635#01912f5f-6007-482b-a8bc-53eb021338c9:

"rptest.tests.e2e_shadow_indexing_test.EndToEndThrottlingTest.test_throttling.cloud_storage_type=CloudStorageType.S3"
"rptest.tests.e2e_topic_recovery_test.EndToEndTopicRecovery.test_restore_with_config_batches.num_messages=2.cloud_storage_type=CloudStorageType.S3"
"rptest.tests.topic_recovery_test.TopicRecoveryTest.test_many_partitions.cloud_storage_type=CloudStorageType.S3.check_mode=check_manifest_and_segment_metadata"
"rptest.tests.adjacent_segment_merging_test.AdjacentSegmentMergingTest.test_reupload_of_local_segments.acks=1.cloud_storage_type=CloudStorageType.ABS"

new failures in https://buildkite.com/redpanda/redpanda/builds/52635#01912f60-be5f-4e6c-a136-374e8e806250:

"rptest.tests.node_pool_migration_test.NodePoolMigrationTest.test_migrating_redpanda_nodes_to_new_pool.balancing_mode=off.test_mode=TestMode.FAST_MOVES.cleanup_policy=compact"
"rptest.tests.node_pool_migration_test.NodePoolMigrationTest.test_migrating_redpanda_nodes_to_new_pool.balancing_mode=node_add.test_mode=TestMode.NO_TIRED_STORAGE.cleanup_policy=compact"
"rptest.tests.node_pool_migration_test.NodePoolMigrationTest.test_migrating_redpanda_nodes_to_new_pool.balancing_mode=off.test_mode=TestMode.TIRED_STORAGE.cleanup_policy=compact"
"rptest.tests.read_replica_e2e_test.ReadReplicasUpgradeTest.test_upgrades.cloud_storage_type=CloudStorageType.S3"
"rptest.tests.scaling_up_test.ScalingUpTest.test_moves_with_local_retention.use_topic_property=True"
"rptest.tests.simple_e2e_test.SimpleEndToEndTest.test_relaxed_acks.write_caching=True.disable_batch_cache=False"
"rptest.tests.cloud_storage_usage_test.CloudStorageUsageTest.test_cloud_storage_usage_reporting_with_partition_moves"
"rptest.tests.e2e_shadow_indexing_test.EndToEndShadowIndexingTestCompactedTopic.test_compacting_during_leadership_transfer.cloud_storage_type=CloudStorageType.ABS"
"rptest.tests.partition_movement_test.SIPartitionMovementTest.test_shadow_indexing.num_to_upgrade=2.cloud_storage_type=CloudStorageType.ABS"
"rptest.tests.partition_movement_test.SIPartitionMovementTest.test_cross_shard.num_to_upgrade=2.cloud_storage_type=CloudStorageType.ABS"
"rptest.tests.e2e_shadow_indexing_test.EndToEndThrottlingTest.test_throttling.cloud_storage_type=CloudStorageType.S3"
"rptest.tests.topic_recovery_test.TopicRecoveryTest.test_many_partitions.cloud_storage_type=CloudStorageType.ABS.check_mode=no_check"
"rptest.tests.adjacent_segment_merging_test.AdjacentSegmentMergingTest.test_reupload_of_local_segments.acks=1.cloud_storage_type=CloudStorageType.ABS"

new failures in https://buildkite.com/redpanda/redpanda/builds/52635#01912f60-be5b-41f7-8d04-7012a56d0c28:

"rptest.tests.node_pool_migration_test.NodePoolMigrationTest.test_migrating_redpanda_nodes_to_new_pool.balancing_mode=node_add.test_mode=TestMode.FAST_MOVES.cleanup_policy=compact"
"rptest.tests.node_pool_migration_test.NodePoolMigrationTest.test_migrating_redpanda_nodes_to_new_pool.balancing_mode=node_add.test_mode=TestMode.TIRED_STORAGE.cleanup_policy=compact"
"rptest.tests.node_pool_migration_test.NodePoolMigrationTest.test_migrating_redpanda_nodes_to_new_pool.balancing_mode=off.test_mode=TestMode.NO_TIRED_STORAGE.cleanup_policy=compact"
"rptest.tests.partition_balancer_test.PartitionBalancerTest.test_nodes_with_reclaimable_space"
"rptest.tests.e2e_shadow_indexing_test.EndToEndShadowIndexingTestCompactedTopic.test_write.cloud_storage_type=CloudStorageType.ABS"
"rptest.tests.multi_restarts_with_archival_test.MultiRestartTest.test_recovery_after_multiple_restarts.cloud_storage_type=CloudStorageType.ABS"
"rptest.tests.partition_movement_test.SIPartitionMovementTest.test_cross_shard.num_to_upgrade=0.cloud_storage_type=CloudStorageType.ABS"
"rptest.tests.partition_movement_test.SIPartitionMovementTest.test_shadow_indexing.num_to_upgrade=0.cloud_storage_type=CloudStorageType.ABS"
"rptest.tests.adjacent_segment_merging_test.AdjacentSegmentMergingTest.test_reupload_of_local_segments.acks=-1.cloud_storage_type=CloudStorageType.ABS"
"rptest.tests.e2e_topic_recovery_test.EndToEndTopicRecovery.test_restore_with_config_batches.num_messages=2.cloud_storage_type=CloudStorageType.S3"
"rptest.tests.shadow_indexing_compacted_topic_test.ShadowIndexingCompactedTopicTest.test_upload.cloud_storage_type=CloudStorageType.S3"
"rptest.tests.topic_recovery_test.TopicRecoveryTest.test_many_partitions.cloud_storage_type=CloudStorageType.S3.check_mode=check_manifest_existence"
"rptest.tests.topic_recovery_test.TopicRecoveryTest.test_many_partitions.cloud_storage_type=CloudStorageType.ABS.check_mode=check_manifest_and_segment_metadata"

new failures in https://buildkite.com/redpanda/redpanda/builds/52635#01912f60-be61-40f3-ba20-106bfde29737:

"rptest.tests.node_pool_migration_test.NodePoolMigrationTest.test_migrating_redpanda_nodes_to_new_pool.balancing_mode=off.test_mode=TestMode.FAST_MOVES.cleanup_policy=compact.delete"
"rptest.tests.node_pool_migration_test.NodePoolMigrationTest.test_migrating_redpanda_nodes_to_new_pool.balancing_mode=node_add.test_mode=TestMode.NO_TIRED_STORAGE.cleanup_policy=compact.delete"
"rptest.tests.node_pool_migration_test.NodePoolMigrationTest.test_migrating_redpanda_nodes_to_new_pool.balancing_mode=off.test_mode=TestMode.TIRED_STORAGE.cleanup_policy=compact.delete"
"rptest.tests.e2e_shadow_indexing_test.EndToEndShadowIndexingTestCompactedTopic.test_compacting_during_leadership_transfer.cloud_storage_type=CloudStorageType.S3"
"rptest.tests.follower_fetching_test.FollowerFetchingTest.test_follower_fetching_with_maintenance_mode"
"rptest.tests.partition_movement_test.SIPartitionMovementTest.test_shadow_indexing.num_to_upgrade=2.cloud_storage_type=CloudStorageType.S3"
"rptest.tests.partition_movement_test.SIPartitionMovementTest.test_cross_shard.num_to_upgrade=2.cloud_storage_type=CloudStorageType.S3"
"rptest.tests.e2e_topic_recovery_test.EndToEndTopicRecovery.test_restore_with_config_batches.num_messages=2.cloud_storage_type=CloudStorageType.ABS"
"rptest.tests.shadow_indexing_compacted_topic_test.ShadowIndexingCompactedTopicTest.test_upload.cloud_storage_type=CloudStorageType.ABS"
"rptest.tests.tiered_storage_model_test.TieredStorageTest.test_tiered_storage.cloud_storage_type_and_url_style=.CloudStorageType.S3.1.path.test_case=.TS_Read==True.AdjacentSegmentMergerReupload==True"
"rptest.tests.topic_recovery_test.TopicRecoveryTest.test_many_partitions.cloud_storage_type=CloudStorageType.S3.check_mode=check_manifest_and_segment_metadata"
"rptest.tests.adjacent_segment_merging_test.AdjacentSegmentMergingTest.test_reupload_of_local_segments.acks=1.cloud_storage_type=CloudStorageType.S3"

new failures in https://buildkite.com/redpanda/redpanda/builds/52635#01912f60-be5d-4e61-ab1b-2ff1bf9730d0:

"rptest.tests.node_pool_migration_test.NodePoolMigrationTest.test_migrating_redpanda_nodes_to_new_pool.balancing_mode=node_add.test_mode=TestMode.TIRED_STORAGE.cleanup_policy=compact.delete"
"rptest.tests.node_pool_migration_test.NodePoolMigrationTest.test_migrating_redpanda_nodes_to_new_pool.balancing_mode=node_add.test_mode=TestMode.FAST_MOVES.cleanup_policy=compact.delete"
"rptest.tests.node_pool_migration_test.NodePoolMigrationTest.test_migrating_redpanda_nodes_to_new_pool.balancing_mode=off.test_mode=TestMode.NO_TIRED_STORAGE.cleanup_policy=compact.delete"
"rptest.tests.cloud_storage_usage_test.CloudStorageUsageTest.test_cloud_storage_usage_reporting"
"rptest.tests.e2e_shadow_indexing_test.EndToEndShadowIndexingTestCompactedTopic.test_write.cloud_storage_type=CloudStorageType.S3"
"rptest.tests.scaling_up_test.ScalingUpTest.test_moves_with_local_retention.use_topic_property=False"
"rptest.tests.multi_restarts_with_archival_test.MultiRestartTest.test_recovery_after_multiple_restarts.cloud_storage_type=CloudStorageType.S3"
"rptest.tests.partition_movement_test.SIPartitionMovementTest.test_shadow_indexing.num_to_upgrade=0.cloud_storage_type=CloudStorageType.S3"
"rptest.tests.partition_movement_test.SIPartitionMovementTest.test_cross_shard.num_to_upgrade=0.cloud_storage_type=CloudStorageType.S3"
"rptest.tests.tiered_storage_model_test.TieredStorageTest.test_tiered_storage.cloud_storage_type_and_url_style=.CloudStorageType.ABS.2.virtual_host.test_case=.TS_Read==True.AdjacentSegmentMergerReupload==True"
"rptest.tests.tiered_storage_model_test.TieredStorageTest.test_tiered_storage.cloud_storage_type_and_url_style=.CloudStorageType.S3.1.virtual_host.test_case=.TS_Read==True.AdjacentSegmentMergerReupload==True"
"rptest.tests.topic_recovery_test.TopicRecoveryTest.test_many_partitions.cloud_storage_type=CloudStorageType.ABS.check_mode=check_manifest_existence"
"rptest.tests.topic_recovery_test.TopicRecoveryTest.test_many_partitions.cloud_storage_type=CloudStorageType.S3.check_mode=no_check"
"rptest.tests.adjacent_segment_merging_test.AdjacentSegmentMergingTest.test_reupload_of_local_segments.acks=-1.cloud_storage_type=CloudStorageType.S3"

new failures in https://buildkite.com/redpanda/redpanda/builds/52977#0191550a-eb87-4aad-8a64-26539e006c3b:

"rptest.tests.leadership_transfer_test.AutomaticLeadershipBalancingTest.test_automatic_rebalance"

new failures in https://buildkite.com/redpanda/redpanda/builds/53329#0191773e-a962-47fa-9b1e-dc702a9434ca:

"rptest.tests.quota_management_test.QuotaManagementTest.test_describe_default"
"rptest.tests.self_test_test.SelfTestTest.test_self_test_mixed_node_controller_lower_version"

new failures in https://buildkite.com/redpanda/redpanda/builds/53329#0191773e-a964-490d-a17c-8ae15bc6540d:

"rptest.tests.quota_management_test.QuotaManagementTest.test_kafka_configs"
"rptest.tests.self_test_test.SelfTestTest.test_self_test_node_crash"

new failures in https://buildkite.com/redpanda/redpanda/builds/53329#0191773e-a966-4924-9ae6-4786ca3455f2:

"rptest.tests.rpk_cluster_quota_test.RpkClusterQuotaTest.test_import_describe_quotas"

new failures in https://buildkite.com/redpanda/redpanda/builds/53329#0191773e-a968-4cf0-aa45-90e910853caf:

"rptest.tests.quota_management_test.QuotaManagementTest.test_describe_any"
"rptest.tests.quota_management_test.QuotaManagementTest.test_describe_name"

new failures in https://buildkite.com/redpanda/redpanda/builds/53329#01917740-42b8-4b8a-a8b2-f49bc0c88379:

"rptest.tests.rpk_cluster_quota_test.RpkClusterQuotaTest.test_import_describe_quotas"

new failures in https://buildkite.com/redpanda/redpanda/builds/53329#01917740-42b9-4778-9f6c-14e23bce22e6:

"rptest.tests.quota_management_test.QuotaManagementTest.test_describe_any"
"rptest.tests.quota_management_test.QuotaManagementTest.test_describe_name"
"rptest.tests.self_test_test.SelfTestTest.test_self_test_mixed_node_controller_lower_version"

new failures in https://buildkite.com/redpanda/redpanda/builds/53329#01917740-42b4-47ec-b55d-ccc8baf8d67d:

"rptest.tests.quota_management_test.QuotaManagementTest.test_describe_default"
"rptest.tests.self_test_test.SelfTestTest.test_self_test_node_crash"

new failures in https://buildkite.com/redpanda/redpanda/builds/53329#01917740-42b6-4ab8-b9c1-f9a75988966e:

"rptest.tests.quota_management_test.QuotaManagementTest.test_kafka_configs"

vbotbuildovich avatar Aug 08 '24 01:08 vbotbuildovich

ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/52635#01912f5f-6004-42f7-9508-1976d10b810b

ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/52635#01912f5f-6007-482b-a8bc-53eb021338c9

ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/52635#01912f60-be5b-41f7-8d04-7012a56d0c28

ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/52780#01914326-9a3d-4dd4-91df-5f62e843d081

ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/53329#0191773e-a966-4924-9ae6-4786ca3455f2

ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/53329#01917740-42b6-4ab8-b9c1-f9a75988966e

vbotbuildovich avatar Aug 08 '24 01:08 vbotbuildovich

/ci-repeat 1

oleiman avatar Aug 11 '24 18:08 oleiman

/ci-repeat 1

oleiman avatar Aug 15 '24 03:08 oleiman

/ci-repeat 1

oleiman avatar Aug 15 '24 06:08 oleiman

CI Failure:

  • https://redpandadata.atlassian.net/browse/CORE-4243 (known, unrelated)

oleiman avatar Aug 15 '24 18:08 oleiman

I wonder if we should go further and require serde_fields except for custom implementations of serde_read and serde_write. so even for single field structs?

Ha, yeah, I've been avoiding flipping this to ready for exactly that reason. Leaning towards yes; better to require it up front vs. spewing template mess on the next append.

oleiman avatar Aug 21 '24 01:08 oleiman

This is awesome, +1.

WillemKauf avatar Aug 21 '24 14:08 WillemKauf

/ci-repeat 1

oleiman avatar Aug 21 '24 19:08 oleiman

/ci-repeat 1 release

oleiman avatar Aug 21 '24 20:08 oleiman

force push to rebase dev and fix a merge conflict

oleiman avatar Aug 21 '24 21:08 oleiman

/ci-repeat 1

oleiman avatar Aug 21 '24 22:08 oleiman

CI Failures:

  • debug build
    • quota_management_test.QuotaManagementTest.test_describe_default (~~passes locally, needs issue?~~)
    • self_test_test.SelfTestTest.test_self_test_mixed_node_controller_lower_version (~~transient error?~~)
  • release build
    • quota_management_test.QuotaManagementTest.test_describe_default (~~passes locally, needs issue?~~)
    • self_test_test.SelfTestTest.test_self_test_node_crash (~~passes locally, needs issue?~~)

~~None of these seems serde related, but I'm not finding open issues for any of these...~~

oleiman avatar Aug 22 '24 01:08 oleiman

self_test_test.SelfTestTest.test_self_test_mixed_node_controller_lower_version

From the logs, is a serde error.

WillemKauf avatar Aug 22 '24 02:08 WillemKauf

self_test_test.SelfTestTest.test_self_test_mixed_node_controller_lower_version

From the logs, is a serde error.

oh? i haven't looked at the broker logs yet... passes consistently on my dev box so I assumed transient. will take a look

oleiman avatar Aug 22 '24 02:08 oleiman

ah, I have the wrong branch checked out lol. disregard https://github.com/redpanda-data/redpanda/pull/22782#issuecomment-2303531940. Thanks @WillemKauf!

oleiman avatar Aug 22 '24 02:08 oleiman

force push to fix an integration issue and a couple omissions.

oleiman avatar Aug 22 '24 03:08 oleiman

@felixguendling sup?

dotnwat avatar Aug 22 '24 03:08 dotnwat

The general idea LGTM. Make sure to check that all necessary fields are (de)serialized when doing a code review (before this check was not necessary with "automatic mode" aka aggregate to tuple).

felixguendling avatar Aug 22 '24 13:08 felixguendling

CI Failure:

  • storage_e2e_rpfixture passes locally. flake?

oleiman avatar Aug 22 '24 16:08 oleiman

storage_e2e_rpfixture passes locally. flake?

Yeah, previously flakey, there was a revert PR that should fix this that got merged this AM. Re-running.

WillemKauf avatar Aug 22 '24 16:08 WillemKauf

@oleiman do you think we should add a static assert if it doesn't already exist that serde_fields xor serde_read/write are used so we don't have to think about ambiguity if both are present?

dotnwat avatar Aug 26 '24 15:08 dotnwat

@oleiman do you think we should add a static assert if it doesn't already exist that serde_fields xor serde_read/write are used so we don't have to think about ambiguity if both are present?

Seems like a good idea. I think in some cases we use serde_fields on the write path with explicit reader for compat reasons, but that stuff can be rejiggered as well.

Anyway, I'm going to throw this PR into draft mode and split it up. Thanks @dotnwat & @WillemKauf for feedback 🙂

oleiman avatar Aug 26 '24 21:08 oleiman

superseded by https://github.com/redpanda-data/redpanda/pull/23126 et al.

oleiman avatar Sep 06 '24 18:09 oleiman