flink-cdc icon indicating copy to clipboard operation
flink-cdc copied to clipboard

Support MySQL in-source parallel deserialization

Open saligia-tju opened this issue 3 months ago • 1 comments

Introducing AsyncScheduler with PartitionedDeserializationScheduler

🚀 Core Architecture

Intelligent Event Routing: Leverages primary key-based partitioning to distribute data-change events across dedicated single-thread partition workers, ensuring optimal load distribution and thread safety.

Robust Queue Management: Implements bounded per-partition queues using ArrayBlockingQueue with a strict blocking policy, guaranteeing zero data loss through our no-drop mechanism.

Ordering Guarantees: The source thread's drainRound serves as the exclusive collector, maintaining per-key ordering integrity throughout the entire processing pipeline.

Reliable Checkpoint Recovery: Binds binlog offset advancement to post-emit operations, ensuring bulletproof checkpoint consistency and seamless recovery capabilities.

🌐 Global Async Control Path

Comprehensive Event Support: Seamlessly handles control events (DDL/Watermark/Heartbeat) through a sophisticated global async pathway.

Advanced Completion Management: Utilizes sequence-to-batch mapping for intelligent out-of-order completion handling while maintaining strict in-order emission guarantees.

⚙️ Flexible Configuration Framework

Builder-Driven Design: Offers comprehensive configuration control through intuitive builder patterns:

  • parallelDeserializeEnabled - Master switch (disabled by default)
  • parallelDeserializePkWorkers - Primary key worker scaling
  • parallelDeserializeThreads - Thread pool optimization
  • parallelDeserializeQueueCapacity - Queue size tuning

Backward Compatibility: When disabled, the system gracefully falls back to the original single-thread execution path with zero scheduler overhead.

🛡️ Enhanced Stability & Performance

Concurrency Safety: Resolves potential ConcurrentModificationException issues during emission by implementing snapshot-to-array conversion before iteration, eliminating iterator modification conflicts.

Code Excellence: Features comprehensive Chinese comments, streamlined code paths, and removal of legacy JVM system properties for improved maintainability.

📋 Migration Guide

Activation Steps:

  • Set parallelDeserializeEnabled=true
  • Configure optimal values for pkWorkers, threads, and queue parameters

Seamless Transition: When disabled, behavior remains 100% identical to previous versions, ensuring zero-risk deployment.

Integration Requirements: Ensure downstream systems respect per-key ordering constraints; all collection operations remain anchored to the source thread for consistency.

saligia-tju avatar Sep 17 '25 09:09 saligia-tju

Think for your contribution ,please don't use mater branch to merge.

ThorneANN avatar Sep 18 '25 07:09 ThorneANN