seatunnel icon indicating copy to clipboard operation
seatunnel copied to clipboard

[Improve][Kafka] kafka source refactored some reader read logic

Open Carl-Zhou-CN opened this issue 11 months ago • 9 comments

An attempt to improve kafka source with reference to flink kafka source

Purpose of this pull request

Does this PR introduce any user-facing change?

How was this patch tested?

Check list

  • [ ] If any new Jar binary package adding in your PR, please add License Notice according New License Guide
  • [ ] If necessary, please update the documentation to describe the new feature. https://github.com/apache/seatunnel/tree/dev/docs
  • [ ] If you are contributing the connector code, please check that the following files are updated:
    1. Update change log that in connector document. For more details you can refer to connector-v2
    2. Update plugin-mapping.properties and add new connector information in it
    3. Update the pom file of seatunnel-dist
  • [ ] Update the release-note.

Carl-Zhou-CN avatar Feb 28 '24 09:02 Carl-Zhou-CN

@zhilinli123 @hailin0 @Hisoka-X @EricJoy2048 @TyrantLucifer cc

Carl-Zhou-CN avatar May 29 '24 10:05 Carl-Zhou-CN

Please fix the conflict

hailin0 avatar Jun 15 '24 14:06 hailin0

Please fix the conflict

done

Carl-Zhou-CN avatar Aug 08 '24 10:08 Carl-Zhou-CN

@hailin0

Carl-Zhou-CN avatar Aug 08 '24 10:08 Carl-Zhou-CN

Please share more information about why we need refactor kafka? And what we can get after refactor?

Hisoka-X avatar Aug 24 '24 03:08 Hisoka-X

Please share more information about why we need refactor kafka? And what we can get after refactor?

+1

hailin0 avatar Aug 24 '24 03:08 hailin0

Please share more information about why we need refactor kafka? And what we can get after refactor?

Please share more information about why we need refactor kafka? And what we can get after refactor?

Firstly, I believe there is room for improvement in the previous approach where a new consumer thread is created for each split, and each consumer thread polls in a loop. Enhancements could involve having one consumer thread per parallelism level, ensuring clearer offset management, and promptly removing consumed partitions to reduce overhead. Secondly, a more seamless integration of streaming and batch processing can be achieved, eliminating the need for consumer threads to differentiate between streaming and batch behaviors.

Carl-Zhou-CN avatar Aug 24 '24 04:08 Carl-Zhou-CN

Firstly, I believe there is room for improvement in the previous approach where a new consumer thread is created for each split, and each consumer thread polls in a loop. Enhancements could involve having one consumer thread per parallelism level, ensuring clearer offset management, and promptly removing consumed partitions to reduce overhead. Secondly, a more seamless integration of streaming and batch processing can be achieved, eliminating the need for consumer threads to differentiate between streaming and batch behaviors.

Make sense to me. Thanks @Carl-Zhou-CN !

Hisoka-X avatar Aug 24 '24 09:08 Hisoka-X

Checking...

Hisoka-X avatar Aug 27 '24 03:08 Hisoka-X