[Improve][Kafka] Refactor some of the Kafka source reader read logic
Purpose of this pull request
An attempt to improve the Kafka source, with reference to the Flink Kafka source.
Does this PR introduce any user-facing change?
How was this patch tested?
Check list
- [ ] If any new Jar binary packages are added in your PR, please add a License Notice according to the New License Guide
- [ ] If necessary, please update the documentation to describe the new feature. https://github.com/apache/seatunnel/tree/dev/docs
- [ ] If you are contributing the connector code, please check that the following files are updated:
- Update the change log in the connector document. For more details, refer to connector-v2
- Update plugin-mapping.properties and add new connector information in it
- Update the pom file of seatunnel-dist
- [ ] Update the release-note.
@zhilinli123 @hailin0 @Hisoka-X @EricJoy2048 @TyrantLucifer cc
Please fix the conflict
done
@hailin0
Please share more information about why we need to refactor the Kafka source, and what we gain from the refactoring.
+1
Firstly, I believe the previous approach, where a new consumer thread is created for each split and each thread polls in its own loop, leaves room for improvement. With one consumer thread per parallelism, offset management becomes clearer, and consumed partitions can be removed from the assignment promptly, reducing overhead. Secondly, streaming and batch processing can be unified more cleanly, so the consumer threads no longer need to differentiate between streaming and batch behaviors.
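For illustration, here is a minimal sketch of that idea: a single `KafkaConsumer` per reader that is assigned all of that reader's splits, tracks offsets per partition, and drops a partition from the assignment once its stopping offset is reached (batch mode; in streaming mode no stopping offset is set, so the same loop just keeps polling). The class and method names below are hypothetical and are not the actual code in this PR.

```java
// Hypothetical sketch (not the actual PR code): one consumer per reader instead of
// one consumer thread per split. Splits are assigned to the single consumer, offsets
// are tracked per partition, and finished partitions are removed from the assignment.
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;

import java.time.Duration;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;

public class SingleConsumerSplitReader {

    private final KafkaConsumer<byte[], byte[]> consumer;
    // Stopping offset per partition; Long.MAX_VALUE means "unbounded" (streaming).
    private final Map<TopicPartition, Long> stoppingOffsets = new HashMap<>();
    // Next offset to read per partition; this is also what a checkpoint would persist.
    private final Map<TopicPartition, Long> currentOffsets = new HashMap<>();

    public SingleConsumerSplitReader(Properties kafkaProps) {
        this.consumer = new KafkaConsumer<>(kafkaProps);
    }

    /** Assign a new split (partition + start/stop offsets) to this reader's single consumer. */
    public void addSplit(TopicPartition partition, long startOffset, long stopOffset) {
        currentOffsets.put(partition, startOffset);
        stoppingOffsets.put(partition, stopOffset);
        // assign() resets positions, so restore the tracked offsets for all partitions.
        consumer.assign(currentOffsets.keySet());
        currentOffsets.forEach(consumer::seek);
    }

    /** Poll once, collect records, and remove partitions that reached their stopping offset. */
    public List<ConsumerRecord<byte[], byte[]>> pollOnce(Duration timeout) {
        List<ConsumerRecord<byte[], byte[]>> output = new ArrayList<>();
        ConsumerRecords<byte[], byte[]> records = consumer.poll(timeout);
        List<TopicPartition> finished = new ArrayList<>();
        for (TopicPartition tp : records.partitions()) {
            long stop = stoppingOffsets.getOrDefault(tp, Long.MAX_VALUE);
            for (ConsumerRecord<byte[], byte[]> record : records.records(tp)) {
                if (record.offset() >= stop) {
                    currentOffsets.put(tp, stop); // bounded split is done, ignore the rest
                    break;
                }
                output.add(record);
                currentOffsets.put(tp, record.offset() + 1);
            }
            if (currentOffsets.get(tp) >= stop) {
                finished.add(tp);
            }
        }
        if (!finished.isEmpty()) {
            // Promptly drop consumed partitions so the consumer no longer fetches them.
            finished.forEach(tp -> {
                currentOffsets.remove(tp);
                stoppingOffsets.remove(tp);
            });
            consumer.assign(currentOffsets.keySet());
            currentOffsets.forEach(consumer::seek);
        }
        return output;
    }

    public boolean isFinished() {
        return currentOffsets.isEmpty();
    }
}
```

Compared with one thread per split, a structure like this keeps the number of consumers bounded by the parallelism and lets checkpointed offsets map directly onto the tracked per-partition positions.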
Makes sense to me. Thanks @Carl-Zhou-CN!
Checking...