[Improve][Kafka] Refactor some of the Kafka source reader read logic
Purpose of this pull request
An attempt to improve the Kafka source, with reference to the Flink Kafka source.
Does this PR introduce any user-facing change?
How was this patch tested?
Check list
- [ ] If any new Jar binary packages are added in your PR, please add a License Notice according to the New License Guide
- [ ] If necessary, please update the documentation to describe the new feature. https://github.com/apache/seatunnel/tree/dev/docs
- [ ] If you are contributing the connector code, please check that the following files are updated:
- Update the change log in the connector document. For more details, refer to connector-v2
- Update plugin-mapping.properties and add new connector information in it
- Update the pom file of seatunnel-dist
- [ ] Update the release-note.
@zhilinli123 @hailin0 @Hisoka-X @EricJoy2048 @TyrantLucifer cc
Please fix the conflict
done
@hailin0
Please share more information about why we need to refactor the Kafka source, and what we gain from the refactoring.
+1
Firstly, I believe the previous approach, where a new consumer thread is created for each split and each thread polls in its own loop, leaves room for improvement. With one consumer thread per parallelism, offset management becomes clearer, and consumed partitions can be removed from the assignment promptly, reducing overhead. Secondly, streaming and batch processing can be unified more cleanly, so the consumer threads no longer need to differentiate between streaming and batch behaviors.
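For illustration, here is a minimal sketch of that idea: a single `KafkaConsumer` per reader that is assigned all of that reader's splits, tracks offsets per partition, and drops a partition from the assignment once its stopping offset is reached (batch mode; in streaming mode no stopping offset is set, so the same loop just keeps polling). The class and method names below are hypothetical and are not the actual code in this PR.

```java
// Hypothetical sketch (not the actual PR code): one consumer per reader instead of
// one consumer thread per split. Splits are assigned to the single consumer, offsets
// are tracked per partition, and finished partitions are removed from the assignment.
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;

import java.time.Duration;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;

public class SingleConsumerSplitReader {

    private final KafkaConsumer<byte[], byte[]> consumer;
    // Stopping offset per partition; Long.MAX_VALUE means "unbounded" (streaming).
    private final Map<TopicPartition, Long> stoppingOffsets = new HashMap<>();
    // Next offset to read per partition; this is also what a checkpoint would persist.
    private final Map<TopicPartition, Long> currentOffsets = new HashMap<>();

    public SingleConsumerSplitReader(Properties kafkaProps) {
        this.consumer = new KafkaConsumer<>(kafkaProps);
    }

    /** Assign a new split (partition + start/stop offsets) to this reader's single consumer. */
    public void addSplit(TopicPartition partition, long startOffset, long stopOffset) {
        currentOffsets.put(partition, startOffset);
        stoppingOffsets.put(partition, stopOffset);
        // assign() resets positions, so restore the tracked offsets for all partitions.
        consumer.assign(currentOffsets.keySet());
        currentOffsets.forEach(consumer::seek);
    }

    /** Poll once, collect records, and remove partitions that reached their stopping offset. */
    public List<ConsumerRecord<byte[], byte[]>> pollOnce(Duration timeout) {
        List<ConsumerRecord<byte[], byte[]>> output = new ArrayList<>();
        ConsumerRecords<byte[], byte[]> records = consumer.poll(timeout);
        List<TopicPartition> finished = new ArrayList<>();
        for (TopicPartition tp : records.partitions()) {
            long stop = stoppingOffsets.getOrDefault(tp, Long.MAX_VALUE);
            for (ConsumerRecord<byte[], byte[]> record : records.records(tp)) {
                if (record.offset() >= stop) {
                    currentOffsets.put(tp, stop); // bounded split is done, ignore the rest
                    break;
                }
                output.add(record);
                currentOffsets.put(tp, record.offset() + 1);
            }
            if (currentOffsets.get(tp) >= stop) {
                finished.add(tp);
            }
        }
        if (!finished.isEmpty()) {
            // Promptly drop consumed partitions so the consumer no longer fetches them.
            finished.forEach(tp -> {
                currentOffsets.remove(tp);
                stoppingOffsets.remove(tp);
            });
            consumer.assign(currentOffsets.keySet());
            currentOffsets.forEach(consumer::seek);
        }
        return output;
    }

    public boolean isFinished() {
        return currentOffsets.isEmpty();
    }
}
```

Compared with one thread per split, a structure like this keeps the number of consumers bounded by the parallelism and lets checkpointed offsets map directly onto the tracked per-partition positions.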
Makes sense to me. Thanks @Carl-Zhou-CN!
Checking...