[Fix][connector-hive] Source split class should implement equals method to avoid repeatedly reading the same split

Open sohurdc opened this issue 6 months ago • 1 comments

Purpose of this pull request

Fix a problem where the Hive source may read the same partition multiple times. Splits are stored in sets for deduplication, but HiveSourceSplit does not implement the equals method, so two split objects that point to the same path may be treated as different splits and read more than once.
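
The effect can be reproduced outside SeatUnnel with a minimal sketch. The classes `IdentitySplit` and `PathSplit` below are hypothetical stand-ins, not the real HiveSourceSplit, and the assumption that a split is identified by its path alone is made only for illustration. Note that a `HashSet` consults `hashCode` as well as `equals`, so the sketch overrides both.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

public class SplitDedupSketch {

    /** Hypothetical split without equals/hashCode: compared by object identity. */
    static final class IdentitySplit {
        final String path;

        IdentitySplit(String path) {
            this.path = path;
        }
    }

    /** Hypothetical split compared by its path (an assumption for this sketch). */
    static final class PathSplit {
        final String path;

        PathSplit(String path) {
            this.path = path;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) {
                return true;
            }
            if (!(o instanceof PathSplit)) {
                return false;
            }
            return Objects.equals(path, ((PathSplit) o).path);
        }

        // Must agree with equals, otherwise HashSet cannot find the duplicate.
        @Override
        public int hashCode() {
            return Objects.hash(path);
        }
    }

    /** Adds one IdentitySplit per path to a set and returns the set size. */
    static int countIdentitySplits(String... paths) {
        Set<IdentitySplit> splits = new HashSet<>();
        for (String p : paths) {
            splits.add(new IdentitySplit(p));
        }
        return splits.size();
    }

    /** Adds one PathSplit per path to a set and returns the set size. */
    static int countPathSplits(String... paths) {
        Set<PathSplit> splits = new HashSet<>();
        for (String p : paths) {
            splits.add(new PathSplit(p));
        }
        return splits.size();
    }

    public static void main(String[] args) {
        String path = "/warehouse/t/dt=2025-06-17/part-0"; // made-up path
        System.out.println(countIdentitySplits(path, path)); // prints 2
        System.out.println(countPathSplits(path, path));     // prints 1
    }
}
```

With identity-based comparison the set keeps both objects for the same path, which corresponds to the repeated read described above; with value-based comparison the duplicate is dropped.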

Does this PR introduce any user-facing change?

no

How was this patch tested?

no

Check list

  • [ ] If any new Jar binary package is added in your PR, please add a License Notice according to the New License Guide
  • [ ] If necessary, please update the documentation to describe the new feature. https://github.com/apache/seatunnel/tree/dev/docs
  • [ ] If you are contributing the connector code, please check that the following files are updated:
    1. Update plugin-mapping.properties and add new connector information in it
    2. Update the pom file of seatunnel-dist
    3. Add ci label in label-scope-conf
    4. Add e2e testcase in seatunnel-e2e
    5. Update connector plugin_config

sohurdc • Jun 17 '25 15:06

Could you add a test case for this?

Hisoka-X • Jun 18 '25 03:06

This pull request has been automatically marked as stale because it has not had recent activity for 120 days. It will be closed in 7 days if no further activity occurs.

github-actions[bot] • Oct 17 '25 00:10

This pull request has been closed because it has not had recent activity. You can reopen it if you want to continue your work, and anyone who is interested is encouraged to continue work on this pull request.

github-actions[bot] • Oct 25 '25 00:10