seatunnel
[Feature][ConnectorV2] new option for determining split owners
Search before asking
- [X] I have searched the existing feature requests and found no similar requirement.
Description
Connectors currently determine the owner of a split with an expression like (tp.hashCode() & Integer.MAX_VALUE) % numReaders. In my case this skews the distribution of splits among readers. Why not add an optional configuration that lets users choose the class that determines the split owner?
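To make the skew and the proposal concrete, here is a minimal sketch. It assumes a hypothetical SplitOwnerDeterminer interface that takes a string split ID. Neither the interface nor RoundRobinSplitOwnerDeterminer exists in SeaTunnel today. Only the hash expression comes from the current connectors, and the name HashCodeBasedSplitOwnerDeterminer is taken from the proposed option value.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical strategy interface; not part of the existing SeaTunnel API. */
interface SplitOwnerDeterminer {
    /** Returns the index (0 .. numReaders - 1) of the reader that should own the split. */
    int determineOwner(String splitId, int numReaders);
}

/** Mirrors the expression the connectors use today. */
class HashCodeBasedSplitOwnerDeterminer implements SplitOwnerDeterminer {
    @Override
    public int determineOwner(String splitId, int numReaders) {
        return (splitId.hashCode() & Integer.MAX_VALUE) % numReaders;
    }
}

/** Hypothetical alternative: hands out splits in arrival order, ignoring the ID. */
class RoundRobinSplitOwnerDeterminer implements SplitOwnerDeterminer {
    private int next = 0;

    @Override
    public int determineOwner(String splitId, int numReaders) {
        return Math.floorMod(next++, numReaders);
    }
}

class SplitOwnerDemo {
    /** Counts how many splits each reader would receive under the given strategy. */
    static int[] countPerReader(SplitOwnerDeterminer determiner, List<String> splitIds, int numReaders) {
        int[] counts = new int[numReaders];
        for (String id : splitIds) {
            counts[determiner.determineOwner(id, numReaders)]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        // Single-character IDs whose hash codes (97, 99, 101, 103) are all odd.
        List<String> splits = Arrays.asList("a", "c", "e", "g");
        System.out.println(Arrays.toString(
                countPerReader(new HashCodeBasedSplitOwnerDeterminer(), splits, 2))); // [0, 4]
        System.out.println(Arrays.toString(
                countPerReader(new RoundRobinSplitOwnerDeterminer(), splits, 2)));    // [2, 2]
    }
}
```

With two readers, the hash-based strategy sends all four of these splits to reader 1, while the round-robin sketch gives each reader two. Real split IDs will behave differently, but this is the kind of skew a user-selectable strategy could avoid.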
Usage Scenario
env {
  parallelism = 1
  job.mode = "BATCH"
  #spark config
  spark.app.name = "SeaTunnel"
  spark.executor.instances = 1
  spark.executor.cores = 1
  spark.executor.memory = "1g"
  spark.master = local
}
source {
  MongoDB {
    uri = "mongodb://e2e_mongodb:27017/test_db"
    database = "test_db"
    collection = "test_null_op_db"
    match.projection = "{ c_bigint:0 }"
    result_table_name = "mongodb_null_table"
    cursor.no-timeout = true
    fetch.size = 1000
    max.time-min = 100
    determine.split.owner.class = "org.apache.seatunnel.connectors.seatunnel.common.source.HashCodeBasedSplitOwnerDeterminer"
    schema = {
      fields {
        c_map = "map<string, string>"
        c_array = "array<int>"
        c_string = string
        c_boolean = boolean
        c_int = int
        c_bigint = bigint
        c_double = double
        c_row = {
          c_map = "map<string, string>"
          c_array = "array<int>"
          c_string = string
          c_boolean = boolean
          c_int = int
          c_bigint = bigint
          c_double = double
        }
      }
    }
  }
}
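One way a connector could resolve the determine.split.owner.class option is to instantiate the named class reflectively and fall back to the current hash-based behavior when the option is absent. The sketch below is an assumption about how this might work, not existing SeaTunnel code: SplitOwnerConfigDemo, its nested Determiner interface, and the plain Map standing in for the connector's option object are all hypothetical.

```java
import java.util.Map;

/** Hypothetical sketch of resolving the proposed option; not existing SeaTunnel code. */
class SplitOwnerConfigDemo {
    /** Option key as proposed in the config example. */
    static final String OPTION_KEY = "determine.split.owner.class";

    /** Hypothetical strategy interface. */
    interface Determiner {
        int determineOwner(String splitId, int numReaders);
    }

    /** Default strategy: the expression connectors use today. */
    static class HashCodeBased implements Determiner {
        @Override
        public int determineOwner(String splitId, int numReaders) {
            return (splitId.hashCode() & Integer.MAX_VALUE) % numReaders;
        }
    }

    /** Loads the configured strategy, or the hash-based default if the option is unset. */
    static Determiner load(Map<String, String> options) {
        String className = options.get(OPTION_KEY);
        if (className == null) {
            return new HashCodeBased();
        }
        try {
            // The class must implement Determiner and have a no-argument constructor.
            return Class.forName(className)
                    .asSubclass(Determiner.class)
                    .getDeclaredConstructor()
                    .newInstance();
        } catch (ReflectiveOperationException | ClassCastException e) {
            throw new IllegalArgumentException(
                    "Invalid value for " + OPTION_KEY + ": " + className, e);
        }
    }
}
```

Keeping the hash-based strategy as the default would leave existing jobs unchanged, and failing fast on an invalid class name surfaces a misconfiguration when the job starts rather than during split assignment.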
Related issues
No response
Are you willing to submit a PR?
- [X] Yes I am willing to submit a PR!
Code of Conduct
- [X] I agree to follow this project's Code of Conduct