
[Feature][ConnectorV2] new option for determining split owners

Open: loustler opened this issue 8 months ago • 1 comment

Search before asking

  • [X] I have searched the existing features and found no similar feature request.

Description

Connectors determine the owner of a split with an expression like (tp.hashCode() & Integer.MAX_VALUE) % numReaders. In my case, this makes split ownership skewed across readers: some readers own many more splits than others. Why not add an optional configuration that lets users choose the class that determines the split owner?
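To make the request concrete, here is a minimal Java sketch of what a pluggable owner strategy could look like. `SplitOwnerDeterminer`, `HashCodeBasedSplitOwnerDeterminer`, `RoundRobinSplitOwnerDeterminer` and the `determineOwner(splitId, splitIndex, numReaders)` signature are hypothetical names chosen for illustration, not existing SeaTunnel APIs. The hash-based class only reproduces the expression quoted above, applied to a string split id in place of `tp`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SplitOwnerSketch {

    /** Hypothetical strategy interface: returns the reader index (0..numReaders-1) that owns a split. */
    public interface SplitOwnerDeterminer {
        int determineOwner(String splitId, int splitIndex, int numReaders);
    }

    /** Mirrors the expression quoted in the issue: hash the id, clear the sign bit, take the modulo. */
    public static final class HashCodeBasedSplitOwnerDeterminer implements SplitOwnerDeterminer {
        @Override
        public int determineOwner(String splitId, int splitIndex, int numReaders) {
            return (splitId.hashCode() & Integer.MAX_VALUE) % numReaders;
        }
    }

    /** Hypothetical alternative: assign splits in enumeration order, so loads differ by at most one. */
    public static final class RoundRobinSplitOwnerDeterminer implements SplitOwnerDeterminer {
        @Override
        public int determineOwner(String splitId, int splitIndex, int numReaders) {
            return splitIndex % numReaders;
        }
    }

    /** Builds placeholder split ids "split-0" .. "split-(n-1)". */
    public static List<String> splitIds(int n) {
        List<String> ids = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            ids.add("split-" + i);
        }
        return ids;
    }

    /** Counts how many splits each reader would own under the given strategy. */
    public static int[] countPerReader(SplitOwnerDeterminer determiner, List<String> splits, int numReaders) {
        int[] counts = new int[numReaders];
        for (int i = 0; i < splits.size(); i++) {
            counts[determiner.determineOwner(splits.get(i), i, numReaders)]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> splits = splitIds(10);
        int numReaders = 3;
        System.out.println("hash-based:  "
                + Arrays.toString(countPerReader(new HashCodeBasedSplitOwnerDeterminer(), splits, numReaders)));
        System.out.println("round-robin: "
                + Arrays.toString(countPerReader(new RoundRobinSplitOwnerDeterminer(), splits, numReaders)));
    }
}
```

Round-robin keeps per-reader counts within one of each other (10 splits over 3 readers gives `[4, 3, 3]`), but it assumes the enumerator sees splits in a stable order. The hash-based rule needs no shared ordering, but its balance depends on how the particular split ids happen to hash.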

Usage Scenario

env {
  parallelism = 1
  job.mode = "BATCH"
  #spark config
  spark.app.name = "SeaTunnel"
  spark.executor.instances = 1
  spark.executor.cores = 1
  spark.executor.memory = "1g"
  spark.master = local
}

source {
  MongoDB {
    uri = "mongodb://e2e_mongodb:27017/test_db"
    database = "test_db"
    collection = "test_null_op_db"
    match.projection = "{ c_bigint:0 }"
    result_table_name = "mongodb_null_table"
    cursor.no-timeout = true
    fetch.size = 1000
    max.time-min = 100
    determine.split.owner.class = "org.apache.seatunnel.connectors.seatunnel.common.source.HashCodeBasedSplitOwnerDeterminer"
    schema = {
      fields {
        c_map = "map<string, string>"
        c_array = "array<int>"
        c_string = string
        c_boolean = boolean
        c_int = int
        c_bigint = bigint
        c_double = double
        c_row = {
          c_map = "map<string, string>"
          c_array = "array<int>"
          c_string = string
          c_boolean = boolean
          c_int = int
          c_bigint = bigint
          c_double = double
        }
      }
    }
  }
}
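One way the proposed `determine.split.owner.class` option could be wired up is to load the named class reflectively and fall back to the current hash-based rule when the option is absent. This is a sketch under assumptions: the `SplitOwnerDeterminerLoader` class, the `SplitOwnerDeterminer` interface and a plain `Map<String, String>` standing in for SeaTunnel's config object are all hypothetical, and the fully qualified class name in the config above is the one proposed in this issue, not a class confirmed to exist.

```java
import java.util.HashMap;
import java.util.Map;

public class SplitOwnerDeterminerLoader {

    /** Option key as proposed in the config above (hypothetical until implemented). */
    public static final String OPTION_KEY = "determine.split.owner.class";

    /** Hypothetical strategy interface: returns the reader index (0..numReaders-1) that owns a split. */
    public interface SplitOwnerDeterminer {
        int determineOwner(String splitId, int splitIndex, int numReaders);
    }

    /** Default that reproduces the current rule: (hashCode & Integer.MAX_VALUE) % numReaders. */
    public static final class DefaultDeterminer implements SplitOwnerDeterminer {
        @Override
        public int determineOwner(String splitId, int splitIndex, int numReaders) {
            return (splitId.hashCode() & Integer.MAX_VALUE) % numReaders;
        }
    }

    /**
     * Reads the class name from the options and instantiates it via its public no-arg constructor.
     * Falls back to DefaultDeterminer when the option is missing or empty.
     */
    public static SplitOwnerDeterminer load(Map<String, String> options) {
        String className = options.get(OPTION_KEY);
        if (className == null || className.isEmpty()) {
            return new DefaultDeterminer();
        }
        try {
            Class<?> clazz = Class.forName(className);
            // Reject classes that do not implement the expected interface before instantiating them.
            if (!SplitOwnerDeterminer.class.isAssignableFrom(clazz)) {
                throw new IllegalArgumentException(className + " does not implement SplitOwnerDeterminer");
            }
            return (SplitOwnerDeterminer) clazz.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            // Covers ClassNotFoundException, NoSuchMethodException, and instantiation failures.
            throw new IllegalArgumentException("Cannot instantiate " + className, e);
        }
    }

    public static void main(String[] args) {
        Map<String, String> options = new HashMap<>();
        options.put(OPTION_KEY, DefaultDeterminer.class.getName());
        SplitOwnerDeterminer determiner = load(options);
        System.out.println(determiner.getClass().getSimpleName() + " -> reader "
                + determiner.determineOwner("test_db.test_null_op_db-0", 0, 4));
    }
}
```

Instantiating through a public no-arg constructor keeps the contract simple for users. A real connector might also need to resolve the class through the right class loader for plugin jars, which this sketch does not handle.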

Related issues

No response

Are you willing to submit a PR?

  • [X] Yes I am willing to submit a PR!

Code of Conduct

loustler, Jun 21 '24 08:06