[BitSail][Connector] Support Assert sink connector

Open garyli1019 opened this issue 2 years ago • 2 comments

Is your feature request related to a problem? Please describe

Currently we support a print sink and use it in some tests, but printing is not sufficient to verify test results. We need an AssertSink to verify test results more accurately.

garyli1019 avatar Nov 09 '22 09:11 garyli1019

Hi @garyli1019, I'd like to try this, please assign it to me, thanks.

liuxiaocs7 avatar Nov 16 '22 09:11 liuxiaocs7

@liuxiaocs7 Assigned to you!

garyli1019 avatar Nov 17 '22 09:11 garyli1019

@liuxiaocs7 Hello, just checking how everything is going with this issue. We have a contributor group for discussing dev issues. Please feel free to join if you are interested.

garyli1019 avatar Nov 30 '22 02:11 garyli1019

@garyli1019 Hello, I have successfully run BitSail locally and on the cluster, but I have been busy recently and haven't started the design yet. I will start as soon as possible and discuss it with you here. The contributor group will help; I'll join it, thanks :)

liuxiaocs7 avatar Dec 01 '22 01:12 liuxiaocs7

Hi @garyli1019, I referred to the design of the assert connector in SeaTunnel, which includes row_rules and column_rules. The row rules verify the number of records in a job (min_row and max_row). The column rules verify the data type and value range: min and max for numbers, min_len and max_len for strings, and not_null for all fields. The conf looks like the sample further down in this comment. What do you think?

Also, I have a question. As I understand it, sink connector V1 uses DelegateFlinkWriter to handle the sink logic. I noticed that it reads the column conf and constructs the converter from it, so the columns entry in the writer is required. However, ColumnInfo only carries a string properties field, which supports unique, nullable, and not_null (see TypeProperty). So I added column_rules to the JSON conf, but it looks redundant. Is there a better design?

https://github.com/bytedance/bitsail/blob/1a5e3f6e8b4518cd2a28e5a028395db2800c82cb/bitsail-cores/bitsail-core-flink-bridge/src/main/java/com/bytedance/bitsail/core/flink/bridge/writer/delegate/DelegateFlinkWriter.java#L113-L121

https://github.com/bytedance/bitsail/blob/1a5e3f6e8b4518cd2a28e5a028395db2800c82cb/bitsail-common/src/main/java/com/bytedance/bitsail/common/model/ColumnInfo.java#L39

https://github.com/bytedance/bitsail/blob/1a5e3f6e8b4518cd2a28e5a028395db2800c82cb/bitsail-common/src/main/java/com/bytedance/bitsail/common/typeinfo/TypeProperty.java#L28-L34

The sample conf now:

{
  "job": {
    "common": {
      "cid": 0,
      "domain": "test",
      "job_id": -24,
      "job_name": "bitsail_connector_assert_test",
      "instance_id": -720,
      "user_name": "root"
    },
    "reader": {
      "class": "com.bytedance.bitsail.connector.fake.source.FakeSource",
      "total_count": 17,
      "columns": [
        {
          "name": "name",
          "type": "string"
        },
        {
          "name": "age",
          "type": "int"
        }
      ]
    },
    "writer": {
      "class": "com.bytedance.bitsail.connector.assertion.sink.AssertSink",
      "content_type": "json",
      "columns": [
        {
          "name": "name",
          "type": "string"
        },
        {
          "name": "age",
          "type": "int"
        }
      ],
      "row_rules": {
        "min_row": 5,
        "max_row": 10
      },
      "column_rules": {
        "name": {
          "not_null": true,
          "min_len": 5,
          "max_len": 20
        },
        "age": {
          "not_null": true,
          "min": 2,
          "max": 18
        }
      }
    }
  }
}
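
To make the intended semantics of row_rules and column_rules concrete, here is a minimal Java sketch of the checks. It is only an illustration under assumptions: the class and method names (AssertRulesSketch, ColumnRule, checkValue, checkRowCount) are hypothetical and not part of BitSail, all bounds are treated as inclusive, and an unset rule is modelled as null.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; none of these names come from the BitSail code base.
final class AssertRulesSketch {

    /** One entry of column_rules. A null bound means the rule is not configured. */
    static final class ColumnRule {
        final boolean notNull;
        final Double min;      // numeric lower bound (assumed inclusive)
        final Double max;      // numeric upper bound (assumed inclusive)
        final Integer minLen;  // string length lower bound (assumed inclusive)
        final Integer maxLen;  // string length upper bound (assumed inclusive)

        ColumnRule(boolean notNull, Double min, Double max, Integer minLen, Integer maxLen) {
            this.notNull = notNull;
            this.min = min;
            this.max = max;
            this.minLen = minLen;
            this.maxLen = maxLen;
        }
    }

    /** Checks a single field value and returns a list of violations (empty if it passes). */
    static List<String> checkValue(String column, Object value, ColumnRule rule) {
        List<String> violations = new ArrayList<>();
        if (value == null) {
            if (rule.notNull) {
                violations.add(column + ": value is null");
            }
            return violations; // range and length rules do not apply to null
        }
        if (value instanceof Number) {
            double v = ((Number) value).doubleValue();
            if (rule.min != null && v < rule.min) {
                violations.add(column + ": " + v + " is below min " + rule.min);
            }
            if (rule.max != null && v > rule.max) {
                violations.add(column + ": " + v + " is above max " + rule.max);
            }
        }
        if (value instanceof String) {
            int len = ((String) value).length();
            if (rule.minLen != null && len < rule.minLen) {
                violations.add(column + ": length " + len + " is below min_len " + rule.minLen);
            }
            if (rule.maxLen != null && len > rule.maxLen) {
                violations.add(column + ": length " + len + " is above max_len " + rule.maxLen);
            }
        }
        return violations;
    }

    /** Checks the total row count once the job has finished writing. */
    static List<String> checkRowCount(long rows, Long minRow, Long maxRow) {
        List<String> violations = new ArrayList<>();
        if (minRow != null && rows < minRow) {
            violations.add("row count " + rows + " is below min_row " + minRow);
        }
        if (maxRow != null && rows > maxRow) {
            violations.add("row count " + rows + " is above max_row " + maxRow);
        }
        return violations;
    }

    public static void main(String[] args) {
        // Rule values copied from the sample conf above.
        ColumnRule nameRule = new ColumnRule(true, null, null, 5, 20);
        ColumnRule ageRule = new ColumnRule(true, 2.0, 18.0, null, null);

        System.out.println(checkValue("name", "bob", nameRule)); // shorter than min_len
        System.out.println(checkValue("age", 12, ageRule));      // within [2, 18]
        System.out.println(checkRowCount(17, 5L, 10L));          // above max_row
    }
}
```

Returning a list of violations instead of throwing on the first failure is a design choice for this sketch, so that a test can report every broken rule at once. Note that if total_count in the reader is the number of rows emitted, the sample conf (17 rows against max_row 10) would fail the row rule under these assumptions.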

liuxiaocs7 avatar Dec 08 '22 02:12 liuxiaocs7

@liuxiaocs7 Hi, thanks for elaborating on your design! Your configuration design looks good to me. I couldn't come up with a better idea than this. Please go ahead with this design.

garyli1019 avatar Dec 08 '22 12:12 garyli1019