[BitSail][Connector] Support Assert sink connector
Is your feature request related to a problem? Please describe
Currently we support a print sink and use it in some tests, but printing is not sufficient to verify test results.
We therefore need an AssertSink to verify test results more accurately.
Hi @garyli1019, I'd like to try this. Please assign it to me, thanks.
@liuxiaocs7 Assigned to you!
@liuxiaocs7 Hello, just checking how everything is going with this issue. We have a contributor group for discussing development issues. Please feel free to join if you are interested.
@garyli1019 Hello, I have successfully run BitSail locally and on the cluster, but I have been busy recently and haven't started the design yet. I will start as soon as possible and discuss it with you here. The contributor group will help, I'll join it, thx :)
Hi @garyli1019, I referred to the design of the assert connector in SeaTunnel, which includes row_rules and column_rules.
In the row rules, we verify the number of records in a job (min_row and max_row). In the column rules, we verify the data type and value range: min and max for numbers, min_len and max_len for strings, and not_null for all fields. The conf would look like the sample further down. What do you think of this?
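To make the column rules concrete, here is a minimal sketch of how one value could be checked against them. It is not BitSail or SeaTunnel code: the names ColumnRuleSketch, ColumnRule and check are hypothetical, a null rule value is assumed to mean "not configured", and the bounds are assumed to be inclusive.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of column rule checking; not an existing BitSail class.
class ColumnRuleSketch {

  // Rules for one column. A null bound means the rule is not configured.
  static final class ColumnRule {
    final boolean notNull;
    final Double min;      // lower bound for numeric values (assumed inclusive)
    final Double max;      // upper bound for numeric values (assumed inclusive)
    final Integer minLen;  // minimum length for string values (assumed inclusive)
    final Integer maxLen;  // maximum length for string values (assumed inclusive)

    ColumnRule(boolean notNull, Double min, Double max, Integer minLen, Integer maxLen) {
      this.notNull = notNull;
      this.min = min;
      this.max = max;
      this.minLen = minLen;
      this.maxLen = maxLen;
    }
  }

  // Returns one message per violated rule; an empty list means the value passes.
  static List<String> check(String column, Object value, ColumnRule rule) {
    List<String> errors = new ArrayList<>();
    if (value == null) {
      if (rule.notNull) {
        errors.add(column + ": value is null but not_null is set");
      }
      return errors; // range and length rules do not apply to null
    }
    if (value instanceof Number) {
      double v = ((Number) value).doubleValue();
      if (rule.min != null && v < rule.min) {
        errors.add(column + ": " + v + " is below min " + rule.min);
      }
      if (rule.max != null && v > rule.max) {
        errors.add(column + ": " + v + " is above max " + rule.max);
      }
    }
    if (value instanceof String) {
      int len = ((String) value).length();
      if (rule.minLen != null && len < rule.minLen) {
        errors.add(column + ": length " + len + " is below min_len " + rule.minLen);
      }
      if (rule.maxLen != null && len > rule.maxLen) {
        errors.add(column + ": length " + len + " is above max_len " + rule.maxLen);
      }
    }
    return errors;
  }

  public static void main(String[] args) {
    ColumnRule nameRule = new ColumnRule(true, null, null, 5, 20);
    ColumnRule ageRule = new ColumnRule(true, 2.0, 18.0, null, null);
    System.out.println(check("name", "bitsail", nameRule)); // passes all rules
    System.out.println(check("age", 30, ageRule));          // violates max
    System.out.println(check("name", null, nameRule));      // violates not_null
  }
}
```

Collecting every violation instead of failing on the first one is a design choice in this sketch. It makes a failed test easier to diagnose, at the cost of a little extra allocation per record.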
Also, I have a question. As I understand it, sink connector V1 uses DelegateFlinkWriter to handle the sink logic. I noticed that we read the column conf and construct the converter from it, so column is required in the writer. However, ColumnInfo only includes string properties, which support unique, nullable, and not_null (see TypeProperty). So I added column_rules to the JSON conf, but it looks redundant. Do we have a better design idea?
https://github.com/bytedance/bitsail/blob/1a5e3f6e8b4518cd2a28e5a028395db2800c82cb/bitsail-cores/bitsail-core-flink-bridge/src/main/java/com/bytedance/bitsail/core/flink/bridge/writer/delegate/DelegateFlinkWriter.java#L113-L121
https://github.com/bytedance/bitsail/blob/1a5e3f6e8b4518cd2a28e5a028395db2800c82cb/bitsail-common/src/main/java/com/bytedance/bitsail/common/model/ColumnInfo.java#L39
https://github.com/bytedance/bitsail/blob/1a5e3f6e8b4518cd2a28e5a028395db2800c82cb/bitsail-common/src/main/java/com/bytedance/bitsail/common/typeinfo/TypeProperty.java#L28-L34
The sample conf now:
{
  "job": {
    "common": {
      "cid": 0,
      "domain": "test",
      "job_id": -24,
      "job_name": "bitsail_connector_assert_test",
      "instance_id": -720,
      "user_name": "root"
    },
    "reader": {
      "class": "com.bytedance.bitsail.connector.fake.source.FakeSource",
      "total_count": 17,
      "columns": [
        {
          "name": "name",
          "type": "string"
        },
        {
          "name": "age",
          "type": "int"
        }
      ]
    },
    "writer": {
      "class": "com.bytedance.bitsail.connector.assertion.sink.AssertSink",
      "content_type": "json",
      "columns": [
        {
          "name": "name",
          "type": "string"
        },
        {
          "name": "age",
          "type": "int"
        }
      ],
      "row_rules": {
        "min_row": 5,
        "max_row": 10
      },
      "column_rules": {
        "name": {
          "not_null": true,
          "min_len": 5,
          "max_len": 20
        },
        "age": {
          "not_null": true,
          "min": 2,
          "max": 18
        }
      }
    }
  }
}
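As a rough illustration of the row_rules part of this conf, the sketch below counts records as they arrive and checks the total against min_row and max_row once the input ends. The class and method names (RowRuleSketch, onRecord, verify) are hypothetical and are not part of BitSail. The sketch also assumes a single writer instance sees every record, since with parallel writers each subtask would only see part of the data. Assuming total_count is the number of rows the reader emits, the sample values (17 rows against max_row 10) would fail the check.

```java
import java.util.Optional;

// Hypothetical sketch of row-count verification; not an existing BitSail class.
class RowRuleSketch {
  private final long minRow; // value of row_rules.min_row (assumed inclusive)
  private final long maxRow; // value of row_rules.max_row (assumed inclusive)
  private long seen = 0;     // records observed so far

  RowRuleSketch(long minRow, long maxRow) {
    this.minRow = minRow;
    this.maxRow = maxRow;
  }

  // Call once for every record the writer receives.
  void onRecord() {
    seen++;
  }

  // Call once after the last record. Empty means the row rules hold.
  Optional<String> verify() {
    if (seen < minRow) {
      return Optional.of("row count " + seen + " is below min_row " + minRow);
    }
    if (seen > maxRow) {
      return Optional.of("row count " + seen + " exceeds max_row " + maxRow);
    }
    return Optional.empty();
  }

  public static void main(String[] args) {
    RowRuleSketch rule = new RowRuleSketch(5, 10);
    for (int i = 0; i < 17; i++) { // 17 mirrors total_count in the sample conf
      rule.onRecord();
    }
    System.out.println(rule.verify().orElse("row rules passed"));
  }
}
```

The row-count check can only run at the end of the job, while column rules can be applied per record, so a real sink would likely evaluate the two at different points in its lifecycle.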
@liuxiaocs7 Hi, thanks for elaborating on your design! Your configuration design looks good to me, and I couldn't come up with a better idea than this. Please go ahead with this design.