
[Bug] Stream Load with partial column import fills a column whose default is 'current_timestamp' with the table creation time

Open • opened by YS0mind • 3 comments

I ran into this problem too when loading data into Doris.
  1. Create a test table in Doris:
create table data_province
(
    `run_date`         date           not null comment 'date',
    data_type_id       int            not null comment 'data type',
    data               decimal(24, 8) not null comment 'data value',
    create_time        datetime       not null default current_timestamp comment 'creation time'
)
    engine = olap unique key(`run_date`,`data_type_id`)
    comment "daily data table"
    partition by range(`run_date`) ( )
    distributed by hash(`data_type_id`) buckets 10
    properties (
        "storage_format" = "V2",
        "enable_unique_key_merge_on_write" = "true",
        "dynamic_partition.enable" = "true",
        "dynamic_partition.time_unit" = "month",
        "dynamic_partition.create_history_partition" = "true",
        "dynamic_partition.history_partition_num" = "10",
        "dynamic_partition.start" = "-6",
        "dynamic_partition.end" = "3",
        "dynamic_partition.prefix" = "p",
        "dynamic_partition.replication_num" = "1",
        "dynamic_partition.buckets" = "10"
    );
  2. Load data with Stream Load as follows. The create_time field ends up set to the time the table was created. When I insert the same data with INSERT INTO instead, everything is fine: create_time is the time the record was inserted.
# Stream load: the default time is fixed at the table creation time; INSERT INTO gives the actual insertion time
# vim /tmp/test.csv
# 2024-01-01,1,67200.00000000
curl --location-trusted -u root \
-H "partial_columns:true" \
-H "column_separator:," \
-H "columns:run_date,data_type_id,data" \
-H "two_phase_commit:false" \
-H "label:stream_load_test01" \
-T /tmp/test.csv http://127.0.0.1:8030/api/iotest/data_province/_stream_load
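What the reporter expects from step 2 can be made concrete with a toy model. This is a hypothetical bash sketch, not Doris code: it assumes the usual partial-update semantics for a merge-on-write unique-key table (an omitted column keeps its stored value when the key already exists, and takes the column default, evaluated at load time, when the key is new). The helper `partial_load`, the arrays `data_col`/`create_time_col`, and the explicit timestamps are all illustrative.

```shell
#!/usr/bin/env bash
# Hypothetical toy model, NOT Doris code: a unique-key table receiving a
# partial-column load with create_time omitted from the "columns" header.
# Needs bash 4+ for associative arrays.
set -euo pipefail

# Row key "run_date,data_type_id" -> column value.
declare -A data_col=() create_time_col=()

# partial_load KEY DATA NOW (hypothetical helper): upsert DATA for KEY.
# NOW is passed in explicitly so the result is deterministic.
partial_load() {
  local key=$1 data=$2 now=$3
  data_col[$key]=$data
  # create_time was omitted: a new key takes the default evaluated at load
  # time; an existing key keeps the value it already has.
  if [[ -z ${create_time_col[$key]+set} ]]; then
    create_time_col[$key]=$now
  fi
}

k="2024-01-01,1"
partial_load "$k" 67200 "2024-01-25 01:00:00"   # first load inserts the row
partial_load "$k" 70000 "2024-01-26 09:00:00"   # second load updates it
echo "data=${data_col[$k]} create_time=${create_time_col[$k]}"
```

Running it prints `data=70000 create_time=2024-01-25 01:00:00`: the second load updates `data` but leaves `create_time` at the first load's time, which is the behaviour being asked for. The bug reported here is that the new-key branch receives the table creation time instead of the load time.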
  3. If I map create_time=current_timestamp() in the columns header, create_time is set to the load time. However, loading a new record with the same key overwrites this field too. I only want to keep the time the record was first created.
# Partial-column load with an explicit expression produces the correct time, but every load of a record with the same key
# also overwrites create_time with the latest time
curl --location-trusted -u root \
-H "partial_columns:true" \
-H "column_separator:," \
-H "columns:run_date,data_type_id,data,create_time=current_timestamp()" \
-H "two_phase_commit:false" \
-H "label:stream_load_test02" \
-T /tmp/test.csv http://127.0.0.1:8030/api/iotest/data_province/_stream_load
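Why step 3 overwrites the field can be shown with the same kind of toy model. Again this is a hypothetical bash sketch, not Doris code: it assumes that once `create_time=current_timestamp()` appears in `columns`, create_time is a supplied column for every row in the load, so an upsert on an existing key replaces it like any other supplied column. The helper `load_with_expr` and the arrays are illustrative names.

```shell
#!/usr/bin/env bash
# Hypothetical toy model, NOT Doris code: create_time is listed in "columns"
# as create_time=current_timestamp(), so every loaded row carries a value.
# Needs bash 4+ for associative arrays.
set -euo pipefail

# Row key "run_date,data_type_id" -> column value.
declare -A data_col=() create_time_col=()

# load_with_expr KEY DATA NOW (hypothetical helper): upsert both columns.
# NOW stands in for current_timestamp() at load time.
load_with_expr() {
  local key=$1 data=$2 now=$3
  data_col[$key]=$data
  # Supplied column: always overwrites whatever was stored for this key.
  create_time_col[$key]=$now
}

k="2024-01-01,1"
load_with_expr "$k" 67200 "2024-01-25 01:00:00"   # first load inserts the row
load_with_expr "$k" 70000 "2024-01-26 09:00:00"   # second load overwrites create_time
echo "data=${data_col[$k]} create_time=${create_time_col[$k]}"
```

This prints `data=70000 create_time=2024-01-26 09:00:00`, matching the overwrite described in step 3.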

Originally posted by @YS0mind in https://github.com/apache/doris-flink-connector/issues/191#issuecomment-1905753329

YS0mind, Jan 25 '24 01:01