[Bug]: When duplicate records exist in a table, Spark upsert can't merge the records

Open baiyangtx opened this issue 2 years ago • 3 comments

What happened?

If the table already contains records with a duplicate primary key, using a Spark INSERT INTO statement to do an upsert does not work as expected. The rows remain duplicated, but they are expected to be merged into one.
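To make the expected behavior concrete, here is a minimal sketch of the upsert semantics described above. It is a toy in-memory model, not Arctic's or Spark's implementation: the `upsert` helper and the `Row` alias are hypothetical names, and the sample values are only illustrative. The assumption is that an upsert should replace every existing row that shares the new row's primary key, even when the table already holds duplicates.

```python
from typing import List, Tuple

# (id, name, ts); `id` plays the role of the primary key in this toy model.
Row = Tuple[int, str, str]


def upsert(table: List[Row], new_row: Row) -> List[Row]:
    """Return a new table where all rows sharing new_row's key are replaced by new_row."""
    key = new_row[0]
    # Drop every existing row with the same key, including pre-existing duplicates.
    kept = [row for row in table if row[0] != key]
    return kept + [new_row]


# A table that already holds two rows with the same primary key.
table = [
    (2, "frankkkk", "2022-07-02 09:11:00"),
    (2, "frankkkk", "2022-07-02 09:11:00"),
]

result = upsert(table, (2, "llllll", "2022-07-02 09:11:00"))
print(result)  # [(2, 'llllll', '2022-07-02 09:11:00')]
```

Under this model, the reported bug corresponds to the table still containing more than one row for key 2 after the upsert.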

Affects Versions

0.4.1

What engines are you seeing the problem on?

Spark

How to reproduce

image

Relevant log output

No response

Anything else

No response

Code of Conduct

  • [X] I agree to follow this project's Code of Conduct

baiyangtx (Mar 30 '23 09:03)

I can't reproduce it in 0.4.1.

-- Create a keyed table, partitioned by day.
CREATE TABLE IF NOT EXISTS user (
    id INT,
    name STRING,
    ts TIMESTAMP,
    PRIMARY KEY(id)
) USING arctic
PARTITIONED BY (days(ts));

-- Seed the table, then insert the same key twice more before upsert is enabled.
INSERT OVERWRITE db.user VALUES (2, "frank", timestamp("2022-07-02 09:11:00"));
...
INSERT INTO db.user VALUES (2, "frankkkk", timestamp("2022-07-02 09:11:00"));
INSERT INTO db.user VALUES (2, "frankkkk", timestamp("2022-07-02 09:11:00"));

-- Enable upsert on the table.
ALTER TABLE db.user SET TBLPROPERTIES (
    'write.upsert.enabled' = 'true');

-- Insert the same key again with upsert enabled.
INSERT INTO db.user VALUES (2, "llllll", timestamp("2022-07-02 09:11:00"));

result

image image

wangtaohz (Mar 30 '23 11:03)

This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in next 14 days if no further activity occurs. To permanently prevent this issue from being considered stale, add the label 'not-stale', but commenting on the issue is preferred when possible.

github-actions[bot] (Aug 20 '24 00:08)

Any progress on this issue? @baiyangtx

zhoujinsong (Aug 20 '24 02:08)

This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in next 14 days if no further activity occurs. To permanently prevent this issue from being considered stale, add the label 'not-stale', but commenting on the issue is preferred when possible.

github-actions[bot] (Aug 10 '25 00:08)

This issue has been closed because it has not received any activity in the last 14 days since being marked as 'stale'

github-actions[bot] (Aug 25 '25 00:08)