[Bug]: When duplicate records exist in a table, Spark upsert cannot merge records
What happened?
If the table already contains records with duplicate primary keys, using a Spark INSERT INTO statement to perform an upsert does not work as expected: the rows remain duplicated instead of being merged into one.
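For context, upsert semantics on a primary-key table mean each incoming row replaces any existing row with the same key, so the expected end state is at most one row per key. A minimal sketch of that expectation (plain Python, names are illustrative and not part of Arctic's API):

```python
# Sketch of expected upsert semantics on a primary-key table:
# each incoming row replaces any existing row with the same key.
def upsert(table, rows, key=lambda r: r[0]):
    """Apply rows to table, keeping at most one row per primary key."""
    merged = {key(r): r for r in table}  # existing state, keyed by id
    for r in rows:
        merged[key(r)] = r               # last write wins per key
    return list(merged.values())

base = [(2, "frank", "2022-07-02 09:11:00")]
after = upsert(base, [(2, "llllll", "2022-07-02 09:11:00")])
# expected: a single row for id=2 carrying the latest name
```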
Affects Versions
0.4.1
What engines are you seeing the problem on?
Spark
How to reproduce

Relevant log output
No response
Anything else
No response
Code of Conduct
- [X] I agree to follow this project's Code of Conduct
I can't reproduce it in 0.4.1.
CREATE TABLE IF NOT EXISTS user (
  id INT,
  name STRING,
  ts TIMESTAMP,
  PRIMARY KEY (id)
) USING arctic
PARTITIONED BY (days(ts));
INSERT OVERWRITE db.user VALUES (2, "frank", timestamp("2022-07-02 09:11:00"));
...
INSERT INTO db.user VALUES (2, "frankkkk", timestamp("2022-07-02 09:11:00"));
INSERT INTO db.user VALUES (2, "frankkkk", timestamp("2022-07-02 09:11:00"));

ALTER TABLE db.user SET TBLPROPERTIES (
  'write.upsert.enabled' = 'true'
);

INSERT INTO db.user VALUES (2, "llllll", timestamp("2022-07-02 09:11:00"));
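The repro above hinges on whether the final upsert removes both pre-existing duplicate rows for id 2. A toy model of one common implementation strategy, equality-delete on the key followed by an insert (this is an assumption about the engine's internals for illustration, not something confirmed in this thread):

```python
# Hypothetical model: upsert as equality-delete on the key, then insert.
def upsert_delete_insert(table, rows, key=lambda r: r[0]):
    incoming = {key(r) for r in rows}
    # delete ALL existing rows whose key matches an incoming row
    survivors = [r for r in table if key(r) not in incoming]
    return survivors + rows

# duplicates written before upsert was enabled
base = [(2, "frankkkk", "2022-07-02 09:11:00"),
        (2, "frankkkk", "2022-07-02 09:11:00")]
result = upsert_delete_insert(base, [(2, "llllll", "2022-07-02 09:11:00")])
# if the equality delete applies to every existing row with the key,
# both duplicates are removed and exactly one row remains for id=2
```

If the delete instead matched only one row (or only rows in certain files), the pre-existing duplicates would survive, which is the symptom the reporter describes.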
Result:
This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in next 14 days if no further activity occurs. To permanently prevent this issue from being considered stale, add the label 'not-stale', but commenting on the issue is preferred when possible.
Any progress on this issue? @baiyangtx
This issue has been closed because it has not received any activity in the last 14 days since being marked as 'stale'.