[Bug]: Mixed Hive table can't sync Hive data properly
What happened?
As you can see, ArcticTableFlag is set to true once any data has been written to a Hive partition through Amoro.
But when I delete the data from this partition, write it again with Hive, and try to synchronize it to the Mixed Hive table, this `if` logic prevents the files from being added. The Mixed Hive table has no data in this partition, so `filesMap.get(partitionData) == null`. At the same time, ArcticTableFlag still exists, because the Hive partition itself was never deleted and data had previously been written to it through Amoro.
So I think there is a problem with this logic.
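A minimal sketch of the branch described above, assuming the sync loop looks up each Hive partition in a map of Iceberg files and reads the Arctic flag from a boolean partition parameter. The class `HiveSyncDecisionSketch`, the enum `Action`, the key `ARCTIC_FLAG_KEY`, and the partition name are hypothetical stand-ins, not Amoro's actual code or property names.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class HiveSyncDecisionSketch {

    // Hypothetical placeholder; not Amoro's real partition-parameter key.
    static final String ARCTIC_FLAG_KEY = "hypothetical.arctic.flag";

    enum Action { ADD_HIVE_FILES, SKIP, COMPARE_WITH_ICEBERG }

    // Models the described `if`: a Hive partition unknown to Iceberg is only
    // synced when it does not carry the Arctic flag.
    static Action decide(Map<String, List<String>> filesMap,
                         String partition,
                         Map<String, String> partitionParams) {
        if (filesMap.get(partition) == null) {
            boolean hasFlag = Boolean.parseBoolean(
                partitionParams.getOrDefault(ARCTIC_FLAG_KEY, "false"));
            // Flag present: treated as data written through Amoro, so it is skipped.
            return hasFlag ? Action.SKIP : Action.ADD_HIVE_FILES;
        }
        // Partition already tracked by Iceberg: assumed to be handled elsewhere.
        return Action.COMPARE_WITH_ICEBERG;
    }

    public static void main(String[] args) {
        Map<String, List<String>> filesMap = new HashMap<>(); // data deleted: no Iceberg files
        Map<String, String> params = new HashMap<>();
        params.put(ARCTIC_FLAG_KEY, "true"); // flag left over from the earlier Amoro write

        // The reported case: Hive rewrote the partition, yet its files are skipped.
        System.out.println(decide(filesMap, "dt=2024-01-01", params)); // SKIP
    }
}
```

In this model the rewritten partition and a partition written by Hive alone differ only in the leftover flag, which is exactly what sends the former down the `SKIP` path.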
Affects Versions
master
What engines are you seeing the problem on?
Core
How to reproduce
- Create a partitioned Mixed Hive table
- Insert overwrite some data into a partition
- Delete the data written by the insert overwrite
- Insert the same data into that partition with Hive
- Use HiveDataSync to sync the data from step 4 to the Mixed Hive table
Relevant log output
No response
Anything else
No response
Code of Conduct
- [X] I agree to follow this project's Code of Conduct
@nicochen Thanks for reporting this issue.
AFAIK, when a new Hive partition is detected while synchronizing Hive data, we need to check whether it carries the Arctic flag because:
- If dropping a partition on a Mixed Hive table commits successfully to Iceberg but fails to commit to Hive, AMS will detect the leftover partition and delete the corresponding data in Hive.
- The `if` logic here distinguishes that scenario from a partition that was genuinely written by Hive and should be synced.
Based on this, when deleting data under a Hive partition, we may also need to remove the Arctic flag from the partition metadata in HMS.
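To illustrate why the flag alone is ambiguous and how removing it on delete would help, here is a small in-memory model. It assumes a partition's state can be reduced to three booleans: whether Iceberg tracks files for it, whether Hive holds files, and whether the Hive partition carries the Arctic flag. `ArcticFlagSketch`, `PartitionState`, and the scenario methods are hypothetical; the real HMS metadata and Amoro commit paths are more involved.

```java
public class ArcticFlagSketch {

    // Hypothetical, simplified view of one partition on a Mixed Hive table.
    static final class PartitionState {
        boolean icebergHasFiles;
        boolean hiveHasFiles;
        boolean arcticFlag;

        PartitionState(boolean icebergHasFiles, boolean hiveHasFiles, boolean arcticFlag) {
            this.icebergHasFiles = icebergHasFiles;
            this.hiveHasFiles = hiveHasFiles;
            this.arcticFlag = arcticFlag;
        }
    }

    // The check from the issue: a Hive partition unknown to Iceberg is synced
    // only when it carries no Arctic flag.
    static boolean syncAddsHiveFiles(PartitionState s) {
        return s.hiveHasFiles && !s.icebergHasFiles && !s.arcticFlag;
    }

    // Scenario 1: partition dropped, Iceberg commit succeeded, Hive commit failed.
    // Hive keeps the flagged files, which AMS should clean up rather than sync.
    static PartitionState failedHiveCommitAfterDrop() {
        return new PartitionState(false, true, true);
    }

    // Scenario 2 (this issue): data deleted, then rewritten with Hive.
    // clearFlagOnDelete models the proposal to remove the flag when deleting data.
    static PartitionState deleteThenRewriteWithHive(boolean clearFlagOnDelete) {
        PartitionState s = new PartitionState(true, true, true); // written through Amoro
        s.icebergHasFiles = false; // delete the data
        s.hiveHasFiles = false;
        if (clearFlagOnDelete) {
            s.arcticFlag = false;  // proposed: drop the flag from the partition metadata
        }
        s.hiveHasFiles = true;     // insert the same data with Hive (sets no flag)
        return s;
    }

    public static void main(String[] args) {
        System.out.println(syncAddsHiveFiles(failedHiveCommitAfterDrop()));      // false
        System.out.println(syncAddsHiveFiles(deleteThenRewriteWithHive(false))); // false (the bug)
        System.out.println(syncAddsHiveFiles(deleteThenRewriteWithHive(true)));  // true
    }
}
```

Without the change, scenario 2 ends in the same state as scenario 1, so the check cannot tell them apart. With the flag cleared on delete, the rewritten partition is synced while the failed-commit leftover is still skipped.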
@nicochen Do you still have this problem with the new version?