hudi
Duplicate records after copying one partition's data files from another S3 bucket
If I copy the whole table from a folder in another S3 bucket and then continue the streaming upsert process, there are no duplicate records and everything works fine. But if I remove all the files from one partition, copy only that partition's files from the same table in another S3 bucket, and then continue the streaming upsert process, some duplicate records appear. How can I fix this? Is there any way to update the metadata? I am using Hudi 0.9.0.
@njalan What do you mean by "copied one partition file from same table"? Are you referring to copying the parquet files?
@ad1happy2go I copied all the files from that partition folder, not only the parquet files.
But the partition directory only contains the parquet files and log files (in the case of MOR), right?
If you just copy the partition files, how are you updating the .hoodie timeline?
@ad1happy2go Yes, it only has the parquet files. Is there any way to manually update the metadata?
@njalan No, there is no way to do that, and we don't recommend it either. Instead of moving files, the best approach is to write a Spark job that creates another Hudi table with the partitions you need.
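For readers landing here, a minimal PySpark sketch of that recommendation (not code from this thread): read the needed partition from the source table through the Hudi datasource and write it to the target table with Hudi, so the target's .hoodie timeline is updated by the writer instead of being bypassed. The helper names, the table name `orders`, the field names `order_id`, `dt`, `updated_at`, and the paths are all hypothetical; it also assumes the Hudi Spark bundle is on the classpath and that your Hudi version can load a table from its base path without a glob.

```python
# Hudi bookkeeping columns added on write; dropped so the target table
# regenerates them for its own commits.
HUDI_META_COLUMNS = [
    "_hoodie_commit_time",
    "_hoodie_commit_seqno",
    "_hoodie_record_key",
    "_hoodie_partition_path",
    "_hoodie_file_name",
]


def hudi_write_options(table_name, record_key, partition_field, precombine_field):
    """Build the writer options for an upsert (hypothetical helper)."""
    return {
        "hoodie.table.name": table_name,
        "hoodie.datasource.write.recordkey.field": record_key,
        "hoodie.datasource.write.partitionpath.field": partition_field,
        "hoodie.datasource.write.precombine.field": precombine_field,
        "hoodie.datasource.write.operation": "upsert",
    }


def copy_partition(spark, source_path, target_path,
                   partition_field, partition_value, options):
    """Copy one partition between Hudi tables through Spark (hypothetical helper).

    `spark` is an existing SparkSession. Writing through the Hudi datasource
    creates a commit on the target timeline, unlike copying files on S3.
    """
    source_df = spark.read.format("hudi").load(source_path)
    partition_df = (
        source_df
        .where(source_df[partition_field] == partition_value)
        .drop(*HUDI_META_COLUMNS)
    )
    (
        partition_df.write.format("hudi")
        .options(**options)
        .mode("append")  # append adds a new commit to an existing table
        .save(target_path)
    )


# Example call, with assumed bucket paths and field names:
# opts = hudi_write_options("orders", "order_id", "dt", "updated_at")
# copy_partition(spark, "s3://source-bucket/orders", "s3://target-bucket/orders",
#                "dt", "2021-10-01", opts)
```

The record key, partition, and precombine fields should match the ones the target table was created with, otherwise the upsert cannot match the copied rows against existing records.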
@ad1happy2go Got it thanks