[Improvement]: Delete stale metadata.json files after snapshots are expired
Search before asking
- [x] I have searched in the issues and found no similar issues.
What would you like to be improved?
format: iceberg/mixed_hive module: AMS
A large number of metadata.json files still exist in the metadata directory, even though snapshot expiration has been performed (the latest metadata.json contains the expected number of snapshots).
How should we improve?
Delete the metadata.json files that are no longer needed once their snapshots have expired.
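To make the request concrete, here is a minimal sketch of the selection logic, not the AMS implementation. It assumes metadata files are named either `<version>-<uuid>.metadata.json` or `v<version>.metadata.json`, and that keeping only the newest few versions is acceptable. The helper names `metadata_version` and `select_expired_metadata` and the `keep_latest` parameter are hypothetical.

```python
import re

# Hypothetical helpers for illustration only; they are not part of AMS or Iceberg.
# Assumed naming: "<version>-<uuid>.metadata.json" or "v<version>.metadata.json".
_VERSION = re.compile(r"^v?(\d+)(?:-[^/]*)?\.metadata\.json$")


def metadata_version(name):
    """Return the numeric version encoded in a metadata file name, or None."""
    match = _VERSION.match(name)
    return int(match.group(1)) if match else None


def select_expired_metadata(file_names, keep_latest):
    """Return the metadata.json files older than the newest `keep_latest` versions."""
    if keep_latest < 1:
        raise ValueError("keep_latest must be at least 1")
    # Ignore anything that is not a versioned metadata.json (manifests, data files).
    versioned = [(metadata_version(n), n) for n in file_names]
    versioned = sorted((v, n) for v, n in versioned if v is not None)
    cutoff = max(len(versioned) - keep_latest, 0)
    return [name for _, name in versioned[:cutoff]]


if __name__ == "__main__":
    files = [
        "00000-a.metadata.json",
        "00002-c.metadata.json",
        "00001-b.metadata.json",
        "snap-123.avro",
    ]
    for name in select_expired_metadata(files, keep_latest=1):
        print("would delete:", name)
```

The sketch only decides which files are candidates. Actual deletion would have to go through the table's file IO and must never touch the metadata.json that the table currently points to.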
Are you willing to submit PR?
- [ ] Yes I am willing to submit a PR!
Subtasks
No response
Code of Conduct
- [x] I agree to follow this project's Code of Conduct
@nicochen Do you have time to fix it?
@Aireed If this task is not assigned yet, I would like to work on it.
@Aireed Currently, TableMaintainer#expireSnapshots cleans up the related metadata.json files and TableMaintainer#cleanOrphanFiles cleans up unreferenced metadata.json files. Are there still unnecessary metadata.json files left after these two cleanup jobs have run?
Sorry for the late reply. We don't need this feature for now, so I'm closing this issue.