rocksdb
RocksDB 6.25 - RocksDB looks for purged SST files after deleting them.
RocksDB appears to clean up SST files as part of purging. Soon afterwards, however, it complains that it cannot find the deleted SST files.
2022/07/12-02:39:54.296246 7f4065192700(1666) SST files in /pd_fs/0x00000001 dir, Total Num: 6390, files: 000047.sst 1049792.sst 472475.sst 521826.sst 5270671.sst 5298902.sst 5522646.sst 5522647.sst 5522648.sst
2022/07/12-02:51:40.633886 7f4065993700(1667) [DEBUG] [/db_impl/db_impl_files.cc:344] [JOB 5] Delete /pd_fs/0x00000001/5298902.sst type=9 #5298902 -- OK
2022/07/12-02:51:40.633926 7f4065993700(1667) [DEBUG] [/db_impl/db_impl_files.cc:344] [JOB 5] Delete /pd_fs/0x00000001/5270671.sst type=9 #5270671 -- OK
2022/07/12-02:51:40.634020 7f4065993700(1667) [DEBUG] [/db_impl/db_impl_files.cc:344] [JOB 5] Delete /pd_fs/0x00000001/521826.sst type=9 #521826 -- OK
2022/07/12-02:51:40.634052 7f4065993700(1667) [DEBUG] [/db_impl/db_impl_files.cc:344] [JOB 5] Delete /pd_fs/0x00000001/472475.sst type=9 #472475 -- OK
2022/07/12-02:51:40.634080 7f4065993700(1667) [DEBUG] [/db_impl/db_impl_files.cc:344] [JOB 5] Delete /pd_fs/0x00000001/1049792.sst type=9 #1049792 -- OK
2022/07/12-02:51:40.634111 7f4065993700(1667) [DEBUG] [/db_impl/db_impl_files.cc:344] [JOB 5] Delete /pd_fs/0x00000001/000047.sst type=9 #47 -- OK
Then the process exits uncleanly, and in the next LOG file we see:
2022/07/12-02:56:07.431264 7f93845cf700(8847) SST files in /pd_fs/0x00000001 dir, Total Num: 2, files: 6106242.sst 6106243.sst
2022/07/12-02:56:07.762256 7f93845cf700(8847) [ERROR] [/version_set.cc:2447] Unable to load table properties for file 47 --- IO error: No such file or directory: While open a file for random read: /pd_fs/0x00000001/000047.sst: No such file or directory
- Why does RocksDB delete 6388 out of 6390 SST files? Could the MANIFEST file have been written incorrectly? And why does it look for a file after deleting it? As a result, we are not able to open and use the DB.
In the second-newest LOG, you can find out which MANIFEST was the last one before the unclean shutdown. In the LOG created after restart, you can find out which MANIFEST is being used for recovery. If you have both, you can use the following command to check each of them:
ldb --db=</path/to/db> --output_hex manifest_dump --path=</path/to/manifest> --verbose
This will show whether the MANIFEST inspected thinks the file should exist or not.
@riversand963 - Thanks for the response. The MANIFEST dump says that the file should exist, but it has been deleted.
For example:
VersionEdit { LogNumber: 6106250 AddFile: 0 6093975 8608 'file_checksum: file_checksum_func_name: Unknown AddFile: 6 6100933 1960 ': file_checksum_func_name: Unknown ColumnFamily: 1 }
2022/07/12-02:51:40.493805 7f4065993700(1667) [DEBUG] [/db_impl/db_impl_files.cc:344] [JOB 5] Delete /pd_fs/0x00000001/6093975.sst type=9 #6093975 -- OK
2022/07/12-02:56:07.714479 7f93845cf700(8847) [ERROR] [/version_set.cc:2447] Unable to load table properties for file 6093975 --- IO error: No such file or directory: While open a file for random read: /pd_fs/0x00000001/6093975.sst: No such file or directory
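To check across all 6388 files whether every file reported missing after restart was one that was deleted before the crash, the two LOG files can be cross-checked mechanically. The following is a minimal Python sketch, assuming the log lines look like the excerpts quoted in this thread. The function names are hypothetical and are not part of RocksDB or ldb.

```python
import re

# Patterns are modelled on the LOG excerpts quoted in this thread (an assumption;
# other RocksDB versions may format these lines differently).
DELETE_RE = re.compile(r"Delete \S+\.sst type=\d+ #(\d+) -- OK")
MISSING_RE = re.compile(r"Unable to load table properties for file\s*(\d+) --- IO error")


def deleted_file_numbers(log_text):
    """File numbers of SST files the LOG reports as successfully deleted."""
    return {int(m.group(1)) for m in DELETE_RE.finditer(log_text)}


def missing_file_numbers(log_text):
    """File numbers the LOG reports as not loadable on recovery."""
    return {int(m.group(1)) for m in MISSING_RE.finditer(log_text)}


def deleted_then_missing(old_log_text, new_log_text):
    """Sorted file numbers deleted in the old LOG and missing in the new LOG."""
    return sorted(deleted_file_numbers(old_log_text) & missing_file_numbers(new_log_text))


# Usage (paths are placeholders):
#   old_text = open("LOG.old.<timestamp>").read()
#   new_text = open("LOG").read()
#   print(deleted_then_missing(old_text, new_text))
```

For the excerpts above, this would report files 47 and 6093975. It only confirms the overlap between the two logs; it does not explain why the MANIFEST used for recovery still references those files.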
What about the MANIFEST before the crash?
@riversand963 - We don't save the older ones. Is there a way to keep the old MANIFEST files? I see someone else faced a similar issue: https://github.com/facebook/rocksdb/issues/9419
This also looks similar to #10258