rocksdb 6.25 - rocksdb looking for purged SST files after deleting them.

Open vkanduveed opened this issue 2 years ago • 5 comments

RocksDB appears to clean up SST files as part of purging. However, soon afterwards, it complains that it cannot find the deleted SST files.

2022/07/12-02:39:54.296246 7f4065192700(1666) SST files in /pd_fs/0x00000001 dir, Total Num: 6390, files: 000047.sst 1049792.sst 472475.sst 521826.sst 5270671.sst 5298902.sst 5522646.sst 5522647.sst 5522648.sst

2022/07/12-02:51:40.633886 7f4065993700(1667) [DEBUG] [/db_impl/db_impl_files.cc:344] [JOB 5] Delete /pd_fs/0x00000001/5298902.sst type=9 #5298902 -- OK
2022/07/12-02:51:40.633926 7f4065993700(1667) [DEBUG] [/db_impl/db_impl_files.cc:344] [JOB 5] Delete /pd_fs/0x00000001/5270671.sst type=9 #5270671 -- OK
2022/07/12-02:51:40.634020 7f4065993700(1667) [DEBUG] [/db_impl/db_impl_files.cc:344] [JOB 5] Delete /pd_fs/0x00000001/521826.sst type=9 #521826 -- OK
2022/07/12-02:51:40.634052 7f4065993700(1667) [DEBUG] [/db_impl/db_impl_files.cc:344] [JOB 5] Delete /pd_fs/0x00000001/472475.sst type=9 #472475 -- OK
2022/07/12-02:51:40.634080 7f4065993700(1667) [DEBUG] [/db_impl/db_impl_files.cc:344] [JOB 5] Delete /pd_fs/0x00000001/1049792.sst type=9 #1049792 -- OK
2022/07/12-02:51:40.634111 7f4065993700(1667) [DEBUG] [/db_impl/db_impl_files.cc:344] [JOB 5] Delete /pd_fs/0x00000001/000047.sst type=9 #47 -- OK

Then the process exits (uncleanly), and in the next LOG file we see:

2022/07/12-02:56:07.431264 7f93845cf700(8847) SST files in /pd_fs/0x00000001 dir, Total Num: 2, files: 6106242.sst 6106243.sst
2022/07/12-02:56:07.762256 7f93845cf700(8847) [ERROR] [/version_set.cc:2447] Unable to load table properties for file 47 --- IO error: No such file or directory: While open a file for random read: /pd_fs/0x00000001/000047.sst: No such file or directory

  • Why does RocksDB delete 6388 of the 6390 SST files? Could the MANIFEST file have been written incorrectly? Also, why does it look for the files after deleting them? As a result, we are unable to open and use the DB.

vkanduveed avatar Jul 13 '22 20:07 vkanduveed

In the second-newest LOG, you can find out which MANIFEST was the last one in use before the unclean shutdown. In the LOG created after restart, you can find out which MANIFEST is being used for recovery. If you have both, you can use the following command to check:

ldb --db=</path/to/db> --output_hex manifest_dump --path=</path/to/manifest> --verbose

This will show whether the MANIFEST inspected thinks the file should exist or not.
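To narrow down which files to look for in the MANIFEST dump, the two LOG files can be cross-referenced first. The following is only a minimal sketch: the function names are hypothetical, and the regular expressions assume the exact log line shapes quoted in this thread, which may differ in other RocksDB versions.

```python
import re

# Assumed line shapes, copied from the LOG excerpts quoted in this thread:
#   "... Delete /pd_fs/0x00000001/000047.sst type=9 #47 -- OK"
#   "... Unable to load table properties for file 47 --- IO error: ..."
DELETE_RE = re.compile(r"Delete \S+\.sst type=\d+ #(\d+) -- OK")
MISSING_RE = re.compile(r"Unable to load table properties for file\s*(\d+)")


def deleted_file_numbers(log_text):
    """File numbers the LOG reports as successfully deleted."""
    return {int(m.group(1)) for m in DELETE_RE.finditer(log_text)}


def missing_file_numbers(log_text):
    """File numbers the LOG reports as missing while loading table properties."""
    return {int(m.group(1)) for m in MISSING_RE.finditer(log_text)}


def deleted_then_missing(old_log_text, new_log_text):
    """File numbers deleted before the restart and reported missing after it."""
    deleted = deleted_file_numbers(old_log_text)
    missing = missing_file_numbers(new_log_text)
    return sorted(deleted & missing)
```

Reading the pre-crash LOG and the post-restart LOG into strings and passing them to `deleted_then_missing` yields the file numbers worth searching for in the `manifest_dump` output.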

riversand963 avatar Jul 13 '22 21:07 riversand963

@riversand963 - Thanks for the response. The MANIFEST dump thinks that the file should exist, but it has been deleted.

For example:

VersionEdit { LogNumber: 6106250 AddFile: 0 6093975 8608 'file_checksum: file_checksum_func_name: Unknown AddFile: 6 6100933 1960 ': file_checksum_func_name: Unknown ColumnFamily: 1 }

2022/07/12-02:51:40.493805 7f4065993700(1667) [DEBUG] [/db_impl/db_impl_files.cc:344] [JOB 5] Delete /pd_fs/0x00000001/6093975.sst type=9 #6093975 -- OK


2022/07/12-02:56:07.714479 7f93845cf700(8847) [ERROR] [/version_set.cc:2447] Unable to load table properties for file 6093975 --- IO error: No such file or directory: While open a file for random read: /pd_fs/0x00000001/6093975.sst: No such file or directory

vkanduveed avatar Jul 13 '22 21:07 vkanduveed

What about the MANIFEST before the crash?

riversand963 avatar Jul 13 '22 21:07 riversand963

@riversand963 - We don't save the older ones. Is there a way to keep the old MANIFEST files? I see someone else faced a similar issue: https://github.com/facebook/rocksdb/issues/9419

vkanduveed avatar Jul 13 '22 22:07 vkanduveed

This also looks similar to #10258

dparrella avatar Sep 07 '22 18:09 dparrella