Uncaught exception on rename dis
Dragonfly crashes when it tries to rename the temporary dump file.
F20240522 02:32:26.647239 3603399 init.cc:83] Uncaught exception: filesystem error: cannot rename: No such file or directory [/var/lib/dragonfly/dump-2024-05-22T02:32:22-0000.dfs.tmp] [/var/lib/dragonfly/dump-2024-05-22T02:32:22-0000.dfs]
Yesterday my Dragonfly instance crashed when it tried to rename the dump file.
The /var/lib/dragonfly directory exists and there is enough disk space. After the crash, Dragonfly does not load the last saved snapshot.
My version is 1.18.1
Thanks
@BorysTheDev can you please take a look?
@fernandomacho Could you clarify what scenario it was: a command like SAVE, BGSAVE, or persistence functionality by time?
@BorysTheDev I suspect they used CONFIG SET DBFILENAME and tried to change the name dynamically. The rest of the chore should help to reproduce it
Hello, we save manually every hour at minute 0 with the BGSAVE command (without setting DBFILENAME), and the config flag file is also configured to save every hour:
--snapshot_cron=0 */1 * * *
--dbfilename=dump-{timestamp}
The cron is redundant, but the time of the error does not match either schedule: the error occurred at 02:32:22.
We never use CONFIG SET DBFILENAME.
@fernandomacho do I understand correctly that you have 2 backup mechanisms, BGSAVE and the snapshot_cron save, and that the crash happened during the snapshot_cron save? Also, could you clarify whether there were other *.dfs.tmp files in the folder, or any files with a modification timestamp near 2024-05-22T02:32:22?
Hello, correct. It doesn't seem to be the snapshot because as you can see it should run every hour at 0 minutes. But the system cron doesn't run at the time of the crash either. I can't tell you what was in that directory when the error happened, but there certainly wasn't any backup, because when dragonfly restarted, in the logs it said that there was no backup and it booted empty.
Since looking for this error is going to be like "looking for a needle in a haystack", we can close the ticket if you want, and if it happens again I will try to collect more info. If I remember correctly, there was nothing in the logs, but the truth is that I didn't look closely at the system's logs either. What do you think?
Please don't close it. In any case, I want to prevent the crash if the snapshot fails and log an error instead.
But the system cron doesn't run at the time of the crash either.
We should verify this
I've made a small fix to prevent Dragonfly from crashing if an error occurs while moving the snapshot: #3092
Thanks!!
@kostasrim I haven't found the reason for the issue, so I think we can close it until we get some additional info.