
Uncaught exception on rename dis

Open fernandomacho opened this issue 1 year ago • 1 comments

Dragonfly crashes when it tries to rename the temporary dump file.

F20240522 02:32:26.647239 3603399 init.cc:83] Uncaught exception: filesystem error: cannot rename: No such file or directory [/var/lib/dragonfly/dump-2024-05-22T02:32:22-0000.dfs.tmp] [/var/lib/dragonfly/dump-2024-05-22T02:32:22-0000.dfs]

Yesterday my Dragonfly instance crashed when it tried to rename the dump file.

The /var/lib/dragonfly directory exists and there is enough disk space. After the crash, Dragonfly did not load the last saved snapshot.

My version is 1.18.1

Thanks

fernandomacho avatar May 22 '24 16:05 fernandomacho

@BorysTheDev can you please take a look?

romange avatar May 22 '24 17:05 romange

@fernandomacho Could you clarify which scenario it was: a command like SAVE or BGSAVE, or scheduled (time-based) persistence?

BorysTheDev avatar May 27 '24 12:05 BorysTheDev

@BorysTheDev I suspect they used CONFIG SET DBFILENAME and tried to change the name dynamically. The rest of the chore should help to reproduce it

kostasrim avatar May 27 '24 13:05 kostasrim

Hello, we run a save manually every hour at minute 0 with the BGSAVE command (without setting DBFILENAME), and the config flag file is also configured to save every hour:

--snapshot_cron=0 */1 * * * --dbfilename=dump-{timestamp}

The cron is redundant, but the time of the error does not match either schedule: the error occurred at 02:32:22.

We never use CONFIG SET DBFILENAME.
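To illustrate why 02:32 does not line up with the configured schedule, here is a minimal sketch that checks a time against the minute and hour fields of `0 */1 * * *`. It assumes standard cron field semantics and handles only `*`, `*/N`, and a single integer. The helper names `MatchesCronField` and `FiresAt` are hypothetical and are not part of Dragonfly.

```cpp
#include <string>

// Hypothetical helper: matches one cron field against a value.
// Supports only "*", "*/N", and a single integer; real cron syntax is richer.
bool MatchesCronField(const std::string& field, int value) {
  if (field == "*") return true;
  if (field.rfind("*/", 0) == 0) {  // field starts with "*/"
    int step = std::stoi(field.substr(2));
    return step > 0 && value % step == 0;
  }
  return std::stoi(field) == value;
}

// True if a "<minute> <hour> * * *" schedule fires at hour:minute.
bool FiresAt(const std::string& minute_field, const std::string& hour_field,
             int hour, int minute) {
  return MatchesCronField(minute_field, minute) &&
         MatchesCronField(hour_field, hour);
}
```

Under these assumptions, `FiresAt("0", "*/1", 2, 32)` is false while `FiresAt("0", "*/1", 2, 0)` is true, which is consistent with the observation that the crash time does not fall on the hourly schedule.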

fernandomacho avatar May 27 '24 16:05 fernandomacho

@fernandomacho do I understand correctly that you have 2 backup mechanisms, BGSAVE and the snapshot_cron save, and that the crash happened during the snapshot_cron save? Also, could you clarify whether there were other *.dfs.tmp files in the folder, or any files with a modification timestamp near 2024-05-22T02:32:22?

BorysTheDev avatar May 27 '24 17:05 BorysTheDev

Hello, correct. It doesn't seem to be the snapshot, because as you can see it should run every hour at minute 0. But the system cron doesn't run at the time of the crash either. I can't tell you what was in that directory when the error happened, but there certainly wasn't any backup, because when Dragonfly restarted, the logs said that there was no backup and it booted empty.

fernandomacho avatar May 27 '24 17:05 fernandomacho

Since I think tracking down this error is going to be like "looking for a needle in a haystack", we can close the ticket if you want, and if it happens again I will try to collect more info. If I remember correctly, there was nothing in the logs... but the truth is that I didn't look closely at the system logs either. What do you think?

fernandomacho avatar May 27 '24 17:05 fernandomacho

Please don't close it. In any case, I want to prevent the crash when the snapshot fails and log an error instead.

BorysTheDev avatar May 27 '24 17:05 BorysTheDev

But the system cron doesn't run at the time of the crash either.

We should verify this

kostasrim avatar May 27 '24 18:05 kostasrim

I've made a small fix to prevent Dragonfly from crashing if we have an error with snapshot moving #3092
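For illustration only (this is not the actual change in #3092): the message in the crash log has the shape of a `std::filesystem::filesystem_error`, which the throwing overload of `std::filesystem::rename` raises on failure. A minimal sketch of the general technique, assuming a hypothetical helper named `FinalizeSnapshot`, is to use the non-throwing `std::error_code` overload and log the failure instead of letting an exception escape:

```cpp
#include <filesystem>
#include <iostream>
#include <system_error>

namespace fs = std::filesystem;

// Hypothetical helper, not Dragonfly's actual code.
// Renames tmp_path to final_path. On failure it logs the error and returns
// it to the caller instead of throwing.
std::error_code FinalizeSnapshot(const fs::path& tmp_path,
                                 const fs::path& final_path) {
  std::error_code ec;
  fs::rename(tmp_path, final_path, ec);  // non-throwing overload
  if (ec) {
    std::cerr << "Failed to rename " << tmp_path << " to " << final_path
              << ": " << ec.message() << '\n';
  }
  return ec;
}
```

A caller can then treat a non-empty `error_code` as a failed save and keep the process running, for example by reporting the error to the client that issued the save.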

BorysTheDev avatar May 28 '24 08:05 BorysTheDev

Thanks!!

fernandomacho avatar May 28 '24 18:05 fernandomacho

@kostasrim I haven't found the reason for the issue, so I think we can close it until we get some additional info.

BorysTheDev avatar May 29 '24 09:05 BorysTheDev