BGREWRITEAOF is not failure safe
Describe the bug BGREWRITEAOF starts overwriting the existing file; hence in case of a failure, while creating the AOF file, the old flush is lost while the new one is partially updated.
To Reproduce
- add random sleep in BGREWRITEAOF after each command is flushed on the disk
- crash the process while BGREWRITE is happening
- see the AOF file content
Expected behavior Until the new AOF file is not ready, we should not alter the old file.
Yes, this is a major issue when I tested. A most stable policy would be to write the new flushes to a temporary file and then synchronize them with the main thread's AOF file. I'll work on this.
What if we write to a temporary file and then do a rename?
I think it should be okay if we do not maintain a sync through thread. AOG persistence is anyway just a backup not the primary storage.
Let's evaluate the complexities once.
Yes, creating a temporary and then renaming is also a good approach. To reduce complexities and to keep the concurrency model simple enough we can delegate the write to fsync() and fdatasync() system calls.
Regarding AOF persistence, fsync is for durability guarantees and is the least costly among other options to implement the same.
Yes, the primary function is in-memory operations. Redis is not durable by default and if someone uses it as a durable store with AOF and with frequent fsync we would loose most of Redis's advantages. A Redis with frequent fsync is a "Kafka in hiding" :smile:
Hello, can i take this?
Hello, can i take this?
Assigned
@arpitbbhayani , hey , i see that currently dice does not support actual aof logging right after each mutation operation but it solely for "snapshoting" the current state, i mean, there's is no actual rewrite of some runtime increasing aof file but rewrite of previous rewrites which are already optimized as the smallest aof possible for that moment. anyways, do you plan to support immediate logging in the future?
btw did you reproduce it on mac or linux? cuz on mac forking seems not to work properly
@amityahav I have not worked on this for a very long time. So I lack context :) Also, I think we do lack logging.
@amityahav, are there any findings from your side on this?
@AshwinKul28 Hey , started thinking about it but didn't have the chance to start working on it. I hope to start this weekend
@AshwinKul28 Hey, did you manage to run this command on a Mac M arch?
syscall.Syscall(syscall.SYS_FORK, 0, 0, 0) does not seem to work for me.
how ever, executing syscall.Syscall(syscall.SYS_CLONE, 0, 0, 0) on ARM64 linux works well
https://github.com/DiceDB/dice/pull/398 @arpitbbhayani this should be fixed with this
https://github.com/DiceDB/dice/issues/397
@ashwaniYDV Hey, I've been assigned to that and as you can see there's already a draft for it. What I've done extends your PR. I'm about to also ensure no 2 flushing processes can occur simultaneously. As well as cleanup when there's a failure
Hey @AshwinKul28 , @arpitbbhayani Its seems that it is not trivial to fork a GO process just like it is done in C. I tried to implement Fork and Wait both in darwin and linux archs:
- for darwin i have managed to get this to work, i was able to rewrite the AOF file and wait for the child process to terminate in the parent side. it was done using
syscall.Syscall(syscall.SYS_FORK, 0, 0, 0)and waiting using:syscall.Wait4(int(childPID), &ws, 0, nil) - however, for linux the fork syscall is
syscall.Syscall(syscall.SYS_CLONE, 0, 0, 0)and using the samesyscall.Wait4(int(childPID), &ws, 0, nil)(i didnt find any alternative for the wait call). Wait syscall always returns an errorno such processes to wait. the parent process successfully forks itself and returns the child's PID. when trying to wait on the returned PID the error mentioned is returned. for some reason it is unable to detect the child proc.
I would very much appreciate if you have some exprience, insights on this, since im out of ideas here and the Wait syscall is required to properly handle the BGREWRITE command.
Thanks
Hello @amityahav,
There has been no activity on this issue for the past 5 days. It would be awesome if you keep posting updates to this issue so that we know you are actively working on it.
We are really eager to close this issue at the earliest, hence if we continue to see the inactivity, we will have to reassign the issue to someone else. We are doing this to ensure that the project maintains its momentum and others are not blocked on this work.
Just drop a comment with the current status of the work or share any issues you are facing. We can always chip in to help you out.
Thanks again.
@arpitbbhayani bro I'm literally posting questions above to you and you are not answering to any of them. I need help on this, please read them!
@amityahav I tried looking into it, but even I do not have any immediate idea about the fix. Also, I do not think the solution will be platform-specific, but I could be wrong.
If we are unsure of the solution or the approach, we can park this issue for now and pick this up, when this really comes back to bite us. But thank you so much for working on this. The insights you left in the comments did give some clarity.
So, moving this to icebox for now. Apologies and thank you.