
Allow injecting faults via a unix socket

Open serathius opened this issue 2 years ago • 3 comments

Hey, thanks for adding the "reorder" and "split_write" faults and for filing the https://github.com/etcd-io/etcd/issues/16596 bug. I'm really interested in using them in etcd testing.

However, the current fault injection implementation is not compatible with how etcd is tested. We need to control when and how the injection happens so that we can verify the etcd state before and after it.

I would love it if LazyFS provided a command over the unix socket (similar to clear cache) that lets the user invoke faults like "reorder" on the last write.

Issue tracking this on the etcd side: https://github.com/etcd-io/etcd/issues/16597

Thanks for all your great work!

serathius, Sep 15 '23 11:09

Hello, in this comment https://github.com/etcd-io/etcd/issues/16597#issuecomment-1727674614 I explained in some detail why these two faults are currently not injected through a FIFO, and described some possibilities that could be useful for etcd testing.

Thanks for the feedback!

mj-ramos, Sep 20 '23 13:09

Hi @mj-ramos,

Recently, I have been using dm-flakey (https://github.com/fuweid/go-dmflakey) to simulate power failures. However, the dm-flakey device operates at the block I/O (bio) layer, so it cannot distinguish file data from filesystem metadata, and its drop_writes feature can easily corrupt the filesystem itself. I created a reproducer test case for boltdb. It's tricky because the test only uses fdatasync. So I am wondering whether LazyFS could provide similar functionality at the file level.

  • drop_writes: it can be triggered through the FIFO. When it is enabled, all write I/O should be redirected to a temporary file so that reads keep working correctly. After the cache is cleared, the temporary file should be deleted, simulating the data loss caused by a power failure. (It is somewhat similar to split_write, but it does not require specifying which write syscall should be split.)
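The proposed drop_writes behavior can be sketched as a small in-memory model of a single file. This is a hypothetical illustration, not LazyFS code: the class and method names are invented, the "temporary file" is modeled as an overlay buffer, and it assumes that every write made while the mode is enabled is lost on cache clear (even if synced) and that clearing the cache also turns the mode off.

```python
class DropWritesFile:
    """Hypothetical model of the proposed drop_writes mode (not LazyFS code)."""

    def __init__(self, durable: bytes = b""):
        self.durable = bytearray(durable)  # contents that survive a "power failure"
        self.pending = None                # overlay standing in for the temporary file

    def enable_drop_writes(self):
        # Start redirecting writes to a copy of the current contents.
        if self.pending is None:
            self.pending = bytearray(self.durable)

    def write(self, offset: int, data: bytes):
        # Writes go to the overlay when drop_writes is on, otherwise to the file.
        target = self.pending if self.pending is not None else self.durable
        end = offset + len(data)
        if end > len(target):
            target.extend(b"\0" * (end - len(target)))
        target[offset:end] = data

    def read(self, offset: int, size: int) -> bytes:
        # Reads observe redirected writes, so the application keeps working.
        source = self.pending if self.pending is not None else self.durable
        return bytes(source[offset:offset + size])

    def clear_cache(self):
        # Simulated power failure: the overlay (temporary file) is discarded.
        self.pending = None
```

For example, after `enable_drop_writes()` and `write(0, b"HELLO world")`, reads return the new data, but after `clear_cache()` the file reads back its original contents. Because the loss is scoped to one file, filesystem metadata is never touched, which is the advantage over dropping writes at the block layer.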

What do you think about it?

fuweid, Nov 18 '23 10:11

Hi, I apologize for the late reply. I've been quite busy lately. Thank you for your interest in LazyFS!

Certainly, LazyFS can accomplish that. If I understand correctly, the feature you are suggesting is somewhat similar to the clear cache command, which discards all the contents of the cache at a given point. With this command, reads are not compromised; however, writes that were not followed by fsync or fdatasync are dropped. It seems that you specifically want to discard the writes to a particular file. Is that correct?
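The clear cache semantics described above (reads keep working, but writes not followed by fsync or fdatasync are lost) can be modeled in a few lines. This is a hypothetical sketch, not how LazyFS implements its cache: the class is invented, writes are simplified to appends, and `fsync` stands in for both fsync and fdatasync.

```python
class CachedFile:
    """Hypothetical model of clear-cache semantics for one file (not LazyFS code)."""

    def __init__(self, durable: bytes = b""):
        self.durable = bytes(durable)  # what has been persisted
        self.cached = bytes(durable)   # what reads currently observe

    def write(self, data: bytes):
        # Simplified to appends; the data only lands in the cache.
        self.cached += data

    def read(self) -> bytes:
        return self.cached

    def fsync(self):
        # fsync/fdatasync persist everything currently cached.
        self.durable = self.cached

    def clear_cache(self):
        # Simulated crash: anything written since the last sync is lost.
        self.cached = self.durable
```

Writing `b"b"`, syncing, then writing `b"c"` to a file holding `b"a"` reads back `b"abc"`, while after `clear_cache()` only the synced `b"ab"` remains. Unlike the drop_writes proposal, synced writes survive here.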

mj-ramos, Dec 11 '23 15:12