
tested against power failure?

disaster123 opened this issue 10 years ago · 13 comments

How well is dm-writeboost tested against power failures? If not, I would like to start doing so.

disaster123 avatar Jul 13 '15 18:07 disaster123

First of all, I am doing tests on https://github.com/akiradeveloper/device-mapper-test-suite

Sudden power failure is very difficult to test, so I haven't. But disk corruption can be emulated with dm-flakey. I had a test case locally, but it's not upstream because the code was too immature.

In a real sense, dm-writeboost is sub-optimal for the power failure issue; that is why I used the words "in theory". A cache hit may write back data to the backing store, so the said principle is only partially kept.

But yes, I can. The copy_bio_payload I implemented in 2.0.2 copies bio segments to a buffer. For the principle, I would need to implement the reverse. The reason I keep it sub-optimal is that power failure is very infrequent, but the code would become a bit complicated.

I will implement it in the near future if you really start to test the power failure case.

akiradeveloper avatar Jul 13 '15 23:07 akiradeveloper

Well, I misunderstood your question.

The writeback stuff is irrelevant to power failure.

dm-writeboost is designed to be robust against the power failure issues described in the paper below: https://www.usenix.org/conference/fast13/understanding-robustness-ssds-under-power-fault

Typically, partial writes and bit corruption should be considered, and the current (v2.0.3) implementation is good enough against these failures. But testing has, as I said, only been done with dm-flakey to emulate bit corruption.

dm-writeboost ignores logs whose checksum is inconsistent. Partial writes and bit corruption due to power failure are the typical causes of this. So, please use the latest version for your test.
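The checksum rule can be illustrated with a toy model. This is not dm-writeboost's actual on-disk format: the header layout, the use of SHA-256, and all names here are my own assumptions for the sketch. The point is only that a log whose payload no longer matches its recorded checksum (a torn write, or flipped bits) is ignored at replay time.

```python
import hashlib
import struct

def make_log(seq: int, payload: bytes) -> bytes:
    """Build a toy log: 8-byte sequence number + 32-byte SHA-256 + payload.
    (Illustrative layout only, not dm-writeboost's real format.)"""
    digest = hashlib.sha256(payload).digest()
    return struct.pack(">Q", seq) + digest + payload

def replay(logs):
    """Replay logs in order, ignoring any whose checksum is inconsistent,
    which is what a partial write or bit corruption would look like."""
    replayed = []
    for log in logs:
        seq = struct.unpack(">Q", log[:8])[0]
        digest, payload = log[8:40], log[40:]
        if hashlib.sha256(payload).digest() != digest:
            continue  # torn or corrupted log: skip it
        replayed.append((seq, payload))
    return replayed
```

For example, flipping one byte in a log's payload makes `replay` drop that log while still accepting the intact ones around it.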

akiradeveloper avatar Jul 14 '15 00:07 akiradeveloper

Thanks, will start doing some tests in the next weeks.

disaster123 avatar Jul 14 '15 05:07 disaster123

Great. What kind of tests?

akiradeveloper avatar Jul 14 '15 05:07 akiradeveloper

We have an API to some external power management systems, so I can stress the FS with bonnie++, automatically pull the power, and check the consistency of the FS afterwards. This can be done hundreds of times.
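A test harness like that could be orchestrated roughly as below. All four hooks (`start_stress`, `cut_power`, `boot_and_mount`, `check_consistency`) are hypothetical stand-ins for bonnie++, the external power-management API, and a post-reboot fsck/data check; only the loop structure is the point.

```python
import random

def power_failure_loop(start_stress, cut_power, boot_and_mount,
                       check_consistency, iterations=100):
    """Repeatedly stress the FS, pull power at a random moment, reboot
    (without clearing the writeboost cache, so logs get replayed), and
    verify consistency. Returns the number of failed iterations."""
    failures = 0
    for _ in range(iterations):
        start_stress()                               # e.g. launch bonnie++
        cut_power(delay=random.uniform(1.0, 30.0))   # hard power pull mid-run
        boot_and_mount()                             # logs replayed on resume
        if not check_consistency():                  # e.g. fsck + data compare
            failures += 1
    return failures
```

Randomizing the delay before the power pull matters: it spreads the crash point across RAM-buffer fill, log flush, and writeback phases instead of always hitting the same code path.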

disaster123 avatar Jul 14 '15 07:07 disaster123

Great! Don't clear the caches, because it's not a successful shutdown. You need to replay the logs after reboot.

akiradeveloper avatar Jul 14 '15 07:07 akiradeveloper

Yes sure

disaster123 avatar Jul 14 '15 11:07 disaster123

@akiradeveloper thanks for the detailed explanation. If I understand this correctly, under a power failure:

  • This only affects the write-back policy
  • Since you're aggregating I/O requests in a RAM buffer (512KB), the un-flushed data will be lost
  • The corresponding (out-of-date) data blocks on the SSD can be reused when the system restarts

Is my understanding here right?

zhouyuan avatar May 16 '16 06:05 zhouyuan

@zhouyuan Yes to all. As for the second point, losing un-flushed data is OK because a client of a block device should submit a barrier request (a bio flagged with REQ_FLUSH) to ensure that the preceding data are persistent. Writeboost guarantees this, but may lose data written after the latest barrier.
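The REQ_FLUSH guarantee above can be captured in a toy model (the class and names are mine, not Writeboost's): writes land in a volatile buffer, a flush makes everything written so far durable before it is acknowledged, and a power failure discards only what came after the latest flush.

```python
class FlushModel:
    """Toy block-device model: data is durable only up to the last flush."""
    def __init__(self):
        self.durable = []    # persisted, like a flushed writeboost log
        self.volatile = []   # RAM buffer contents, lost on power failure

    def write(self, block):
        self.volatile.append(block)

    def flush(self):
        # REQ_FLUSH semantics: all preceding writes are persistent on ack
        self.durable.extend(self.volatile)
        self.volatile.clear()

    def power_failure(self):
        self.volatile.clear()  # only un-flushed data is lost
        return self.durable
```

So an application that writes A, B, flushes, then writes C and crashes can rely on A and B being there after replay, while C is legitimately gone.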

Thank you for the questions.

akiradeveloper avatar May 16 '16 09:05 akiradeveloper

@akiradeveloper thanks a lot! One more question: if the application sends a flush request with each 4KB write, does the flush-job work at the per-write (4KB) level or at the 512KB level?

zhouyuan avatar May 16 '16 14:05 zhouyuan

@zhouyuan That's the worst-case scenario. Writeboost may flush a log for each 4KB write. But such a log is 8KB (4KB header + 4KB data), not 512KB.

I call it a partial log, and you can see the count in <nr_partial_flushed> of `dmsetup status` (please see the doc).

FYI, there is an optimization for flush request handling. Because there are usually threads other than the application sending writes to the device, there is a chance to fill the 512KB RAM buffer in the short moment after the flush request arrives. Writeboost therefore defers the ack to the flush request a bit, to wait for other I/Os.

The overhead of partial logs can be reduced if the cache device itself is responsive enough to flush requests. For example, some enterprise-level SSDs are BBU-equipped or use non-volatile memory for the internal buffer. These types of SSDs don't need to flush the internal buffer to the NAND medium on each flush request, and so respond to flush requests quickly.
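To put rough numbers on the worst case described above: the 8KB and 512KB figures come from this thread, but the amplification formula and the assumption that one log carries one header plus its batched 4KB data blocks are my own simplification.

```python
HEADER = 4 * 1024        # per-log header (from the thread)
DATA = 4 * 1024          # one application write
RAM_BUFFER = 512 * 1024  # full segment size (from the thread)

def amplification(writes_per_flush: int) -> float:
    """Bytes written to the cache device per byte of application data,
    assuming one log (one header + batched data) is flushed per barrier."""
    payload = writes_per_flush * DATA
    return (HEADER + payload) / payload

# Worst case: a barrier after every 4KB write -> 8KB partial log, 2x
assert amplification(1) == 2.0
```

Under this simplification, a full 512KB segment would hold 127 data blocks behind one header, bringing amplification down to about 1.008, which is why deferring the ack to batch more writes into one log pays off.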

akiradeveloper avatar May 17 '16 02:05 akiradeveloper

@akiradeveloper Thanks for the detailed answer! It helps a lot.

zhouyuan avatar May 17 '16 06:05 zhouyuan

Did those power-failure tests lead to any results?

cHolzberger avatar Oct 07 '18 07:10 cHolzberger