rDSN

Shared log is not flushed to disk/SSD before the set result is returned to the client

Status: Open. qinzuoyan opened this issue 8 years ago • 9 comments

In mutation_log_shared::append(), the shared log is not flushed (fsync on Linux) before the commit, which may cause data loss if the machine restarts, breaking strong consistency semantics.

qinzuoyan, Jul 19 '16 04:07
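The missing step in the issue description can be sketched in plain POSIX C, assuming a raw file descriptor rather than rDSN's own file API: write the appended bytes in full, call `fsync`, and reply to the client only after that returns 0. The helper name `append_durably` is hypothetical and does not exist in rDSN.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper (not an rDSN API): write the whole record, then
 * force it to stable storage. Returns 0 only once the data is durable;
 * the caller should answer the client only after a 0 return. */
static int append_durably(int fd, const void *buf, size_t len) {
    const char *p = buf;
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;  /* interrupted before any byte was written */
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }
    /* Without fsync() the bytes may exist only in the kernel page cache
     * and can be lost if the machine restarts. */
    return fsync(fd);
}
```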

In addition, fsync is a blocking I/O operation, which is unfriendly to rDSN. Should we consider "aio_fsync" and wrap it as an API of disk_aio?

shengofsun, Jul 19 '16 09:07
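As a rough illustration of the POSIX call proposed above (not an existing disk_aio API), the sketch below queues an `aio_fsync` request with `O_DSYNC` (data-integrity completion, as with `fdatasync`) and later waits for it. The helper names `start_async_flush` and `wait_async_flush` are hypothetical, and a real integration would presumably deliver completion through rDSN's own callbacks instead of the `aio_suspend` polling used here. Note that glibc implements POSIX AIO with user-space helper threads, so the fsync still blocks a thread, just not the caller's; with glibc older than 2.34 the program must be linked with `-lrt`.

```c
#define _POSIX_C_SOURCE 200809L
#include <aio.h>
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: queue an asynchronous fsync for `fd`.
 * Returns 0 if the request was queued, -1 otherwise. */
static int start_async_flush(int fd, struct aiocb *cb) {
    memset(cb, 0, sizeof *cb);
    cb->aio_fildes = fd;
    cb->aio_sigevent.sigev_notify = SIGEV_NONE;  /* completion is polled */
    return aio_fsync(O_DSYNC, cb);
}

/* Hypothetical helper: wait for the queued request to finish.
 * Returns 0 if the data reached stable storage, -1 otherwise. */
static int wait_async_flush(struct aiocb *cb) {
    const struct aiocb *const list[1] = { cb };
    while (aio_error(cb) == EINPROGRESS)
        aio_suspend(list, 1, NULL);  /* sleeps until completion or a signal */
    return aio_return(cb) == 0 ? 0 : -1;
}
```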

There is currently a flush method in aio_provider. Can you do a survey to see whether flush and fsync are the same thing? aio_fsync is great; I didn't realize there was an async version on Linux :)

imzhenyu, Jul 19 '16 13:07
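On the flush-versus-fsync question, "flush" can refer to two different layers, which is easiest to see with stdio: `fflush` only moves data from the user-space `FILE` buffer into the kernel, while `fsync` moves it from the kernel page cache to the device. The hypothetical helper below uses stdio purely to make both layers visible; it makes no claim about how rDSN's aio_provider implements its flush.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper. Two different "flushes" are needed with stdio:
 *   fflush: user-space FILE buffer -> kernel page cache
 *   fsync:  kernel page cache      -> storage device
 * Only after both succeed is the record on stable storage. */
static int write_record_stdio(FILE *f, const char *record) {
    if (fputs(record, f) == EOF)
        return -1;
    if (fflush(f) != 0)          /* visible to other readers, but may   */
        return -1;               /* still be lost on a machine restart  */
    return fsync(fileno(f));     /* now on stable storage               */
}
```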

@imzhenyu, in the Linux aio_provider the flush method calls fsync, but we don't see any call to aio_provider's flush while a mutation is being prepared and written to the log.

shengofsun, Jul 19 '16 13:07

Thanks, @shengofsun. There are two ways to ensure the data really reaches the disk: calling fsync, or using direct I/O. I guess direct I/O is easier in our case (by passing a direct I/O flag when opening the log)? It is more complicated on Windows, though, because it requires aligned memory.

imzhenyu, Jul 19 '16 14:07
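The alignment requirement mentioned above also applies to direct I/O on Linux: the buffer address, file offset, and transfer length generally must be multiples of the device's logical block size. A minimal sketch of preparing such a buffer, assuming a 4096-byte alignment (real code should query the device rather than hard-code it); the helper name `make_dio_buffer` is hypothetical, and the `open(..., O_DIRECT)` call itself is omitted because support varies by filesystem.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define DIO_ALIGN 4096u  /* assumed alignment; query the device in real code */

/* Hypothetical helper: copy `len` bytes into a new buffer whose address
 * and size are both multiples of DIO_ALIGN, zero-padding the tail.
 * Returns NULL on failure; *padded_len receives the size to write(). */
static void *make_dio_buffer(const void *data, size_t len, size_t *padded_len) {
    size_t padded = (len + DIO_ALIGN - 1) / DIO_ALIGN * DIO_ALIGN;
    if (padded == 0)
        padded = DIO_ALIGN;
    void *buf = NULL;
    if (posix_memalign(&buf, DIO_ALIGN, padded) != 0)
        return NULL;
    memcpy(buf, data, len);
    memset((char *)buf + len, 0, padded - len);  /* padding bytes */
    *padded_len = padded;
    return buf;
}
```

The zero padding is also a source of extra bytes on disk, so a log reader would need each record's true length to skip it.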

@imzhenyu, did you mean O_DIRECT? The behavior of O_DIRECT on Linux depends on the hardware and filesystem, so O_SYNC/O_DSYNC should be the proper choice for us. See this. Three issues:
(1) Is there a corresponding option on Windows?
(2) There is noticeable write amplification when syncing with the hardware. I guess we need a page cache in our disk engine.
(3) I'm not sure whether Linux AIO works correctly with the O_SYNC/O_DSYNC flags.

shengofsun, Jul 20 '16 01:07

It seems O_DIRECT + O_SYNC is good enough. Unfortunately, as on Windows, the buffer needs to be aligned. As for your questions:

  • Yes.
  • If we need a hard sync, we will have the write amplification issue anyway.
  • I'm not sure either; we can try and see.

imzhenyu, Jul 20 '16 04:07

O_SYNC is enough for a hard sync; as far as I can see, O_DIRECT is not necessary. And if we write the file with a page-sized buffer, the write amplification should decrease. Of course, this also needs to be tested.

shengofsun, Jul 20 '16 05:07
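One possible reading of the page-sized buffer suggestion is to batch small log records into a page-sized buffer so that each synchronous write covers a full page instead of one small record. A minimal sketch under that reading; the `page_batch` type and its functions are hypothetical, 4096 bytes is an assumed page size, and a real log would also have to flush a partially filled page whenever a client is waiting for its acknowledgement.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define PAGE_BYTES 4096u  /* assumed page size */

/* Hypothetical batching buffer: records accumulate in memory and are
 * written with one write() per full page, plus one for any remainder
 * on an explicit flush, instead of one write() per record. */
struct page_batch {
    int fd;
    size_t used;
    unsigned writes;  /* number of write() calls issued, for illustration */
    char page[PAGE_BYTES];
};

static int batch_flush(struct page_batch *b) {
    size_t off = 0;
    while (off < b->used) {
        ssize_t n = write(b->fd, b->page + off, b->used - off);
        if (n < 0)
            return -1;
        off += (size_t)n;
        b->writes++;
    }
    b->used = 0;
    return 0;
}

/* Append a record, writing out the page whenever it fills up. */
static int batch_append(struct page_batch *b, const void *rec, size_t len) {
    const char *p = rec;
    while (len > 0) {
        size_t room = PAGE_BYTES - b->used;
        size_t take = len < room ? len : room;
        memcpy(b->page + b->used, p, take);
        b->used += take;
        p += take;
        len -= take;
        if (b->used == PAGE_BYTES && batch_flush(b) != 0)
            return -1;
    }
    return 0;
}
```

With 16-byte records, 1000 appends followed by one final `batch_flush` issue 4 `write()` calls instead of 1000; if the file were opened with `O_SYNC`, that is also the number of hardware syncs.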

So the resolution is to set the O_SYNC flag when calling dsn_file_open() in log_file::create_write()?

qinzuoyan, Jul 20 '16 06:07
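The proposed resolution can be sketched with plain POSIX `open` instead of `dsn_file_open()`, whose exact signature is not shown in this thread. With `O_SYNC`, each successful `write()` returns only after the data and the file metadata have reached stable storage; `O_DSYNC` would sync only the metadata needed to read the data back. The function name `open_log_for_write` and the `O_CREAT | O_EXCL` and `0644` choices are assumptions for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical stand-in for log_file::create_write(): create a new log
 * file whose writes are synchronous. With O_SYNC every successful
 * write() has already reached stable storage, so no separate fsync()
 * is needed before acknowledging the client. */
static int open_log_for_write(const char *path) {
    return open(path, O_WRONLY | O_CREAT | O_EXCL | O_SYNC, 0644);
}
```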

Let's try and see.

imzhenyu, Jul 20 '16 07:07