UnifyFS icon indicating copy to clipboard operation
UnifyFS copied to clipboard

avoid unnecessary fsync when sync'ing extents with server

Open adammoody opened this issue 4 years ago • 5 comments

In unifyfs_sync_extents, the client fsyncs all data to disk before it calls the rpc to sync extents with the server:

https://github.com/LLNL/UnifyFS/blob/fddedf29335cf3359c51be47017616a61a1f51cd/client/src/unifyfs-fixed.c#L192

We sync data to disk by calling the logio_sync call here:

https://github.com/LLNL/UnifyFS/blob/fddedf29335cf3359c51be47017616a61a1f51cd/client/src/unifyfs-fixed.c#L226

Then we tell the server to copy extents from the client extent buffer with an rpc call here:

https://github.com/LLNL/UnifyFS/blob/fddedf29335cf3359c51be47017616a61a1f51cd/client/src/unifyfs-fixed.c#L235

There are two main reasons we sync extents with the server:

  1. the application has written enough extents that we've exhausted the extent buffer, in which case, we flush that buffer to the server so that the client can handle more writes
  2. the application has explicitly called something like fsync() or fflush()

In case 1), when we're using a POSIX filesystem as our backing store, which will be the case for tmpfs and the file system on most SSDs, then we can avoid syncing data to disk completely. POSIX ensures that the server process will read the correct data even though it was written by another process, without requiring an fsync.

I think we could similarly avoid fsync'ing data to disk on calls to close(). We currently sync on close, but that's mainly to send extents to the server, not really to store data to disk.

Even in case 2), we could likely sync data to disk after we call the rpc to expose the extents to the server. That might allow for some overlap in letting the server fetch the extents while data is being sync'd to disk so that those two operations can happen in parallel. It may require adding a second rpc to actually allow overlap.

The fix for case 1) to avoid calling fsync at all should speed up UnifyFS performance for applications that write lots of extents and rarely explicitly call fsync.

To support this, we likely need to add another argument to unifyfs_sync_extents that tells the function whether an fsync is needed. Then all callers need to be updated to add that argument in their call. The implementation could use the new unifyfs client library, which has split those into separate calls.

adammoody avatar Jul 12 '21 20:07 adammoody

Addressed by #651?

CamStan avatar Jul 26 '21 19:07 CamStan

@MichaelBrim , here is the write up on what I was thinking. This is saying the client only needs to call fsync on logio when the user explicitly calls fsync (or something equivalent). If the user does not call fsync, then we can avoid calling fsync in the client. This should help in apps that write lots of data and/or extents but never call fsync.

Additionally, when we do have to call fsync, we could flush the extents to the server in parallel with the fsync. That would be a smaller improvement, and it would likely require us to split the extent flush into two parts. That is, change what we had:

fsync logio
rpc to tell server to copy extents

to something like:

rpc to tell server to start copying extents
fsync logio (while server is copying extents in background)
wait for server to tell us it's done copying the extents

Or we could spawn a thread on the client to do the fsync call or something.

adammoody avatar Jul 26 '21 20:07 adammoody

The other item we might consider here is to provide a setting that lets a user disable our fsync of logio. That would correspond to the user saying something like "even though my app calls fsync(), please act like it didn't".

That would be an easy way for a user to get improved performance if they happen to be running an app that has a bunch of fsync calls, but they don't actually care about data being sync'd to disk. Flip a switch to get faster performance without having to modify the code and rebuild the app.

adammoody avatar Jul 26 '21 20:07 adammoody

I can easily add the user configuration option. Splitting the sync-extents code into two parts is a little more difficult given the current abstraction at the fid layer. However, it's fairly easy to have the logio fsync() run in a ULT to make sure that doesn't slow the client.

MichaelBrim avatar Jul 27 '21 17:07 MichaelBrim

Thanks, @MichaelBrim . I think coding the logio fsync to happen in the background with the extent flush will be much less significant performance-wise than simply avoiding the logio fsync when it's not required. We could tackle the background fsync as a separate item.

adammoody avatar Jul 27 '21 18:07 adammoody

@adammoody, do you know if this was resolved?

CamStan avatar Oct 31 '23 22:10 CamStan

Yes, I think this is resolved. We can close it out.

adammoody avatar Nov 01 '23 00:11 adammoody