linux-nova icon indicating copy to clipboard operation
linux-nova copied to clipboard

memcpy_to_pmem_nocache does not persist unaligned bytes

Open pcworld opened this issue 3 years ago • 4 comments

NOVA's function memcpy_to_pmem_nocache is commonly used in NOVA, and is defined as follows: https://github.com/NVSL/linux-nova/blob/587e25223ee415ccd1a7a831d95f7ccfa51b0259/fs/nova/nova.h#L259-L267

The naming seems to imply that this function guarantees the cache to be bypassed and thus the data to be persisted without need of explicit flushing. And indeed, this assumption is followed in at least one place of NOVA, as shown later.

Following the definition of __copy_from_user_inatomic_nocache (on x86_64) to __copy_user_nocache, we find the following documentation: https://github.com/NVSL/linux-nova/blob/587e25223ee415ccd1a7a831d95f7ccfa51b0259/arch/x86/lib/copy_user_64.S#L196-L205

Following the assembly code, we find that:

  • If the size is < 8 bytes, size must be 0 or 4 bytes and aligned at 4 bytes to ensure that everything is copied using non-temporal instructions.
  • If the size is >= 8 bytes, the destination start must be aligned at 8 bytes (or else the ALIGN_DESTINATION macro will perform cached copy) and the size must be a multiple of 4 bytes to ensure that everything is copied using non-temporal instructions.

Some places in NOVA use memcpy_to_pmem_nocache but still perform explicit flushing, e.g. through the nova_update_entry_csum function, which implicitly flushes the buffer. These cases could be considered performance bugs, but do not affect crash consistency.

However, at least one place exists in which use of memcpy_to_pmem_nocache currently leads to a violation of crash consistency guarantees (I have not thoroughly looked for other cases so there might be more): nova_cow_file_write calls do_nova_cow_file_write which calls memcpy_to_pmem_nocache to copy file contents to persistent memory and does not perform further flushing.

Consider this shell script:

mount -t NOVA -o init /dev/pmem0 /mnt
echo HelloWorld > /mnt/myfile
sync

The string HelloWor (8 bytes) is copied using movnti, however the rest of the bytes l, d, \n are copied using a temporal mov instruction. Thus, the last three bytes are not guaranteed to be persisted. If the system crashes after the sync system call succeeded but the last three bytes have not yet been persisted, recovering the file system and reading /mnt/myfile will only yield HelloWor\0\0\0 as the file content, which violates the guarantees of the sync system call.

Suggested fix

In NOVA's memcpy_to_pmem_nocache, after the call to __copy_from_user_inatomic_nocache: If the function arguments do not fulfill the alignment requirements outlined above, explicitly call nova_flush_buffer with the bytes at the beginning and/or the end of the buffer.

pcworld avatar Jun 23 '21 14:06 pcworld

Thanks for the findings. Care to send a PR?

Andiry avatar Jun 29 '21 18:06 Andiry

I'm short on time, but there is the following workaround (it performs unnecessary flushes, but is at least correct hopefully):

--- a/fs/nova/nova.h
+++ b/fs/nova/nova.h
@@ -263,6 +263,9 @@ static inline int memcpy_to_pmem_nocache(void *dst, const void *src,
 
        ret = __copy_from_user_inatomic_nocache(dst, src, size);
 
+       nova_flush_buffer(dst, size, 1);
+
        return ret;
 }

A better fix would be to only call nova_flush_buffer on beginning/end where alignment does not match. The code for that should be simple, but needs to be carefully written and tested to be completely sure it works in all cases, which is why I haven't submitted a PR yet.

pcworld avatar Oct 20 '21 17:10 pcworld

I think the proposed fix is too heavy. I think for most of the time we do meet the alignment requirements, right? Perhaps performing nova_flush_buffer here with a check of alignments?

Andiry avatar Dec 07 '21 04:12 Andiry

I think for most of the time we do meet the alignment requirements, right?

I have also observed other cases where this bug hits: Symlink creation (target name), and also renaming files to long file names (longer than cache line size) or creating new files with long names.

I have found that PMFS already specifically handles this issue: https://github.com/linux-pmfs/pmfs/blob/28aaf606ff076080a72174e9f5a7be13979a8b97/fs/pmfs/xip.c#L81-L84 Perhaps pmfs_flush_edge_cachelines can just be reused in NOVA.

pcworld avatar Dec 07 '21 19:12 pcworld