linux-nova
linux-nova copied to clipboard
memcpy_to_pmem_nocache does not persist unaligned bytes
NOVA's function memcpy_to_pmem_nocache
is commonly used in NOVA, and is defined as follows: https://github.com/NVSL/linux-nova/blob/587e25223ee415ccd1a7a831d95f7ccfa51b0259/fs/nova/nova.h#L259-L267
The naming seems to imply that this function guarantees the cache to be bypassed and thus the data to be persisted without need of explicit flushing. And indeed, this assumption is followed in at least one place of NOVA, as shown later.
Following the definition of __copy_from_user_inatomic_nocache
(on x86_64) to __copy_user_nocache
, we find the following documentation: https://github.com/NVSL/linux-nova/blob/587e25223ee415ccd1a7a831d95f7ccfa51b0259/arch/x86/lib/copy_user_64.S#L196-L205
Following the assembly code, we find that:
- If the size is < 8 bytes, size must be 0 or 4 bytes and aligned at 4 bytes to ensure that everything is copied using non-temporal instructions.
- If the size is >= 8 bytes, the destination start must be aligned at 8 bytes (or else the
ALIGN_DESTINATION
macro will perform cached copy) and the size must be a multiple of 4 bytes to ensure that everything is copied using non-temporal instructions.
Some places in NOVA use memcpy_to_pmem_nocache
but still perform explicit flushing, e.g. through the nova_update_entry_csum
function, which implicitly flushes the buffer. These cases could be considered performance bugs, but do not affect crash consistency.
However, at least one place exists in which use of memcpy_to_pmem_nocache
currently leads to a violation of crash consistency guarantees (I have not thoroughly looked for other cases so there might be more):
nova_cow_file_write
calls do_nova_cow_file_write
which calls memcpy_to_pmem_nocache
to copy file contents to persistent memory and does not perform further flushing.
Consider this shell script:
mount -t NOVA -o init /dev/pmem0 /mnt
echo HelloWorld > /mnt/myfile
sync
The string HelloWor
(8 bytes) is copied using movnti
, however the rest of the bytes l
, d
, \n
are copied using a temporal mov
instruction. Thus, the last three bytes are not guaranteed to be persisted. If the system crashes after the sync
system call succeeded but the last three bytes have not yet been persisted, recovering the file system and reading /mnt/myfile will only yield HelloWor\0\0\0
as the file content, which violates the guarantees of the sync
system call.
Suggested fix
In NOVA's memcpy_to_pmem_nocache
, after the call to __copy_from_user_inatomic_nocache
: If the function arguments do not fulfill the alignment requirements outlined above, explicitly call nova_flush_buffer
with the bytes at the beginning and/or the end of the buffer.
Thanks for the findings. Care to send a PR?
I'm short on time, but there is the following workaround (it performs unnecessary flushes, but is at least correct hopefully):
--- a/fs/nova/nova.h
+++ b/fs/nova/nova.h
@@ -263,6 +263,9 @@ static inline int memcpy_to_pmem_nocache(void *dst, const void *src,
ret = __copy_from_user_inatomic_nocache(dst, src, size);
+ nova_flush_buffer(dst, size, 1);
+
return ret;
}
A better fix would be to only call nova_flush_buffer
on beginning/end where alignment does not match. The code for that should be simple, but needs to be carefully written and tested to be completely sure it works in all cases, which is why I haven't submitted a PR yet.
I think the proposed fix is too heavy. I think for most of the time we do meet the alignment requirements, right? Perhaps performing nova_flush_buffer here with a check of alignments?
I think for most of the time we do meet the alignment requirements, right?
I have also observed other cases where this bug hits: Symlink creation (target name), and also renaming files to long file names (longer than cache line size) or creating new files with long names.
I have found that PMFS already specifically handles this issue: https://github.com/linux-pmfs/pmfs/blob/28aaf606ff076080a72174e9f5a7be13979a8b97/fs/pmfs/xip.c#L81-L84
Perhaps pmfs_flush_edge_cachelines
can just be reused in NOVA.