libpmemobj: Allocated objects not aligned to cache boundaries
The man page for pmemobj_alloc states the following:
The allocations are always aligned to a cache-line boundary.
I am working with software that uses libpmemobj to provide a recoverable persistent memory heap, but relies on much lower-level interfaces to ensure that writes to this heap are persisted. This software assumes that the guarantee quoted above holds. In practice, libpmemobj does not appear to uphold it, so the software in question is vulnerable to a very subtle persistence bug (demonstrated below by example).
Environment Information
- PMDK package version(s): 1.8
- OS(es) version(s): Linux (Ubuntu) x86_64
- ndctl version(s): N/A
- kernel version(s): 4.15.0-91-generic
- compiler, libraries, packaging and other related tools version(s): cc 7.5.0
Please provide a reproduction of the bug:
The following test case, when run with pmemcheck, illustrates the potential issue of relying on cache alignment:
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <libpmem.h>
#include <libpmemobj.h>
POBJ_LAYOUT_BEGIN(layout);
POBJ_LAYOUT_ROOT(layout, struct my_root);
POBJ_LAYOUT_TOID(layout, struct my_item);
POBJ_LAYOUT_END(layout);
// Root points to a single item allocated with pmemobj_alloc.
struct my_root {
TOID(struct my_item) item;
};
// Item is one cache line in size and has two fields separated by 48 bytes.
struct my_item {
uint64_t a;
uint8_t _pad[48];
uint64_t b;
};
static int item_construct(PMEMobjpool *pop, void *ptr, void *arg) {
struct my_item *item = ptr;
item->a = 0;
item->b = 0;
pmemobj_persist(pop, item, sizeof *item);
return 0;
}
int main(int argc, char **argv) {
PMEMobjpool *pop;
const char *path;
if (argc < 2) {
printf("usage: bad-align <pool>\n");
return 1;
}
path = argv[1];
if (access(path, F_OK) != 0) {
if ((pop = pmemobj_create(path, POBJ_LAYOUT_NAME(layout),
PMEMOBJ_MIN_POOL, 0666)) == NULL) {
perror("failed to create pool");
return -1;
}
printf("Pool created\n");
} else {
if ((pop = pmemobj_open(path,
POBJ_LAYOUT_NAME(layout))) == NULL) {
perror("failed to open pool");
return -1;
}
printf("Using existing pool\n");
}
TOID(struct my_root) root = POBJ_ROOT(pop, struct my_root);
TOID(struct my_item) item = D_RO(root)->item;
if (D_RO(item) != NULL) {
printf("Freeing existing item first.\n");
POBJ_FREE(&D_RW(root)->item);
}
// Fail-safely allocate item with pmemobj_alloc
printf("Allocating new item\n");
POBJ_NEW(pop, &D_RW(root)->item, struct my_item, item_construct, NULL);
item = D_RO(root)->item;
printf("Done. Address = %p, (mod 64 = %lu)\n",
D_RO(item), (uintptr_t)D_RO(item) % 64);
// Modify item
printf("Modifying item\n");
uint64_t *a = &D_RW(item)->a;
uint64_t *b = &D_RW(item)->b;
printf(" &item->a = %p (div 64 = %lu)\n", a, (uintptr_t)a / 64);
printf(" &item->b = %p (div 64 = %lu)\n", b, (uintptr_t)b / 64);
*a = 1;
*b = 2;
// Flush item->b.
// ** because item should be cache-aligned, will also flush item->a **
printf("Persisting item->b\n");
pmem_flush(&D_RO(item)->b, sizeof(uint64_t));
pmem_drain();
printf("Done\n");
return 0;
}
How often bug is revealed: (always, often, rare): always
Actual behavior:
Allocated pmem objects are always offset 16 bytes into a cache line. This causes item->a and item->b to reside in different cache lines, and therefore item->a is not persisted.
$ PMEM_IS_PMEM_FORCE=1 valgrind --tool=pmemcheck ./bad-align pool
==30651== pmemcheck-1.0, a simple persistent store checker
==30651== Copyright (c) 2014-2020, Intel Corporation
==30651== Using Valgrind-3.15.0 and LibVEX; rerun with -h for copyright info
==30651== Command: ./bad-align pool
==30651==
==30651== brk segment overflow in thread #1: can't grow to 0x4a49000
==30651== (see section Limitations in user manual)
==30651== NOTE: further instances of this message will not be shown
Using existing pool
Freeing existing item first.
Allocating new item
Done. Address = 0x5dc05d0, (mod 64 = 16)
Modifying item
&item->a = 0x5dc05d0 (div 64 = 1536023)
&item->b = 0x5dc0608 (div 64 = 1536024)
Persisting item->b
Done
==30651==
==30651== Number of stores not made persistent: 1
==30651== Stores not made persistent properly:
==30651== [0] at 0x108EE3: main (bad-align.c:100)
==30651== Address: 0x5dc05d0 size: 8 state: DIRTY
==30651== Total memory not made persistent: 8
==30651== ERROR SUMMARY: 1 errors
Expected behavior:
Based on the documentation, I expect item to be aligned to a cache boundary, and I expect that flushing item->b will also flush item->a, since they would be in the same cache line.
Details
After a brief safari through the libpmemobj code, I think the problem may be related to the compact header information for run memory_blocks:
/*
* block_get_user_data -- returns pointer to the data of a block
*/
static void *
block_get_user_data(const struct memory_block *m)
{
return (char *)m->m_ops->get_real_data(m) +
header_type_to_size[m->header_type];
}
static const size_t header_type_to_size[MAX_HEADER_TYPES] = {
sizeof(struct allocation_header_legacy),
sizeof(struct allocation_header_compact),
0
};
struct allocation_header_legacy {
uint8_t unused[8];
uint64_t size;
uint8_t unused2[32];
uint64_t root_size;
uint64_t type_num;
}; // 64 bytes
struct allocation_header_compact {
uint64_t size;
uint64_t extra;
}; // 16 bytes
Additional information about Priority and Help Requested:
Are you willing to submit a pull request with a proposed change? (Yes, No) If given direction on how to fix it, then certainly. Currently, I have no understanding of how deeply rooted this issue is, so I will not commit to anything right now.
Requested priority: (Showstopper, High, Medium, Low) Medium to High
The allocated raw objects are technically cache line aligned, but the effective alignment of user data is offset by the object's header. This "soft alignment" allows you to reason about false-sharing and safety of read-only cache line aligned operations (e.g., aligned SIMD instructions). The man page doesn't explain this very well, and we definitely should address that. But we do not anticipate any changes to the current effective alignment of regular allocations.
Enforcing effective cache line alignment for user data would have an impact on the space efficiency of the allocator, which is why the default behavior is not to do it.
Code that does need specific effective alignment can use custom allocation classes; see pmemobj_ctl_get(3) (https://pmem.io/pmdk/manpages/linux/master/libpmemobj/pmemobj_ctl_get.3) for more details.
Thank you for the clarification. I agree that the documentation should be improved to clarify these points, so that future programmers do not make the same assumptions.
I encountered this problem too. I found that addresses returned by pmemobj_alloc are always offset 16 bytes from a cache-line boundary. Now I know that this is due to the compact header created by pmemobj_alloc.
I tried using pmemobj_ctl_set as a workaround. It works, but it has a drawback: allocations of different sizes need different allocation classes. The default pmemobj_alloc appears to handle this automatically, but with pmemobj_ctl_set and pmemobj_xalloc I have to pick a class myself, which does not adapt to varying allocation sizes and seems error-prone.
Is there any way to use pmemobj_xalloc to get cache-line-aligned addresses while still adapting to different allocation sizes?
I'm afraid not - the default allocation classes all use the compact header and do not use alignment. The first implementation of the allocation classes interface did allow the application to effectively replace the default classes, but the interface was very cumbersome and we decided against exposing it. It presented a lot of challenges from the correctness and backward-compatibility point of view. Please feel free to create a new issue with a feature request. It would also be useful if you could suggest what the API should look like in your opinion.