flux-core
flux-core copied to clipboard
kvs: update threshold for val -> valref conversion
While working on something else I noticed this in kvstxn_unroll()
else if (treeobj_is_val (dir_entry)) {
json_t *val_data;
if (!(val_data = treeobj_get_data (dir_entry)))
return -1;
if (json_string_length (val_data) > BLOBREF_MAX_STRING_SIZE) { // <------- THIS LINE CAUGHT MY EYE
if ((ret = store_cache (kt,
val_data,
true,
ref,
sizeof (ref),
&entry)) < 0)
return -1;
if (ret) {
if (kvstxn_add_dirty_cache_entry (kt, entry) < 0)
return -1;
}
if (!(ktmp = treeobj_create_valref (ref)))
return -1;
if (json_object_iter_set_new (dir, iter, ktmp) < 0) {
json_decref (ktmp);
errno = ENOMEM;
return -1;
}
}
}
This code converts a "val" to a "valref" and stores the "val" data separately.
The threshold for converting a "val" to a "valref" is BLOBREF_MAX_STRING_SIZE which is the macro used for determining the maximum size of a blobref string (i.e. you'll see char blobref[BLOBREF_MAX_STRING_SIZE] in a bunch of places). It is only 72 bytes.
I initially thought this was a mistake, that there must have been some macro cut & paste error. However, it ends up this is not a mistake. This goes all the way back to this commit.
commit ffe9abdf8b2c5e13b605fbcf7b860d6ae1686614
Author: Jim Garlick <[email protected]>
Date: Wed Sep 7 14:22:24 2016 -0700
modules/kvs: commit large values to object store
When unrolling the rootdir copy in commit processing, if
any FILEVAL directory objects are encountered that serialize
to a string longer than SHA1_STRING_SIZE, initiate store to
local cache/content store and convert to a FILEREF.
The KVS module can do this in parallel with other commit processing,
whereas the library code does it synchronously.
This goes back to before the treeobj implementation. I do not recall the specifics of the pre-treeobj implementation, but I think this is from when data was written out more raw-like into the KVS. So if data was > the size of a BLOBREF, makes sense to only store the ref. Of course, we use treeobjs now.
A) the comparison should probably be > 72 bytes now. It should be based (minimally) on the max size of a treeobj valref and then backwards calculated to max treeobj val size.
- It could be a bit bigger than the calculated number. I'm not super convinced that this extra code is a net win for ohhh 256 bytes? 512 bytes? We didn't have as many benchmarks back then, so could be tested / tweaked.
B) we should probably use a different macro here. Using BLOBREF_MAX_STRING_SIZE sorta looks weird.
C) we should document this somewhere? Dunno if RFC11 is the right place.
just doing some backwards calculations, here is a valref with a hypothetical 72 char blobref
{"data":["ABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRST"],"type":"valref","ver":1}
it is 108 chars (keeping in mind work on #6264 could alter this)
here's a val encoded with 62 chars of text that is also 108 chars
{"data":"YWJjZGVmZ2hpamtsbW5vcHFyc3R1dnd4eXphYmNkZWZnaGlqa2xtbm9wcXJzdHV2d3h5emFiY2RlZg","type":"val","ver":1}
so the threshold appears to be base64 encoded data of around 77, which I guess wasn't that far off from 72 anyways.
given potential future changes, including possibility of length for each valref entry, maybe a length of around 128 or something might be ok.
side note, sort of the same situation in restore command
if (size < BLOBREF_MAX_STRING_SIZE) {
if (!(treeobj = treeobj_create_val (buf, size)))
log_err_exit ("error creating val object for %s", path);
}
else {
flux_future_t *f;
const char *blobref;
if (!(f = content_store (h, buf, size, content_flags))
|| content_store_get_blobref (f, hash_type, &blobref) < 0)
log_msg_exit ("error storing blob for %s: %s",
path,
future_strerror (f, errno));
progress (h, 1, 0);
if (!(treeobj = treeobj_create_valref (blobref)))
log_err_exit ("error creating valref object for %s", path);
flux_future_destroy (f);
}
Well, it's just a heuristic for performance so maybe the answer is to add a comment to the places where it's used to explain that? As you suggest, maybe instead of testing against BLOBREF_MAX_STRING_SIZE in two places, we should add a convenience function to libkvs like bool treeobj_value_is_small (size_t size) that can be used instead?