flux-core icon indicating copy to clipboard operation
flux-core copied to clipboard

kvs: update threshold for val -> valref conversion

Open chu11 opened this issue 8 months ago • 3 comments

While working on something else I noticed this in kvstxn_unroll()

        else if (treeobj_is_val (dir_entry)) {
            json_t *val_data;

            if (!(val_data = treeobj_get_data (dir_entry)))
                return -1;
            if (json_string_length (val_data) > BLOBREF_MAX_STRING_SIZE) {  // <------- THIS LINE CAUGHT MY EYE
                if ((ret = store_cache (kt,
                                        val_data,
                                        true,
                                        ref,
                                        sizeof (ref),
                                        &entry)) < 0)
                    return -1;
                if (ret) {
                    if (kvstxn_add_dirty_cache_entry (kt, entry) < 0)
                        return -1;
		}
                if (!(ktmp = treeobj_create_valref (ref)))
                    return -1;
                if (json_object_iter_set_new (dir, iter, ktmp) < 0) {
                    json_decref (ktmp);
                    errno = ENOMEM;
                    return -1;
                }
            }
        }

This code converts a "val" to a "valref" and stores the "val" data separately.

The threshold for converting a "val" to a "valref" is BLOBREF_MAX_STRING_SIZE which is the macro used for determining the maximum size of a blobref string (i.e. you'll see char blobref[BLOBREF_MAX_STRING_SIZE] in a bunch of places). It is only 72 bytes.

I initially thought this was a mistake, that there must have been some macro cut & paste error. However, it ends up this is not a mistake. This goes all the way back to this commit.

commit ffe9abdf8b2c5e13b605fbcf7b860d6ae1686614
Author: Jim Garlick <[email protected]>
Date:   Wed Sep 7 14:22:24 2016 -0700

    modules/kvs: commit large values to object store
    
    When unrolling the rootdir copy in commit processing, if
    any FILEVAL directory objects are encountered that serialize
    to a string longer than SHA1_STRING_SIZE, initiate store to
    local cache/content store and convert to a FILEREF.
    
    The KVS module can do this in parallel with other commit processing,
    whereas the library code does it synchronously.

This goes back to before the treeobj implementation. I do not recall the specifics of the pre-treeobj implementation, but I think this is from when data was written out more raw-like into the KVS. So if data was > the size of a BLOBREF, makes sense to only store the ref. Of course, we use treeobjs now.

A) the comparison should probably be > 72 bytes now. It should be based (minimally) on the max size of a treeobj valref and then backwards calculated to max treeobj val size.

  • It could be a bit bigger than the calculated number. I'm not super convinced that this extra code is a net win for ohhh 256 bytes? 512 bytes? We didn't have as many benchmarks back then, so could be tested / tweaked.

B) we should probably use a different macro here. Using BLOBREF_MAX_STRING_SIZE sorta looks weird.

C) we should document this somewhere? Dunno if RFC11 is the right place.

chu11 avatar Mar 28 '25 06:03 chu11

just doing some backwards calculations, here is a valref with a hypothetical 72 char blobref

{"data":["ABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRST"],"type":"valref","ver":1}

it is 108 chars (keeping in mind work on #6264 could alter this)

here's a val encoded with 62 chars of text that is also 108 chars

{"data":"YWJjZGVmZ2hpamtsbW5vcHFyc3R1dnd4eXphYmNkZWZnaGlqa2xtbm9wcXJzdHV2d3h5emFiY2RlZg","type":"val","ver":1}

so the threshold appears to be base64 encoded data of around 77, which I guess wasn't that far off from 72 anyways.

given potential future changes, including possibility of length for each valref entry, maybe a length of around 128 or something might be ok.

chu11 avatar Mar 28 '25 14:03 chu11

side note, sort of the same situation in restore command

    if (size < BLOBREF_MAX_STRING_SIZE) {
        if (!(treeobj = treeobj_create_val (buf, size)))
            log_err_exit ("error creating val object for %s", path);
    }
    else {
        flux_future_t *f;
        const char *blobref;

        if (!(f = content_store (h, buf, size, content_flags))
	    || content_store_get_blobref (f, hash_type, &blobref) < 0)
            log_msg_exit ("error storing blob for %s: %s",
                          path,
                          future_strerror (f, errno));
	progress (h, 1, 0);
        if (!(treeobj = treeobj_create_valref (blobref)))
            log_err_exit ("error creating valref object for %s", path);
        flux_future_destroy (f);
    }

chu11 avatar Mar 29 '25 00:03 chu11

Well, it's just a heuristic for performance so maybe the answer is to add a comment to the places where it's used to explain that? As you suggest, maybe instead of testing against BLOBREF_MAX_STRING_SIZE in two places, we should add a convenience function to libkvs like bool treeobj_value_is_small (size_t size) that can be used instead?

garlick avatar Apr 07 '25 19:04 garlick