ucode icon indicating copy to clipboard operation
ucode copied to clipboard

Add support for keeping value references in resources

Open nbd168 opened this issue 11 months ago • 5 comments

This adds a gc_mark function pointer to the resource type. It allows keeping values alive that are referenced from within resource structs. Use this gc_mark support to drop object_registry from uloop and to simplify the code.

nbd168 avatar Feb 05 '25 12:02 nbd168

The object registry will still be needed though, just not for tracking internal pointers anymore.

But suppose you do something like ubus.defer(…) or uloop.handle(…) without storing the return value in a variable, the returned resource would be freed immediately so this would essentially move the burden of retaining resource references to user code, e.g. by holding alive resources in an array or dict until they‘re ready to dispose.

jow- avatar Feb 06 '25 08:02 jow-

@jow- you're right, I missed that part. How about we combine the approach from this PR with your earlier suggestion for keeping resources alive using your proposed ucv_resource_set_external?

nbd168 avatar Feb 06 '25 08:02 nbd168

I see the need but conceptually I am not happy with exposing gc details like that. Need to think some more about it, but my gut feeling so far is that resources should provide a generic data structure which the vm/gc can iterate instead of introducing resource specific arbitrary logic callbacks solely dealing with forwarding marks. Having such a blackboxed marking logic will make it impossible in the future to optimize the current recursive marking logic into an iterative one using heap stacks to avoid stack overflows.

The most natural solution would be giving uc_resource_t a new uc_value_t *uvdata pointer which is then recursively followed like any other plain uv type.

Pro: simple for the gc marker to follow, no separate logic needed. Con: slightly more annoying to use from C (e.g. need to put a scratch object or array there, then stuff related values in there and obtaining them using ucv_object_get(), ucv_array_get() instead of simple pointer assignments)

A varied approach would be appending a VLA to the end of uc_resource_t like that:

typedef struct {
	uc_value_t header;
	uc_resource_type_t *type;
	void *data;
        size_t uvcount;
        uc_value_t *uvdata[];
} uc_resource_t;

Pro: easier to use from C - e.g. can use myresptr->uvdata[0] to get access to own values, still straightforward for the gc marker to follow Con: still slightly less elegant than direct pointer access

However I think the C side aesthetic issues could be addressed with appropriate data accessor macros or inline functions.

jow- avatar Feb 10 '25 10:02 jow-

Makes sense. I will think about it and create a new draft in this PR.

nbd168 avatar Feb 10 '25 10:02 nbd168

FYI, I decided to drop the VLA at the end of the struct, because it was breaking the struct module, which embeds uc_resource_t in another struct.

nbd168 avatar Feb 10 '25 16:02 nbd168

@jow- ping?

nbd168 avatar Feb 17 '25 19:02 nbd168

Seems I failed to actually submit my drafted review - sorry.

This already looks quite good - but I would really love to eliminate either of the data or uvcount members and add proper macros or inline functions to get/set the data value of a resource.

My not fully fleshed out idea is:

  • Use the u16_or_constant bit (which we eventually should rename since it bears different meanings in different contexts) to signifiy that void *data is actually an extended header
  • Use the three least significant bits for the following purposes:
    • bit 0: when true, signifies that the resource is exempted from garbage collection
    • bit 1: reserved
    • bits 2..15: number of subsequent uc_value_t pointers [0..16384]
    • bits 16..31: size of extra data [0..65535]
  • When the u16_or_constant uv header bit is false, data is an ordinary memory address
  • Otherwise data should be treated as:
union {
  void *memory;
  struct {
    uint32_t no_gc:1;
    uint32_t reserved:1;
    uint32_t uvcount:14;
    uint32_t datasize:16;
  } exthdr;
} data

Then we should introduce void *ucv_resource_data_get(uc_value_t *) and void ucv_resource_data_set(uc_value_t *, void *) which do the following:

  • Check res->header.u16_or_constant, when true, return (void *)((char *)res + sizeof(uc_resource_t) + sizeof(uc_value_t *) * res->data.exthdr.uvcount) otherwise return res->data.memory
  • Similar for setting data, check for extended header, when true, *(void **)((char *)res + sizeof(uc_resource_t) + sizeof(uc_value_t *) * res->data.exthdr.uvcount) = value otherwise res->data.memory = value

And we probably need a new allocator, so ucv_resource_create_ex(uc_vm_t *vm, const char *typename, bool no_gc, size_t uvcount, size_t datasize), which would:

  • assert(uvcount <= 16383)
  • assert(datasize <= 65535)
  • res = xalloc(sizeof(uc_resource_t) + uvcount * sizeof(uc_value_t *) + datasize)
  • res->header.u16_or_constant = true; // Has extended header
  • res->data.exthdr.no_gc = no_gc; res->data.exthdr.uvcount = uvcount; res->data.exthdr.datasize = datasize;

And maybe ucv_resource_data_set() should be actually two functions, one ucv_resource_data_setptr(uc_value_t *uv, void *val) which assumes a void * sized data area which is treated as void pointer and ucv_resource_data_setmem(uc_value_t *uv, size_t sz, void *data) which memcopies the given amount of bytes into the data area.

The thoughts behind this are:

  • I noticed that I increasingly tend to use resource objects with trailing payload data to reduce the number of heap allocations, it is more efficient to have everything in one block
  • Actual payload is often very small (e.g. just an int for sockets or a FILE * for fs resources)
  • Having payload in-place might avoid the need for a custom free function in some cases

The approach outlined above, while slightly verbose, should maintain abi compat with existing code (struct size remains the same) and avoids the need for an additional 8 bytes or so size field which is unused in most cases.

jow- avatar Feb 20 '25 12:02 jow-

@jow-, I think that approach makes sense. I'm working on implementing it now. A few comments:

  • I will rename u64_or_constant to ext_flag in order to make it clear that it has different meanings.
  • I will limit uvcount to 8 bits. It doesn't make sense to make it larger, since the free function needs to iterate over all reserved entries, regardless of how many are actually used.
  • I will increase the number of datasize bits and make it a multiple of 8 bytes, which is useful for alignment purposes as well.
  • I think extended resources should always use embedded data and the function that allocates them should just pass a pointer to the caller. Having a setmem function is useless if you want the embedded data to have intrusive data structures like linked lists. And if you still want a pointer to a separately allocated data structure, just make your data pointer-sized.
  • Based on that, ucv_resource_dataptr should just always return NULL for extended resources.
  • I will make resource data start at (void *)(res + 1) in order to simplify embedding uc_resource_t, while still being able to use embedded values.

nbd168 avatar Feb 20 '25 13:02 nbd168

Perfect. I just took a quick glance and the only in-tree library that touches ->data directly is the debug one, it needs some slight extensions to properly decode extended headers; it's current usage will not break anything though, assuming ->data is substituted with ->data.memory or whatever the union definition ends up being.

The rpcd ucode.c module will break as well at compilation, it uses ->data once - but can be probably made forward- and backward compatible by switching it to ucv_resource_data() prior to merging this PR:

diff --git a/ucode.c b/ucode.c
index 5c84776..0f0e2b7 100644
--- a/ucode.c
+++ b/ucode.c
@@ -359,7 +359,7 @@ static void
 rpc_ucode_request_finish(rpc_ucode_call_ctx_t *callctx, int code, uc_value_t *reply)
 {
        rpc_ucode_script_t *script = callctx->script;
-       uc_resource_t *r;
+       uc_value_t *r;
        size_t i;
 
        if (callctx->replied)
@@ -376,9 +376,9 @@ rpc_ucode_request_finish(rpc_ucode_call_ctx_t *callctx, int code, uc_value_t *re
        callctx->replied = true;
 
        for (i = 0; i < ucv_array_length(script->pending_replies); i++) {
-               r = (uc_resource_t *)ucv_array_get(script->pending_replies, i);
+               r = ucv_array_get(script->pending_replies, i);
 
-               if (r && r->data == callctx) {
+               if (ucv_resource_data(r, NULL) == callctx) {
                        ucv_array_set(script->pending_replies, i, NULL);
                        break;
                }

jow- avatar Feb 20 '25 13:02 jow-

I plan on not breaking code using ->data, as long as it doesn't switch to extended resources:

    union {
        void *data;
        struct {
            uint32_t no_gc:1;
            uint32_t reserved:3;
            uint32_t uvcount:8;
            uint32_t datasize:20;
        }; 
    };

nbd168 avatar Feb 20 '25 13:02 nbd168

Ah, anonymous unions. I always avoided them due to their proprietary nature but given that we use many other extensions already this makes a lot of sense and simplifies things.

Edit: and they're in the C11 standard now, even better.

jow- avatar Feb 20 '25 13:02 jow-

@jow-, ping?

nbd168 avatar Mar 20 '25 11:03 nbd168

I have been rolling this patch for several days now, so ... Tested-by: John Crispin [email protected]

blogic avatar Apr 24 '25 18:04 blogic

FYI, I just found another issue in this series: since resources are not added to the vm values list, gc marking for embedded values does not work. I'm currently working on resolving this by creating a separate struct for extended resources, which contains an uc_weakref_t field.

nbd168 avatar Apr 25 '25 09:04 nbd168

Should be fine now. With the updated version, tests show that garbage collection leaves values held by no_gc resources alive.

nbd168 avatar Apr 25 '25 11:04 nbd168

Almost golden. My remaining pet peeve is the naming schema deviation in ucv_resource_get_no_gc() and ucv_resource_set_no_gc(). I would propose these alternatives names:

  • ucv_resource_retained_get() / ucv_resource_retained_set() [as in kept around on GC'ing]
  • ucv_resource_persistent_get() / ucv_resource_persistent_set() [as in persistent object unaffected by GCing]
  • ucv_resource_rooted_get() / ucv_resource_rooted_set() [as in treated as GC root]

Any opinion? I can take care of the renaming along with some other minor things I spotted.

jow- avatar Apr 28 '25 12:04 jow-

Thanks for the review. I think I'd use ucv_resource_persist_get and ucv_resource_persist_set

nbd168 avatar Apr 28 '25 19:04 nbd168

Either way, feel free to rename and clean up the things you spotted. Thanks

nbd168 avatar Apr 28 '25 19:04 nbd168

Merged now, thanks!

jow- avatar May 11 '25 12:05 jow-