Add support for keeping value references in resources
This adds a gc_mark function pointer to the resource type. It allows keeping values alive that are referenced from within resource structs. Use this gc_mark support to drop object_registry from uloop and to simplify the code.
The object registry will still be needed though, just not for tracking internal pointers anymore.
But suppose you do something like ubus.defer(…) or uloop.handle(…) without storing the return value in a variable, the returned resource would be freed immediately so this would essentially move the burden of retaining resource references to user code, e.g. by holding alive resources in an array or dict until they‘re ready to dispose.
@jow- you're right, I missed that part. How about we combine the approach from this PR with your earlier suggestion for keeping resources alive using your proposed ucv_resource_set_external?
I see the need but conceptually I am not happy with exposing gc details like that. Need to think some more about it, but my gut feeling so far is that resources should provide a generic data structure which the vm/gc can iterate instead of introducing resource specific arbitrary logic callbacks solely dealing with forwarding marks. Having such a blackboxed marking logic will make it impossible in the future to optimize the current recursive marking logic into an iterative one using heap stacks to avoid stack overflows.
The most natural solution would be giving uc_resource_t a new uc_value_t *uvdata pointer which is then recursively followed like any other plain uv type.
Pro: simple for the gc marker to follow, no separate logic needed.
Con: slightly more annoying to use from C (e.g. need to put a scratch object or array there, then stuff related values in there and obtaining them using ucv_object_get(), ucv_array_get() instead of simple pointer assignments)
A varied approach would be appending a VLA to the end of uc_resource_t like that:
typedef struct {
uc_value_t header;
uc_resource_type_t *type;
void *data;
size_t uvcount;
uc_value_t *uvdata[];
} uc_resource_t;
Pro: easier to use from C - e.g. can use myresptr->uvdata[0] to get access to own values, still straightforward for the gc marker to follow
Con: still slightly less elegant than direct pointer access
However I think the C side aesthetic issues could be addressed with appropriate data accessor macros or inline functions.
Makes sense. I will think about it and create a new draft in this PR.
FYI, I decided to drop the VLA at the end of the struct, because it was breaking the struct module, which embeds uc_resource_t in another struct.
@jow- ping?
Seems I failed to actually submit my drafted review - sorry.
This already looks quite good - but I would really love to eliminate either of the data or uvcount members and add proper macros or inline functions to get/set the data value of a resource.
My not fully fleshed out idea is:
- Use the
u16_or_constantbit (which we eventually should rename since it bears different meanings in different contexts) to signifiy thatvoid *datais actually an extended header - Use the three least significant bits for the following purposes:
- bit 0: when true, signifies that the resource is exempted from garbage collection
- bit 1: reserved
- bits 2..15: number of subsequent uc_value_t pointers [0..16384]
- bits 16..31: size of extra data [0..65535]
- When the
u16_or_constantuv header bit is false,datais an ordinary memory address - Otherwise
datashould be treated as:
union {
void *memory;
struct {
uint32_t no_gc:1;
uint32_t reserved:1;
uint32_t uvcount:14;
uint32_t datasize:16;
} exthdr;
} data
Then we should introduce void *ucv_resource_data_get(uc_value_t *) and void ucv_resource_data_set(uc_value_t *, void *) which do the following:
- Check
res->header.u16_or_constant, when true,return (void *)((char *)res + sizeof(uc_resource_t) + sizeof(uc_value_t *) * res->data.exthdr.uvcount)otherwise returnres->data.memory - Similar for setting data, check for extended header, when true,
*(void **)((char *)res + sizeof(uc_resource_t) + sizeof(uc_value_t *) * res->data.exthdr.uvcount) = valueotherwiseres->data.memory = value
And we probably need a new allocator, so ucv_resource_create_ex(uc_vm_t *vm, const char *typename, bool no_gc, size_t uvcount, size_t datasize), which would:
assert(uvcount <= 16383)assert(datasize <= 65535)res = xalloc(sizeof(uc_resource_t) + uvcount * sizeof(uc_value_t *) + datasize)res->header.u16_or_constant = true; // Has extended headerres->data.exthdr.no_gc = no_gc; res->data.exthdr.uvcount = uvcount; res->data.exthdr.datasize = datasize;
And maybe ucv_resource_data_set() should be actually two functions, one ucv_resource_data_setptr(uc_value_t *uv, void *val) which assumes a void * sized data area which is treated as void pointer and ucv_resource_data_setmem(uc_value_t *uv, size_t sz, void *data) which memcopies the given amount of bytes into the data area.
The thoughts behind this are:
- I noticed that I increasingly tend to use resource objects with trailing payload data to reduce the number of heap allocations, it is more efficient to have everything in one block
- Actual payload is often very small (e.g. just an int for sockets or a FILE * for fs resources)
- Having payload in-place might avoid the need for a custom free function in some cases
The approach outlined above, while slightly verbose, should maintain abi compat with existing code (struct size remains the same) and avoids the need for an additional 8 bytes or so size field which is unused in most cases.
@jow-, I think that approach makes sense. I'm working on implementing it now. A few comments:
- I will rename
u64_or_constanttoext_flagin order to make it clear that it has different meanings. - I will limit
uvcountto 8 bits. It doesn't make sense to make it larger, since the free function needs to iterate over all reserved entries, regardless of how many are actually used. - I will increase the number of
datasizebits and make it a multiple of 8 bytes, which is useful for alignment purposes as well. - I think extended resources should always use embedded data and the function that allocates them should just pass a pointer to the caller. Having a
setmemfunction is useless if you want the embedded data to have intrusive data structures like linked lists. And if you still want a pointer to a separately allocated data structure, just make your data pointer-sized. - Based on that,
ucv_resource_dataptrshould just always returnNULLfor extended resources. - I will make resource data start at
(void *)(res + 1)in order to simplify embeddinguc_resource_t, while still being able to use embedded values.
Perfect. I just took a quick glance and the only in-tree library that touches ->data directly is the debug one, it needs some slight extensions to properly decode extended headers; it's current usage will not break anything though, assuming ->data is substituted with ->data.memory or whatever the union definition ends up being.
The rpcd ucode.c module will break as well at compilation, it uses ->data once - but can be probably made forward- and backward compatible by switching it to ucv_resource_data() prior to merging this PR:
diff --git a/ucode.c b/ucode.c
index 5c84776..0f0e2b7 100644
--- a/ucode.c
+++ b/ucode.c
@@ -359,7 +359,7 @@ static void
rpc_ucode_request_finish(rpc_ucode_call_ctx_t *callctx, int code, uc_value_t *reply)
{
rpc_ucode_script_t *script = callctx->script;
- uc_resource_t *r;
+ uc_value_t *r;
size_t i;
if (callctx->replied)
@@ -376,9 +376,9 @@ rpc_ucode_request_finish(rpc_ucode_call_ctx_t *callctx, int code, uc_value_t *re
callctx->replied = true;
for (i = 0; i < ucv_array_length(script->pending_replies); i++) {
- r = (uc_resource_t *)ucv_array_get(script->pending_replies, i);
+ r = ucv_array_get(script->pending_replies, i);
- if (r && r->data == callctx) {
+ if (ucv_resource_data(r, NULL) == callctx) {
ucv_array_set(script->pending_replies, i, NULL);
break;
}
I plan on not breaking code using ->data, as long as it doesn't switch to extended resources:
union {
void *data;
struct {
uint32_t no_gc:1;
uint32_t reserved:3;
uint32_t uvcount:8;
uint32_t datasize:20;
};
};
Ah, anonymous unions. I always avoided them due to their proprietary nature but given that we use many other extensions already this makes a lot of sense and simplifies things.
Edit: and they're in the C11 standard now, even better.
@jow-, ping?
I have been rolling this patch for several days now, so ... Tested-by: John Crispin [email protected]
FYI, I just found another issue in this series: since resources are not added to the vm values list, gc marking for embedded values does not work. I'm currently working on resolving this by creating a separate struct for extended resources, which contains an uc_weakref_t field.
Should be fine now. With the updated version, tests show that garbage collection leaves values held by no_gc resources alive.
Almost golden. My remaining pet peeve is the naming schema deviation in ucv_resource_get_no_gc() and ucv_resource_set_no_gc(). I would propose these alternatives names:
ucv_resource_retained_get()/ucv_resource_retained_set()[as in kept around on GC'ing]ucv_resource_persistent_get()/ucv_resource_persistent_set()[as in persistent object unaffected by GCing]ucv_resource_rooted_get()/ucv_resource_rooted_set()[as in treated as GC root]
Any opinion? I can take care of the renaming along with some other minor things I spotted.
Thanks for the review. I think I'd use ucv_resource_persist_get and ucv_resource_persist_set
Either way, feel free to rename and clean up the things you spotted. Thanks
Merged now, thanks!