`gcsafe` should not trigger on pure dereference operations
Nim Version
2.2.4
Description
var s = "test"
var t = new tuple[v: int]
var someint = new int
proc safe(): pointer {.gcsafe.} =
s[0] = 's' # safe, changes the payload, not the gc-managed reference
addr s[0] # accessing is gcsafe always
proc safe2(): int {.gcsafe.} =
someint[] = 4 # safe, changes the payload, not the reference
someint[] # just dereferncing is gcsafe always
proc safe3(): int {.gcsafe.} =
t.v = 4 # write to contents safe because it expands to `t[].v` with implicit deref
t.v # read of field safe for same reason
proc safe3() {.gcsafe.} =
echo someint[] # Safe, prints the value after dereferencing
proc notsafe(): ptr ref int =
someint = new int # performs work on the reference, not the value -> not safe
addr someint # performs work on the reference, not the value -> not safe
proc notsafe2() =
s = "test2" # performs work on the reference, not the value
proc notsafe3() =
echo repr(someint) # not safe, passes the _reference_ as parameter to the function
Since no gc operations are involved in the above code marked safe (no copying, refcounting, allocating new instances or anything like it etc), it should also not trigger gcsafe effects. The same logic goes for field access and many other read/post-dereference operations, where there is no refcount and/or copy event involved.
Edited to include examples from discussion below
Current Output
Error: 'myproc' is not GC-safe as it accesses 's' which is a global using GC'ed memory
Expected Output
Known Workarounds
No response
Additional Information
No response
Huh? But you can then use this to loophole gc safety...
gc safety is about not interacting with the gc, I would presume from the name: allocations, collections and other actions that might have gc-related consequences (and therefore violate threadvars etc involved in allocations) ..
accessing the contents of a gc-managed instance however does not interact with the gc and therefore should not trigger a gcsafety warning. In the examples above, we only touch the contents of someint really, not the gc-managed ref int - ie we don't "touch" its garbage-collectedness by adding or clearing references, making copies and so on so there's no reason these should be prohibited. I don't see any (additional) loopholes from the above code - what they have in common is that they dereference the gc instance before doing their meaningful work - ie addr doesn't take the address of the gc instance - it takes the address of the contents - ditto [].
the above code has additional thread safety issues of course - ie if it were used in a multithreaded context, one would have to afford additional protections - these are orthogonal from gcsafe however, ie just because something is gcsafe doesn't mean it's thread safe (and vice versa).
Loophole:
var myglobal = new(int)
proc access(): ptr ref int = addr myglobal
proc use() {.thread.} =
var a = access()
a[] = new(int) # uses unprotected RC ops, the very thing gc safety is supposed to prevent
createThread ..., use
But you can argue that addr is unsafe and thus the compiler should shut up about it.
Loophole:
This is a different example: it returns an address to the ref int - what I'm suggesting focuses on the contents of the reference - it would never allow the reference itself to be accessed.
Ie it basically says that using the dereference operator is not a gcsafe violation.
var myglobal = new(int)
proc allowed(): ptr int {.gcsafe.} = addr(myglobal[]) # notice []
proc notallowed: ptr ref int = addr(myglobal) # no dereference -> causes gcsafe violation
But you can argue that addr is unsafe
I don't think we need to go this far - if we start with allowing what I suggest, we'll get to a gcsafe rule that is sufficient and complete without being over-constraining.
I think right now, it's over-constrained because it also triggers for operations that do not involve gc operations (there is no unprotected gc op in the dereference operator - similar for strings and seqs when we access the payload with [N]) - dereference ([]) chiefly amongst them - the code in your loophole example should continue to be blocked for gcsafe to make sense.
So what is it that you suggest? Is it based on the expression structure or based on the involved types? If it's about the types then this tends to be quirky with Nim's unchecked generics.
Is it based on the expression structure
Both:ish - we can broadly categorize expressions into those that involve potentially involve gc (= above all, but also reset etc) and those that do not ([] above all), and for those that do, it depends on what type they're operating on so this analysis effectively has to happen post-monomorphization.
In its simplest form, "derefence" is safe because it effectively "removes" the gc layer from consideration (well, it unpeels one level of ref, then we have to recurse since ref ref int exists) for gcsafe violations as subsequent opertions will not know if it's a gc type or not - so for example nkAsgn on a ref type is a gcsafe violation - nkAsgn on the contents (after dereferencing) is not. Passing a ref int as an argument to proc (v: ref int) is a violation, but passing its dereferenced value to proc (v: int) is not and so on.
For string, nkAsgn on the string is a gcsafe violation but nkAsgn on the nkBracketExpr is not. Notably, this neatly maps in the compiler to every single place where you would call genRefAssign and its companions
To be clear, this applies also to writes to the referenced value - ie I can write to the payload of a string mytstring[0] = 'a' without involving the GC, ditto can I do myint[] = 5 - the pure read-only in the title refers to the "gc refcount" or ref layer of type - ie as long as the operation cannot involve any refcount changes, it's gcsafe
Another safe example:
var t = new tuple[v: int]
proc allowed() {.gcsafe.} =
t.v = 4 # safe, because this code is actually `t[].v = 4`, ie it's been dereferenced implicitly
It all mostly boils down to that all dereference-like operations are safe.