Nim icon indicating copy to clipboard operation
Nim copied to clipboard

`gcsafe` should not trigger on pure dereference operations

Open arnetheduck opened this issue 3 months ago • 10 comments

Nim Version

2.2.4

Description

var s = "test"
var t = new tuple[v: int]

var someint = new int

proc safe(): pointer {.gcsafe.} =
  s[0] = 's' # safe, changes the payload, not the gc-managed reference
  addr s[0] # accessing is gcsafe always

proc safe2(): int {.gcsafe.} =
  someint[] = 4 # safe, changes the payload, not the reference
  someint[] # just dereferncing is gcsafe always

proc safe3(): int {.gcsafe.} =
  t.v = 4 # write to contents safe because it expands to `t[].v` with implicit deref
  t.v # read of field safe for same reason

proc safe3() {.gcsafe.} =
  echo someint[] # Safe, prints the value after dereferencing

proc notsafe(): ptr ref int =
  someint = new int # performs work on the reference, not the value -> not safe
  addr someint # performs work on the reference, not the value -> not safe

proc notsafe2() =
  s = "test2" # performs work on the reference, not the value

proc notsafe3() =
  echo repr(someint) # not safe, passes the _reference_ as parameter to the function

Since no gc operations are involved in the above code marked safe (no copying, refcounting, allocating new instances or anything like it etc), it should also not trigger gcsafe effects. The same logic goes for field access and many other read/post-dereference operations, where there is no refcount and/or copy event involved.

Edited to include examples from discussion below

Current Output

Error: 'myproc' is not GC-safe as it accesses 's' which is a global using GC'ed memory

Expected Output


Known Workarounds

No response

Additional Information

No response

arnetheduck avatar Sep 22 '25 12:09 arnetheduck

Huh? But you can then use this to loophole gc safety...

Araq avatar Sep 22 '25 14:09 Araq

gc safety is about not interacting with the gc, I would presume from the name: allocations, collections and other actions that might have gc-related consequences (and therefore violate threadvars etc involved in allocations) ..

accessing the contents of a gc-managed instance however does not interact with the gc and therefore should not trigger a gcsafety warning. In the examples above, we only touch the contents of someint really, not the gc-managed ref int - ie we don't "touch" its garbage-collectedness by adding or clearing references, making copies and so on so there's no reason these should be prohibited. I don't see any (additional) loopholes from the above code - what they have in common is that they dereference the gc instance before doing their meaningful work - ie addr doesn't take the address of the gc instance - it takes the address of the contents - ditto [].

arnetheduck avatar Sep 22 '25 14:09 arnetheduck

the above code has additional thread safety issues of course - ie if it were used in a multithreaded context, one would have to afford additional protections - these are orthogonal from gcsafe however, ie just because something is gcsafe doesn't mean it's thread safe (and vice versa).

arnetheduck avatar Sep 22 '25 14:09 arnetheduck

Loophole:

var myglobal = new(int)

proc access(): ptr ref int = addr myglobal

proc use() {.thread.} = 
  var a = access()
  a[] = new(int) # uses unprotected RC ops, the very thing gc safety is supposed to prevent

createThread ..., use

But you can argue that addr is unsafe and thus the compiler should shut up about it.

Araq avatar Sep 23 '25 06:09 Araq

Loophole:

This is a different example: it returns an address to the ref int - what I'm suggesting focuses on the contents of the reference - it would never allow the reference itself to be accessed.

Ie it basically says that using the dereference operator is not a gcsafe violation.

var myglobal = new(int)

proc allowed(): ptr int {.gcsafe.} = addr(myglobal[]) # notice []
proc notallowed: ptr ref int = addr(myglobal) # no dereference -> causes gcsafe violation

arnetheduck avatar Sep 23 '25 07:09 arnetheduck

But you can argue that addr is unsafe

I don't think we need to go this far - if we start with allowing what I suggest, we'll get to a gcsafe rule that is sufficient and complete without being over-constraining.

I think right now, it's over-constrained because it also triggers for operations that do not involve gc operations (there is no unprotected gc op in the dereference operator - similar for strings and seqs when we access the payload with [N]) - dereference ([]) chiefly amongst them - the code in your loophole example should continue to be blocked for gcsafe to make sense.

arnetheduck avatar Sep 23 '25 07:09 arnetheduck

So what is it that you suggest? Is it based on the expression structure or based on the involved types? If it's about the types then this tends to be quirky with Nim's unchecked generics.

Araq avatar Sep 23 '25 08:09 Araq

Is it based on the expression structure

Both:ish - we can broadly categorize expressions into those that involve potentially involve gc (= above all, but also reset etc) and those that do not ([] above all), and for those that do, it depends on what type they're operating on so this analysis effectively has to happen post-monomorphization.

In its simplest form, "derefence" is safe because it effectively "removes" the gc layer from consideration (well, it unpeels one level of ref, then we have to recurse since ref ref int exists) for gcsafe violations as subsequent opertions will not know if it's a gc type or not - so for example nkAsgn on a ref type is a gcsafe violation - nkAsgn on the contents (after dereferencing) is not. Passing a ref int as an argument to proc (v: ref int) is a violation, but passing its dereferenced value to proc (v: int) is not and so on.

For string, nkAsgn on the string is a gcsafe violation but nkAsgn on the nkBracketExpr is not. Notably, this neatly maps in the compiler to every single place where you would call genRefAssign and its companions

arnetheduck avatar Sep 23 '25 08:09 arnetheduck

To be clear, this applies also to writes to the referenced value - ie I can write to the payload of a string mytstring[0] = 'a' without involving the GC, ditto can I do myint[] = 5 - the pure read-only in the title refers to the "gc refcount" or ref layer of type - ie as long as the operation cannot involve any refcount changes, it's gcsafe

arnetheduck avatar Sep 23 '25 08:09 arnetheduck

Another safe example:

var t = new tuple[v: int]

proc allowed() {.gcsafe.} =
  t.v = 4 # safe, because this code is actually `t[].v = 4`, ie it's been dereferenced implicitly

It all mostly boils down to that all dereference-like operations are safe.

arnetheduck avatar Sep 23 '25 09:09 arnetheduck