Documentation wrongly suggests registering immediate values with the GC
If a variable of type value is known to always be immediate (not a pointer), then it need not be registered with the GC. In fact, registering it with the GC forces the GC to do useless work.
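To make "immediate (not a pointer)" concrete, here is a minimal sketch of a tagged-word encoding in the spirit of the runtime's, where a word with its lowest bit set is an integer and anything else is treated as a heap pointer. All `toy_` names are hypothetical stand-ins and are not the macros from the real headers; the sketch also assumes two's-complement integers and an arithmetic right shift, as mainstream compilers provide.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in (hypothetical name) for the runtime's word-sized value type. */
typedef intptr_t toy_value;

/* Encode an integer as an immediate: shift left and set the low bit. */
static toy_value toy_val_long(intptr_t n) {
    return (toy_value)(((uintptr_t)n << 1) | 1u);
}

/* Decode an immediate (assumes arithmetic right shift on signed values). */
static intptr_t toy_long_val(toy_value v) {
    return v >> 1;
}

/* An immediate has its low bit set; aligned pointers never do. */
static int toy_is_long(toy_value v) {
    return (v & 1) != 0;
}

/* What a collector has to do with a set of registered roots: it looks at
   every one of them, but only follows those that are not immediates.
   Returns the number of roots that actually needed following. */
static size_t toy_count_pointer_roots(const toy_value *roots, size_t n) {
    assert(roots != NULL);
    size_t followed = 0;
    for (size_t i = 0; i < n; i++) {
        if (!toy_is_long(roots[i])) {
            followed++;
        }
    }
    return followed;
}
```

In this model, a registered root that always holds an immediate is inspected on every collection and then skipped, which is the "useless work" described above.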
That’s correct. One question, though: are we ready to set this exception in stone by adding it to the manual? Intuitively, I would say that the OCaml GC is unlikely to ever require the registration of integer-like values.
I don't think this suggestion is a priority for improving the C FFI documentation.
Regarding root registration, the C FFI has an easy mode and a hard mode:
- easy mode: register everything and sleep at night
- hard mode: think hard about when the runtime may call the GC, which values it may invalidate, and sprinkle the minimal amount of root-registration code around, with comments to explain why you believe you are safe -- and live in fear forever
Currently the manual only attempts to document the easy mode. There is slightly more, going in the hard-mode direction, about initialization of values (when you need to use caml_initialize, and when direct Field assignments are okay). I am not sure why the manual is this way (I was not around when it was first written), but my suspicion is that value initialization is actually performance-sensitive, while root registration in general isn't.
Some code in the runtime system needs to use hard mode, because it sits inside the runtime proper and is awfully performance-sensitive, or needs to reason about the runtime lock for various reasons. There is a lot more hard-mode code around, for example when dealing with input-output and other syscalls, because the code was written by the same people who wrote the runtime, and they basically lived in hard mode all the time.
Do non-runtime users need to use the hard mode? Honestly I doubt it. I think that the most frequent reason to learn about hard mode is to contribute to the runtime (or to write third-party libraries that are essentially meant to be part of the runtime, like Camlroots or our own boxroot). Registering local roots is pretty damn fast, and I haven't encountered a program where registration of roots is a performance issue. (Despite having written root-registration benchmarks myself.)
In general I am always in favor of more documentation, including of complex aspects. (Real World OCaml does a better job than the manual in documenting the runtime internals, and I think we should consider improving this aspect.) So I'm not saying that we must not document the hard mode. But if we want to document it, there is a lot more to say than just "well immediate variables don't have to be registered, except if you later mutate them with a non-immediate value, oops". Basically we need a significant extension of the manual to explain the hard mode, with some thinking about how it integrates with the current content. This is a much larger task. It is interesting to consider (but it requires a different commitment), and from that perspective the current issue is but a footnote. Without explicitly deciding to document the hard mode, I'm not sure what value there is in the suggestion (it's a departure from the current documentation choice, with few clear benefits on its own).
The status of this issue is that no one is actively trying to write more comprehensive FFI documentation. It would be a lot of work and requires expertise. If someone is interested in attacking this significant task (@DemiMarie ?), please get in touch, we would welcome contributions.
This issue has been open one year with no activity and without being identified as a bug. Consequently, it is being marked with the "stale" label to see if anyone has comments that provide new information on the issue. Is the issue still reproducible? Did it appear in other contexts? How critical is it? Did you miss this feature in a new setting? etc. Note that the issue will not be closed in the absence of new activity: the "stale" label is only used for triaging reason.
I think the thing that makes the documentation confusing is that it's not obvious that it is describing an "easy mode". e.g.
Rule 1 A function that has parameters or local variables of type value must begin with a call to one of the CAMLparam macros and return with CAMLreturn, CAMLreturn0, or CAMLreturnT.
Yet when you read other people's code or the OCaml runtime, this rule clearly isn't being followed. Then people try to guess what the real rule is, and may guess incorrectly.
It would be good to give an intuition of why this is needed (if a GC is triggered then values can move and pointers need updating). Then explain that to avoid wondering whether a value might ever contain a pointer or whether a GC might be triggered, you should just register everything (and this is fast).
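As a rough illustration of that intuition, here is a toy model of a moving collector. It is not the OCaml runtime, and every `toy_` name is hypothetical. Objects are single words, a "collection" copies whatever the registered roots point to into the other space, updates those roots, and poisons the old space. A local that was not registered keeps pointing at the poisoned old copy.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy model only: two fixed spaces, one-word objects, no sharing
   between roots (an object reachable from two roots would be copied twice). */
#define TOY_HEAP_WORDS 16
#define TOY_MAX_ROOTS 8

static long toy_space_a[TOY_HEAP_WORDS];
static long toy_space_b[TOY_HEAP_WORDS];
static long *toy_heap = toy_space_a;      /* space currently allocated into */
static size_t toy_used = 0;

static long **toy_roots[TOY_MAX_ROOTS];   /* addresses of registered locals */
static size_t toy_nroots = 0;

/* Allocate a one-word object holding n in the current space. */
static long *toy_alloc(long n) {
    assert(toy_used < TOY_HEAP_WORDS);
    long *p = &toy_heap[toy_used++];
    *p = n;
    return p;
}

/* Tell the collector where a local pointer variable lives. */
static void toy_register_root(long **local) {
    assert(toy_nroots < TOY_MAX_ROOTS);
    toy_roots[toy_nroots++] = local;
}

static void toy_unregister_all(void) {
    toy_nroots = 0;
}

/* Moving collection: copy each rooted object to the other space and
   update the registered local in place. The old space is then filled
   with 0xFF bytes, so a stale pointer reads back as -1. */
static void toy_collect(void) {
    long *other = (toy_heap == toy_space_a) ? toy_space_b : toy_space_a;
    size_t n = 0;
    for (size_t i = 0; i < toy_nroots; i++) {
        other[n] = **toy_roots[i];
        *toy_roots[i] = &other[n];
        n++;
    }
    memset(toy_heap, 0xFF, sizeof toy_space_a);
    toy_heap = other;
    toy_used = n;
}

/* "Register everything": the local is updated when the object moves. */
static long toy_demo_registered(void) {
    long *v = toy_alloc(42);
    toy_register_root(&v);
    toy_collect();
    long result = *v;          /* still 42, read from the new location */
    toy_unregister_all();
    return result;
}

/* Forgetting to register: the local still points into the old space. */
static long toy_demo_unregistered(void) {
    long *v = toy_alloc(42);
    toy_collect();
    return *v;                 /* stale pointer: reads the poison value */
}
```

In the toy the stale read is harmless and merely returns the poison value; in a real program the same mistake is memory corruption, which is why registering everything is the safe default.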
This sounds reasonable.
I made an attempt at clarifying the documentation over at https://github.com/ocaml/ocaml/pull/14273.
Can we consider this issue to be resolved following the merge of #14273?