arocc Implement clang bounds-safety attributes

https://clang.llvm.org/docs/BoundsSafety.html

There's a lot here if you include the runtime checks and new builtins, but I think it can be handled in chunks. My suggestion would be to start with a flag like -fexperimental-bounds-safety and add some of the easier attributes first (single, bidi_indexable, terminated_by, etc.). counted_by / sized_by might need parser changes because it looks like they can refer "forward" to identifiers which aren't declared yet:

void foo(int *__attribute__((counted_by(count))) p, size_t count) { ... }

This also adds compile-time checking for certain types of pointers. For example, single pointers can only be dereferenced or indexed with a constant value of 0.
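A minimal sketch of both points, the forward reference and the single-pointer rule. COUNTED_BY and SINGLE are hypothetical stand-in macros that expand to nothing so the snippet builds with any C compiler; with bounds-safety enabled they would expand to the counted_by and single attributes instead. The function names are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins: they expand to nothing here. Under bounds-safety
 * they would expand to the counted_by(n) and single attributes. */
#define COUNTED_BY(n)
#define SINGLE

/* `count` is named inside the attribute before it is declared: this is the
 * forward reference that needs parser support. */
int sum(const int *COUNTED_BY(count) p, size_t count) {
    assert(p != NULL || count == 0);
    int total = 0;
    for (size_t i = 0; i < count; i++)
        total += p[i]; /* in bounds because i < count */
    return total;
}

int read_single(const int *SINGLE p) {
    assert(p != NULL);
    return p[0]; /* accepted: constant index 0, same as *p */
    /* p[1], p[i], or p + 1 would be rejected at compile time */
}
```

With the attributes active, a call like sum(buf, n) would also let the compiler check that buf really has at least n elements.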

Not all of the types will map cleanly to Zig types (for example ended_by, a pointer with another pointer as its upper bound), but this should allow us to translate things into Zig single pointers, slices, and sentinel-terminated slices.

Nov 10 '25 07:11 ehaas

We introduce a -fexperimental-bounds-safety flag: when it is disabled, all bounds-safety attributes are treated as no-ops; when enabled, we parse them, keep them in the AST, and run static checks. In the initial phase we support single, bidi_indexable, and terminated_by.

A T *__single pointer is treated as a “single-element” pointer (conceptually similar to Zig *T) with compile-time rules that only allow *p / p[0] and reject pointer arithmetic or non-constant p[i]. A T *__bidi_indexable pointer remains a “wide” C pointer but is tagged internally.
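As a rough sketch of the static rule for __single, assuming the checker can see whether an index folds to a constant. The IndexExpr and PtrKind types and the index_allowed function are hypothetical, not existing arocc names:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of an index expression as a checker might see it. */
typedef struct {
    bool is_constant; /* true if the index folds to a compile-time constant */
    long value;       /* only meaningful when is_constant is true */
} IndexExpr;

typedef enum { PTR_SINGLE, PTR_BIDI_INDEXABLE } PtrKind;

/* Returns true if `p[index]` passes this rule for a pointer of `kind`. */
bool index_allowed(PtrKind kind, const IndexExpr *index) {
    assert(index != NULL);
    if (kind != PTR_SINGLE)
        return true; /* this particular rule only restricts __single */
    return index->is_constant && index->value == 0;
}
```

Pointer arithmetic on a __single pointer would be rejected by a separate check on the expression kind, which this sketch leaves out.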

A T *__terminated_by(c) pointer keeps its ABI as a plain T *, but we can expose it in Zig as a [:c]T sentinel-terminated slice via a wrapper. For counted_by / sized_by, the parser must support forward references: during parsing we record these attributes as unresolved, and once the full parameter/field list is known we resolve the referenced identifier or emit a diagnostic.
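The two-pass idea for counted_by could look roughly like this. It is a sketch under assumptions: the Param record and resolve_counts are hypothetical, and a real parser would store identifiers and diagnostics in its own structures.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical parameter record built during the first (parsing) pass. */
typedef struct {
    const char *name;       /* parameter name */
    const char *count_name; /* identifier inside counted_by(...), or NULL */
    int count_index;        /* set by resolve_counts; -1 if none or unresolved */
} Param;

/* Second pass, run once the whole parameter list is known. Returns the
 * number of unresolved references; each one would become a diagnostic. */
size_t resolve_counts(Param *params, size_t n) {
    assert(params != NULL || n == 0);
    size_t errors = 0;
    for (size_t i = 0; i < n; i++) {
        params[i].count_index = -1;
        if (params[i].count_name == NULL)
            continue; /* no counted_by on this parameter */
        for (size_t j = 0; j < n; j++) {
            if (j != i && strcmp(params[j].name, params[i].count_name) == 0) {
                params[i].count_index = (int)j;
                break;
            }
        }
        if (params[i].count_index < 0)
            errors++;
    }
    return errors;
}
```

For the foo(p, count) example earlier in the thread, the first pass would record "count" as unresolved on p, and this pass would bind it to the second parameter. Struct fields could be handled the same way.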

Conceptually, T *__bidi_indexable __counted_by(len) maps to a Zig []T wrapper while the raw FFI remains (*T, usize). __sized_by(bytes) maps to []T or []u8 depending on the element type. Flexible arrays annotated with counted_by are treated as tail buffers of length len.

Summarizing the high-level Zig view: T *__single → *T, T *__bidi_indexable __counted_by(len) → []T, T *__bidi_indexable __sized_by(bytes) → []T / []u8, T *__terminated_by(c) → [:c]T. Attributes like ended_by stay as plain C pointers with internal annotations only.

A later stage can then introduce internal “wide pointer” lowering and runtime checks once parsing and static validation are in good shape.
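To give an idea of what that later stage might lower to, here is a sketch of a wide pointer with a checked load. The WideIntPtr layout and the helper names are assumptions for illustration only; a real implementation would likely trap on failure rather than return false.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical lowered form of an `int *__bidi_indexable`. */
typedef struct {
    int *ptr;   /* current position */
    int *lower; /* first valid element */
    int *upper; /* one past the last valid element */
} WideIntPtr;

WideIntPtr wide_from_array(int *base, size_t count) {
    assert(base != NULL);
    WideIntPtr w = { base, base, base + count };
    return w;
}

/* Checked load of p[i]. Offsets are compared instead of forming p.ptr + i,
 * so no out-of-range pointer is ever computed. */
bool wide_load(WideIntPtr p, ptrdiff_t i, int *out) {
    assert(out != NULL);
    ptrdiff_t lo = p.lower - p.ptr; /* lowest valid index relative to ptr */
    ptrdiff_t hi = p.upper - p.ptr; /* one past the highest valid index */
    if (i < lo || i >= hi)
        return false;
    *out = p.ptr[i];
    return true;
}
```

Negative indices are valid as long as they stay at or above the lower bound, which is what distinguishes bidi_indexable from a forward-only indexable pointer.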

Nov 30 '25 00:11 roussov

Hi @roussov - are you referring to existing work you've done? Or suggesting a plan for implementing this?

Nov 30 '25 00:11 ehaas

Hi @ehaas A bit of both. I’ve done some initial experiments locally to verify that the approach is viable, but nothing polished or merged anywhere yet. What I outlined here is meant as a concrete plan for how this could be implemented and integrated properly in the project.

Nov 30 '25 13:11 roussov

Since it's behind a flag, I would suggest PR'ing small pieces rather than trying to get everything working at once. That will be easier to review, and it will prevent merge conflicts from building up due to other changes. For example, I just added single and unsafe_indexable support, since those are the easiest. Just adding support for deferred decl resolution to the parser (being able to reference declarations that don't exist yet) is a good chunk of work: https://www.open-std.org/JTC1/sc22/wg14/www/docs/n3656.pdf

Nov 30 '25 17:11 ehaas