
Runtime safety (panic interface, slices)

Open · amp-59 opened this issue 1 year ago · 1 comment

Overview

This initial section is only intended as a high-level explanation of the obvious features of the new panic interface.

This will be the overarching order of discussion (including later installments):

  1. New panic interface
  2. Slice syntax
  3. Implementation progress
  4. Performance measurements and examples
  5. Significant options and future patches

Note: the names of functions, function parameters, fields, types, and compile commands, as well as the text of compile error messages and panic error messages, will be changed if exact substitutes are provided. These changes will be applied last.

New panic interface

The purpose of this change is to allow users to define the builtin compiler panic interface.

Closes #17969 Fixes #19992

Below is the design suggested by @andrewrk in the linked issue:

pub const PanicCause = union(enum) {
    unreach,
    unwrap_null,
    cast_to_null,
    incorrect_alignment,
    out_of_bounds: struct {
        index: usize,
        len: usize,
    },
    sentinel_mismatch_usize: usize,
    sentinel_mismatch_isize: isize,
    sentinel_mismatch_other,
    /// This one comes from a call to `@panic`.
    application_msg: []const u8,
    // ...
};

pub fn panic(cause: PanicCause, error_return_trace: ?*StackTrace, ret_addr: ?usize) noreturn {
    // ...
}

Below is the final design, implemented by this patch:

pub const PanicCause = union(enum(u8)) {
    message = 0,
    unwrapped_error = 1,
    returned_noreturn = 2,
    reached_unreachable = 3,
    corrupt_switch = 4,
    accessed_out_of_bounds = 5,
    accessed_out_of_order = 6,
    accessed_out_of_order_extra = 7,
    accessed_inactive_field: type = 8,
    // ...
    cast_truncated_data: Cast = 26,
    cast_to_enum_from_invalid: type = 27,
    cast_to_error_from_invalid: Cast = 28,
    cast_to_ptr_from_invalid: usize = 29,
    cast_to_int_from_invalid: Cast = 30,
    cast_to_unsigned_from_negative: Cast = 31,
};
pub fn PanicData(comptime cause: PanicCause) type {
    switch (cause) {
        .message => {
            return []const u8;
        },
        .returned_noreturn,
        .reached_unreachable,
        .accessed_null_value,
        .divided_by_zero,
        .corrupt_switch,
        => {
            return void;
        },
        // ...
    }
}
pub fn panic(comptime cause: PanicCause, data: anytype) noreturn {
    // ...
}

Changes to the panic function signature

Exclusion of return address ?usize

The purpose of the ?usize parameter is to allow the special handler functions to forward their return addresses through debug.panicExtra, which calls back to builtin.default_panic to get to debug.panicImpl. See commit 694fab484805088030fa36efe3e6b6e7ee385852.
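For context, below is a hedged sketch of the status-quo root panic override that both excluded parameters belong to, matching the builtin.default_panic signature discussed in this section (the body is illustrative, not the standard library's implementation):

```zig
const std = @import("std");

// Status-quo (pre-patch) panic handler signature: declaring `panic` in the
// root source file replaces builtin.default_panic. The two trailing
// parameters are the ones this section argues are effectively always null.
pub fn panic(
    msg: []const u8,
    error_return_trace: ?*std.builtin.StackTrace,
    ret_addr: ?usize,
) noreturn {
    _ = error_return_trace; // only ever non-null via debug.panicExtra
    _ = ret_addr; // forwarded return address, or null
    std.debug.print("panic: {s}\n", .{msg});
    std.process.exit(1);
}
```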

Usage in detail ...

  • These panic IDs call their special handler functions with context data: inactive_union_field, sentinel_mismatch, index_out_of_bounds, start_index_greater_than_end, and unwrap_error. Every other panic ID emits a call to builtin.default_panic with a message from panic_messages.
  • There are no panic IDs where ?usize is non-null.
  • There is a single call to builtin.default_panic throughout the standard library. The sole caller is debug.panicExtra.
  • There are six calls to debug.panicExtra throughout the standard library. Five are from the special handler functions. The final caller is debug.panic, which calls with both ?usize and ?*builtin.StackTrace as null.

Exclusion of stack trace pointer ?*builtin.StackTrace

The only possible non-null value for the stack trace parameter to builtin.default_panic originates from builtin.panicUnwrapError via debug.panicExtra.

Usage in detail ...

  • There are four calls to @errorReturnTrace() throughout the standard library; this is relevant because it reveals every potential origin of a non-null stack trace parameter in a call to builtin.default_panic. All instances are found in startup code (where errors returned to the entry point or to thread main are caught), where debug.dumpStackTrace is called with the unwrapped, dereferenced return value.
  • The only other call to debug.dumpStackTrace throughout the standard library is from debug.panicImpl.
  • There are four calls to debug.panicImpl throughout the standard library. Three are exclusive to the Windows fault handler (which does not compile) and each call with ?*builtin.StackTrace as null. The only remaining caller is builtin.default_panic.

All remaining callers call with null for both values

The special handler functions will be removed by this patch, meaning that neither parameter will ever be defined at the interface with the compiler, except for the panic cause unwrapped_error, where the stack trace pointer is part of the panic data rather than an independent parameter.

The excluded parameters only have the appearance of a purpose due to the standard library overusing builtin.default_panic. See commit 754ea118bc7192b75c337ae95ad54dbe619dea56.

Comptime-known panic cause builtin.PanicCause

Closes #18995

Two of the existing five special handler functions are generic, because they had to be for the feature to work at all. For the new interface, however, there are significant costs if it is not generic.

If the panic cause is runtime-known:

  1. Certain types of context data can never be reported, either because the data size exceeds the maximum permitted by any reasonable common type, or because the type(s) relevant to the panic condition have unlimited variety, so no common type is possible (as with pointers).

  2. Multiple additional causes (where data type is the only variable, like sentinel_mismatch_usize: usize, sentinel_mismatch_isize: isize in the suggested interface) would be required to support some of the current variety of panic data, and the logic required to determine which (sub-)causes should be used for which type(s) is complex.

  3. Every writer for every panic cause must be compiled regardless of whether it is used. This would mean that programs compiling std.builtin.panicImpl would include Errol3 tables for the sole purpose of representing float sentinels. The ability to override builtin.panic does not excuse this requirement, because the user should not be forced to compile any (potentially unused) handler defined by their own implementation either.

  4. Some data types for some panic causes become substantially more expensive to report when a compromise type is used. The significance of this point is demonstrated by a later example.

A fixed interface is a downgrade for existing and future runtime safety features (1). It is more complex to maintain (2), comes at a greater upfront cost to the user (3), and is innately less efficient in the fail block because of its reliance on compromise types (4).

If the panic cause is compile-time-known, meaning the interface is generic:

  1. The user pays for precisely what they use, regardless of whether panic is their definition or the standard library's.

  2. Users are able to define compile errors and other compile-time behaviour for specific panic causes.

  3. Every panic data payload type for every panic cause is supported. This patch adds support for panic data for every panic cause with relevant data:
     • unwrapped_error: Discarded error and the error return trace.
     • accessed_out_of_bounds: Out-of-bounds index and the upper bound.
     • accessed_out_of_order: Start and end operands.
     • accessed_out_of_order_extra: Start and end operands and the upper bound of the pointer operand.
     • accessed_inactive_field: Requested field and the active field.
     • alias_of_noalias_parameter: Destination and source addresses and the length.
     • mismatched_memcpy_arguments: Destination and source lengths.
     • mismatched_for_loop_capture_lengths: Expected length and the length of the first mismatch.
     • mismatched_null_terminator: Value of the element which should be null.
     • mismatched_sentinel: Expected value and the actual value.
     • mul_overflowed: LHS and RHS operands.
     • add_overflowed: LHS and RHS operands.
     • sub_overflowed: LHS and RHS operands.
     • div_with_remainder: LHS and RHS operands.
     • shl_overflow: Integer value and the shift amount.
     • shr_overflow: Integer value and the shift amount.
     • shift_amt_overflowed: Shift amount.
     • cast_to_ptr_from_invalid: Address.
     • cast_to_int_from_invalid: Float value.
     • cast_truncated_data: Integer value.
     • cast_to_unsigned_from_negative: Signed integer value.
     • cast_to_enum_from_invalid: Integer value.
     • cast_to_error_from_invalid: Integer value or error set element.

Notes:

  • Correction: The stack trace parameter of builtin.default_panic is non-null for errors unwrapped like switch (err) { error.A => @panic("message") }. This functionality could be implemented in the new interface with an additional panic cause unwrapped_error_extra, with an extra data field for the panic message. Whether this will be added to this patch is undecided. The current behaviour for this 'switch-prong-panic-unwrap' syntax is panic cause message like any other @panic.
  • An example of a 'common type' is usize being used as the payload type for all unsigned integers (like the suggested sentinel data type) because it can accommodate most of them, with the exceptions being so infrequent that it is no great loss.
  • An example of a 'compromise type' is using @tagName to convert all types of enumeration to strings and using struct { active: [:0]const u8, accessed: [:0]const u8 } to report inactive field accesses for tagged unions.
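To make the benefits of a comptime-known cause concrete, here is a hedged sketch of what a user override might look like under the generic interface. The names PanicCause and PanicData come from the design shown earlier; the payload field names (index, len) are illustrative assumptions, not confirmed by this patch:

```zig
const std = @import("std");

// Hypothetical override of the new interface. Because `cause` is
// comptime-known, only the branches reachable from safety checks the
// compiler actually emits are analyzed and compiled; unused writers
// (e.g. float formatting tables) cost nothing.
pub fn panic(comptime cause: std.builtin.PanicCause, data: std.builtin.PanicData(cause)) noreturn {
    switch (cause) {
        .message => std.debug.print("panic: {s}\n", .{data}),
        // Illustrative payload field names:
        .accessed_out_of_bounds => std.debug.print(
            "index {d} out of bounds of length {d}\n",
            .{ data.index, data.len },
        ),
        // Per-cause compile-time behaviour (point 2): reject a cause outright.
        .divided_by_zero => @compileError("division by zero must be handled explicitly"),
        else => std.debug.print("panic: {s}\n", .{@tagName(cause)}),
    }
    std.process.exit(1);
}
```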

Panic examples from the compiler safety testsuite

Spoiler warning.


Slice syntax

This patch includes changes to slice semantic analysis. For this and future sections, the implementation offered by this patch will be referred to as Sema.analyzeSlice2.

This section is only intended to address changes which directly or indirectly contribute to correctness.

Differences compared to status quo

These differences do not directly contribute to the correctness of Sema.analyzeSlice2.

Slice operation slice_start does not preserve sentinel value for Many pointer operands

This difference exists because Sema.analyzeSlice2 ignores the values of sentinels for pointer operand types without explicit lengths: they mean different things, and cannot participate in the same proofs as sentinels for pointer operand types with explicit lengths.

The original behaviour can be restored if that is preferred. The cost of doing so is unknown.
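As an illustration (a sketch; the result types shown are my reading of status quo versus this patch, not output from either compiler):

```zig
// slice_start on a sentinel-terminated many-pointer:
const p: [*:0]const u8 = "abc";
// Status quo: the result type keeps the sentinel ([*:0]const u8).
// Under Sema.analyzeSlice2: the sentinel is dropped ([*]const u8).
const q = p[1..];
```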

Slicing *T uses the same core logic and error messages as every other pointer operand

Status quo allows slicing pointers-to-one like *T using the slice_end syntax. This variant has a separate set of compile errors which are arguably more instructive.

This implementation only provides special compile errors for slicing *T when the start or end operand is runtime-known. This is done to adhere to the stricter standard (status quo), which requires that start and end operands are known at compile time. If runtime start and end operands were permitted, the behaviour of slicing *T would be indistinguishable from that of slicing *[1]T.

Special compile errors similar to status quo may be added if that is necessary. The cost of adding these special error messages would be high in relation to how often they would be seen.
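For reference, this is the status-quo form the subsection refers to (standard Zig; the bounds must be comptime-known):

```zig
// Slicing a pointer-to-one with comptime-known bounds 0..1 yields *[1]u8,
// which is what makes it indistinguishable from slicing *[1]u8 directly.
fn asArrayPtr(p: *u8) *[1]u8 {
    return p[0..1];
}
```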

Casting from pointer-to-array types to slice types and from slice types to slice types is permitted by @ptrCast

Closes #20057

The following example requires the patched compiler.

Compile and run the example program with zig-runtime_safety run slice_cast.zig.

slice_cast.zig:

const std = @import("std");

pub fn panic2(_: anytype) noreturn {
    @setRuntimeSafety(false);
    unreachable;
}

pub fn main() void {
    const types: []const type = &.{
        u8,  u32, u64,
        i16, i64, i128,
    };
    inline for (types, 0..) |T, idx1| {
        inline for (types, 0..) |U, idx2| {
            if (@sizeOf(T) == @sizeOf(U)) {
                continue;
            }
            var buf: [64 + idx1 + idx2]T = undefined;
            const slice1: []T = &buf;
            const slice2: []align(@alignOf(T)) U = @ptrCast(slice1);

            std.debug.print(
                "*[{}]" ++ @typeName(T) ++ "  \t->\t*[{}]" ++ @typeName(U) ++ "\n",
                .{ slice1.len, slice2.len },
            );
        }
    }
}

Output:

zig-runtime_safety run slice_cast.zig
*[65]u8  	->	*[16]u32
*[66]u8  	->	*[8]u64
*[67]u8  	->	*[33]i16
*[68]u8  	->	*[8]i64
*[69]u8  	->	*[4]i128
*[65]u32  	->	*[260]u8
*[67]u32  	->	*[33]u64
*[68]u32  	->	*[136]i16
*[69]u32  	->	*[34]i64
*[70]u32  	->	*[17]i128
*[66]u64  	->	*[528]u8
*[67]u64  	->	*[134]u32
*[69]u64  	->	*[276]i16
*[71]u64  	->	*[35]i128
*[67]i16  	->	*[134]u8
*[68]i16  	->	*[34]u32
*[69]i16  	->	*[17]u64
*[71]i16  	->	*[17]i64
*[72]i16  	->	*[9]i128
*[68]i64  	->	*[544]u8
*[69]i64  	->	*[138]u32
*[71]i64  	->	*[284]i16
*[73]i64  	->	*[36]i128
*[69]i128  	->	*[1104]u8
*[70]i128  	->	*[280]u32
*[71]i128  	->	*[142]u64
*[72]i128  	->	*[576]i16
*[73]i128  	->	*[146]i64

Changes to slice behaviour

These changes result in differences in behaviour significant to these categories of outcome:

  • Compile error
  • Runtime panic
  • Success

Added sentinel runtime safety check for compile-time-known pointer operands referencing runtime-known memory

Fixes #19792

Compile and run the example program with zig-runtime_safety run omit_sentinel_safety_check.zig.

omit_sentinel_safety_check.zig:

var src_mem: [3:0]u8 = .{ 'a', 'b', 'c' };
pub fn main() void {
    const src_ptr: [:0]const u8 = &src_mem;
    const slice: [:0]const u8 = src_ptr[0..1 :0];
    if (slice[slice.len] != 0) {
        unreachable;
    }
}

Output:

zig-runtime_safety run omit_sentinel_safety_check.zig
thread 87608 panic: mismatched null terminator: 98

omit_sentinel_safety_check.zig:4:40: 0x8001308 in main (omit_sentinel_safety_check.zig)
    const slice: [:0]const u8 = src_ptr[0..1 :0];
                                       ^
lib/std/start.zig:501:22: 0x800053e in posixCallMainAndExit (omit_sentinel_safety_check.zig)
            root.main();
                     ^
lib/std/start.zig:253:5: 0x800001a in _start (omit_sentinel_safety_check.zig)
    asm volatile (switch (native_arch) {
    ^

This difference exists because Sema.analyzeSlice incorrectly assumes that a compile-time check will always perform the test for this combination of inputs. Instead, the pointer load fails (because the memory is runtime-known) and there is no mechanism in place to retry the check at runtime.

Corrected computation of upper bound used by runtime safety check

Fixes #19794

Compile and run the example program with zig-runtime_safety run underestimate_upper_bound.zig.

underestimate_upper_bound.zig:

var dest_end: usize = 3;
pub fn main() void {
    const src_mem: [3:0]usize = .{ 0, 0, 0 };
    const src_ptr: [:0]const usize = &src_mem;
    _ = src_ptr[0..dest_end :0];
    _ = src_ptr[0..][0..dest_end :0];
}

Output:

zig-runtime_safety run underestimate_upper_bound.zig

This will be explained in future comments or issues on slicing to consume sentinel.

Corrected computation of upper bound used by compile-time assertion

Fixes #19795

Compile the example program with zig-runtime_safety build-obj overestimate_upper_bound.zig.

overestimate_upper_bound.zig:

var src_mem: [3]u8 = .{ 0, 0, 0 };
pub fn main() void {
    const src_ptr: *[3]u8 = src_mem[0..3];
    _ = src_ptr[1..3 :1];
    _ = src_ptr[1..][0..2 :1];
}

Output:

zig-runtime_safety build-obj overestimate_upper_bound.zig
overestimate_upper_bound.zig:4:20: error: slice sentinel out of bounds: end 3(+1), length 3
    _ = src_ptr[1..3 :1];
                   ^
referenced by:
    callMain: lib/std/start.zig:501:17
    callMainWithArgs: lib/std/start.zig:469:12
    remaining reference traces hidden; use '-freference-trace' to see all reference traces

This will be explained in future comments or issues on slicing to consume sentinel.

Substituted runtime crash with compile error for slice_start (with sentinel) for certain pointer operands

Fixes #19796

Compile the example program with zig-runtime_safety build-obj slice_start_sentinel_always_out_of_bounds.zig.

slice_start_sentinel_always_out_of_bounds.zig:

pub fn main() !void {
    var buf: *const [8]u8 = &([1]u8{0} ** 8);
    _ = buf[0.. :1];
}

Output:

zig-runtime_safety build-obj slice_start_sentinel_always_out_of_bounds.zig
slice_start_sentinel_always_out_of_bounds.zig:3:18: error: sentinel index always out of bounds
    _ = buf[0.. :1];
                 ^
referenced by:
    callMain: lib/std/start.zig:511:32
    callMainWithArgs: lib/std/start.zig:469:12
    remaining reference traces hidden; use '-freference-trace' to see all reference traces

Added compile-time assertions for slices of reinterpreted memory for certain comptime pointer operands

Closes #19797

Compile the example program with zig-runtime_safety build-obj demo_various_unusable_results.zig.

demo_various_unusable_results.zig:

const T = extern struct {
    a: usize = 1,
    b: u32 = 0,
    c: [4]u16 = .{ 2, 3, 4, 5 },
};
const S = extern struct {
    a: usize = 1,
    b: T = .{},
    c: [4]u8 = .{ 2, 3, 4, 5 },
};
var mem1: [2]S = .{ .{}, .{} };
const mem2: [2]S = .{ .{}, .{} };
comptime {
    const ptr1: [*]usize = @ptrCast(&mem1);
    const len1: usize = (2 * @sizeOf(S)) / @sizeOf(usize);
    const slice1: []usize = ptr1[0..len1];

    _ = ptr1[slice1.len + 1 ..];
}
comptime {
    const ptr1: [*]const usize = @ptrCast(&mem2);
    const ptr2: [*]const u32 = @ptrCast(ptr1[2..]);
    const len2: usize = ((2 * @sizeOf(S)) - 2 * @sizeOf(usize)) / @sizeOf(u32);
    const slice2: []const u32 = ptr2[0..len2];

    _ = ptr2[slice2.len + 1 ..];
}
comptime {
    var mem3: [2]S = .{ .{}, .{} };
    const ptr1: [*]usize = @ptrCast(&mem3);
    const ptr2: [*]u32 = @ptrCast(ptr1[2..]);
    const ptr3: [*]u8 = @ptrCast(ptr2[1..]);
    const len3: usize = (((2 * @sizeOf(S)) - 2 * @sizeOf(usize)) - @sizeOf(u32)) / @sizeOf(u8);
    const slice3: []u8 = ptr3[0..len3];

    _ = ptr3[slice3.len + 1 ..];
}
comptime {
    const mem4: [2]S = .{ .{}, .{} };
    const ptr4: [*]const u16 = @ptrCast(&mem4[0].b.c[2]);
    const len4: usize = ((2 * @sizeOf(S)) - (@offsetOf(S, "b") + @offsetOf(T, "c") + (2 * @sizeOf(u16)))) / @sizeOf(u16);
    const slice4: []const u16 = ptr4[0..len4];

    _ = ptr4[slice4.len + 1 ..];
}
comptime {
    var mem5: comptime_int = 0;
    const ptr5: [*]comptime_int = @ptrCast(&mem5);
    const slice5: []comptime_int = ptr5[0..1];

    _ = ptr5[slice5.len + 1 ..];
}
comptime {
    var mem6: comptime_int = 0;
    const ptr6: [*]type = @ptrCast(&mem6);

    _ = ptr6[0..1];
}

Output:

zig-runtime_safety build-obj demo_various_unusable_results.zig
demo_various_unusable_results.zig:18:25: error: slice start out of bounds of reinterpreted memory: start 11, length 10
    _ = ptr1[slice1.len + 1 ..];
             ~~~~~~~~~~~^~~
demo_various_unusable_results.zig:26:25: error: slice start out of bounds of reinterpreted memory: start 17, length 16
    _ = ptr2[slice2.len + 1 ..];
             ~~~~~~~~~~~^~~
demo_various_unusable_results.zig:36:25: error: slice start out of bounds of reinterpreted memory: start 61, length 60
    _ = ptr3[slice3.len + 1 ..];
             ~~~~~~~~~~~^~~
demo_various_unusable_results.zig:44:25: error: slice start out of bounds of reinterpreted memory: start 29, length 28
    _ = ptr4[slice4.len + 1 ..];
             ~~~~~~~~~~~^~~
demo_various_unusable_results.zig:51:25: error: slice start out of bounds of reinterpreted memory: start 2, length 1
    _ = ptr5[slice5.len + 1 ..];
             ~~~~~~~~~~~^~~
demo_various_unusable_results.zig:57:13: error: type 'comptime_int' cannot be reinterpreted as type 'type'
    _ = ptr6[0..1];
        ~~~~^~~~~~

The functions used to compute these lengths are currently:

  • Sema.RuntimeSafety.abiSizeOfContainingDecl for element types with runtime bits.
  • Sema.RuntimeSafety.reinterpretLengthOfContainingDecl for element types without runtime bits.

Reduced variety of outcome when slicing compile-time-known undefined pointers, etc.

Fixes #19798

less_various_behaviour_of_slice_of_const_undefined.zig:

var end: usize = 1;
var val: u8 = 47;
pub fn main() void {
    // 1. OK
    _ = @as([*]u8, undefined)[0..0];
    _ = @as(*[0]u8, undefined)[0..0];
    _ = @as(*u8, undefined)[0..0];
    _ = @as([*c]u8, undefined)[0..0];
    _ = @as([]u8, undefined)[0..0];

    // 2. Compile error: non-zero length slice of undefined pointer
    _ = @as(*[0:1]u8, undefined)[1..];
    _ = @as([*]u8, undefined)[0..1];

    // 3. Compile error: slice of undefined pointer with runtime length causes undefined behaviour
    _ = @as([*]u8, undefined)[0..end];
    _ = @as([]u8, undefined)[0..end];
    _ = @as(*[0]u8, undefined)[0..end :0];

    // 4. Compile error: slice end out of bounds: end 1, length 0
    _ = @as([]u8, &.{})[0..1];

    // 5. Runtime panic: index out of bounds: index 1, length 0
    _ = @as([]u8, &.{})[0..end];
}

This new behaviour reflects the third option presented in the linked issue. This was done to avoid adding the complexity of breakages (potentially at runtime in the case of the second option) to what is likely to be a time-consuming merge.

Implementation progress

Sema

  • [x] Fixed omission of runtime safety check by @shrExact for compile-time-known vector RHS operands.
  • [x] Applied consistent feature restrictions for panic_fn, panic_unwrap_error, and safety_check_formatted.
  • [x] Removed optional usage of the old interface.
  • [x] Removed unused functions, such as:
    • panicUnwrapError
    • panicIndexOutOfBounds
    • panicInactiveUnionField
    • panicSentinelMismatch
    • safetyCheckFormatted
    • safetyPanic
    • analyzeSlice
  • [x] Added compile errors for non-scalar sentinels in functions: zirReify, zirArrayTypeSentinel, zirPtrType, analyzeSlice.
  • [x] Updated functions with panic conditions to use new panic preparation functions:
    • checkAccessInactiveUnionField by unionFieldPtr
    • checkAccessInactiveUnionField by unionFieldVal
    • checkAccessNullValue by analyzeOptionalPayloadPtr
    • checkAccessNullValue by zirOptionalPayload
    • checkAccessNullValue, checkAccessOutOfBounds, checkAccessOutOfOrder, and checkMismatchedSentinel by analyzeSlice
    • checkAccessOutOfBounds by elemPtrArray
    • checkAccessOutOfBounds by elemPtrSlice
    • checkAccessOutOfBounds by elemValArray
    • checkAccessOutOfBounds by elemValSlice
    • checkAliasingMemcpyArguments, checkMismatchedMemcpyArgumentLengths, and analyzeSlice2 by zirMemcpy
    • checkArithmeticOverflow by addDivIntOverflowSafety
    • checkArithmeticOverflow by analyzeArithmetic
    • checkArithmeticOverflow by zirDivExact
    • checkCastToEnumFromInvalid and panicReachedUnreachable by analyzeSwitchRuntimeBlock
    • checkCastToEnumFromInvalid by zirEnumFromInt
    • checkCastToEnumFromInvalid by zirSwitchBlock
    • checkCastToEnumFromInvalid by zirTagName
    • checkCastToErrorFromInvalid by zirErrorCast
    • checkCastToErrorFromInvalid by zirErrorFromInt
    • checkCastToIntFromInvalid by zirIntFromFloat
    • checkCastToPointerFromInvalid by coerceCompatiblePtrs
    • checkCastToPointerFromInvalid by ptrCastFull
    • checkCastToPointerFromInvalid by zirPtrFromInt
    • checkCastTruncatedData and checkCastToUnsignedFromNegative by intCast
    • checkDivisionByZero by addDivByZeroSafety
    • checkMismatchedForLoopCaptureLengths by zirForLen
    • checkShiftAmountOverflow and checkArithmeticOverflow by zirShl
    • checkShiftAmountOverflow and checkArithmeticOverflow by zirShr
    • checkUnwrappedError by analyzeErrUnionPayloadPtr
    • checkUnwrappedError by analyzeErrUnionPayload
    • panicReachedUnreachable by Block.addUnreachable
    • panicReachedUnreachable by analyzeCall
    • panicUnwrappedError by maybeErrorUnwrap
    • panicWithMsg by zirPanic
    • analyzeSlice2 by zirSliceStart, zirSliceEnd, zirSliceSentinel, and zirSliceLength

Sema.RuntimeSafety

  • [x] Relocated primary definitions to be more appropriate, but retained check* preparation functions.
  • [x] Added helper functions:
    • preparePanicCause
    • failBlock
    • preparePanicExtraType
    • callBuiltinHelper
    • wantPanicCause
    • resolveArithOverflowedPanicImpl
    • resolveArithOverflowedPanicCause
    • abiSizeOfContainingDecl
    • reinterpretLengthOfContainingDecl
  • [x] Added panic preparation functions:
    • panicWithMsg
    • panicReachedUnreachable
    • panicCastToEnumFromInvalid
    • panicUnwrappedError
    • checkAccessNullValue
    • checkUnwrappedError
    • checkDivisionByZero
    • checkAccessOutOfBounds
    • checkAccessOutOfOrder
    • checkAccessOutOfOrderExtra
    • checkAliasingMemcpyArguments
    • checkMismatchedMemcpyArgumentLengths
    • checkMismatchedForLoopCaptureLengths
    • checkAccessInactiveUnionField
    • checkMismatchedSentinel
    • checkMismatchedNullTerminator
    • checkShiftAmountOverflow
    • checkArithmeticOverflow
    • checkCastToEnumFromInvalid
    • checkCastToErrorFromInvalid
    • checkCastToPointerFromInvalid
    • checkCastToIntFromInvalid
    • checkCastTruncatedData
    • checkCastToUnsignedFromNegative
  • [x] Added primary functions:
    • zirSliceStart
    • zirSliceEnd
    • zirSliceSentinel
    • zirSliceLength
    • analyzeSlice2

Module

  • [x] Eliminated unused items from the reference cache, including those from the old implementation.
  • [x] Added cache of panic-related types and function references.

c

  • [x] Renamed panicNew to panic2.
  • [x] Added placeholder panicNew function to replace panic.

compiler_rt

  • [x] Renamed panicNew to panic2.
  • [x] Added placeholder panicNew function to replace panic.

main

  • [x] Removed controls listed below, as these required changes which exceeded the (already large) scope of this patch.
  • [x] Removed convenience/test flags -f[no-]runtime-safety and -f[no-]analyze-slice2 when testing is complete.
  • [x] Updated command line parser to parse convenience/test flags -f[no-]runtime-safety and -f[no-]analyze-slice2.
  • [x] Updated command line parser to parse controls for panic causes:
    • -funwrapped-error
    • -funwrapped-null
    • -freturned-noreturn
    • -freached-unreachable
    • -faccessed-out-of-bounds
    • -faccessed-out-of-order
    • -faccessed-inactive-field
    • -fdivided-by-zero
    • -fmemcpy-argument-aliasing
    • -fmismatched-memcpy-argument-lengths
    • -fmismatched-for-loop-capture-lengths
    • -fmismatched-sentinel
    • -fshift-amt-overflowed
    • -farith-overflowed (add_overflowed, sub_overflowed, inc_overflowed, dec_overflowed, div_overflowed, div_with_remainder)
    • -fcast-truncated-data
    • -fcast-to-enum-from-invalid
    • -fcast-to-error-from-invalid
    • -fcast-to-pointer-from-invalid
    • -fcast-to-int-from-invalid
    • -fcast-to-unsigned-from-negative

main.RuntimeSafety

  • [x] Removed namespace and helper functions related to test flags.
  • [x] Added helper functions:
    • setAll
    • parseSafety
    • parseAllSafety

std.builtin

  • [x] Removed format from StackTrace.
  • [x] Added the standard library implementation for panicNew.
  • [x] Repurposed definition of default_panic.

std.debug

  • [x] Substituted helper functions with any existing analogous standard library functions.
  • [x] Added panic message writer functions for panic causes with context data.
  • [x] Renamed panicImpl to panicImplDefault, and replaced panicImpl with definition of former builtin.default_panic.
  • [x] Updated panicExtra to call panicImpl instead of builtin.panic.

std.math.nextafter

  • [x] Added workaround for C backend bug.

std.multi_array_list

  • [x] Rewrote MultiArrayList.Slice.items to avoid slice of comptime-known undefined pointer with runtime-known end operand.

test

  • [x] Added Sema.analyzeSlice2 runtime panic variants.
  • [x] Updated existing slice compile error tests.
  • [x] Added Sema.analyzeSlice2 compile error variants.
  • [x] Added Sema.analyzeSlice2 success variants.
  • [x] Restored ad hoc non-slice tests with new panic interface.
  • [x] Removed old runtime safety (panic) tests.

test.tests

  • [x] Renamed panicNew to panic2.
  • [x] Added definition for panicNew to Compiler Explorer program.

type

  • [x] Added function Type.allowSentinel for determining whether an aggregate of type ty permits sentinels.

amp-59 · Apr 25 '24

For 7e617d23dd440762fc9fbb8801dfa38ecb898e1b, feel free to regress the RISC-V backend. I do not mind fixing it.

Rexicon226 · May 18 '24

@andrewrk Is this PR on your radar? Is there anything that prevents this from getting merged that you can think of? Would be great to have this in master.

yamashitax · Jul 14 '24

Is this PR on your radar? Is there anything that prevents this from getting merged that you can think of? Would be great to have this in master.

@yamashitax This PR is effectively impossible to review and merge due to its size. This needs to be split up into smaller, incremental changes that can be merged individually. Especially the multiple changes to the semantics of the language made by this pull request should be split out into individual PRs so that they can be given proper consideration.

There seem to be lots of good fixes and improvements here that I would love to see land in upstream zig, I hope that @amp-59 or some other contributor is able to find a way to break this massive patch set into smaller chunks that are feasible for a core team member to review and merge.

ifreund · Jul 15 '24

Is this PR on your radar? Is there anything that prevents this from getting merged that you can think of? Would be great to have this in master.

@yamashitax This PR is effectively impossible to review and merge due to its size. This needs to be split up into smaller, incremental changes that can be merged individually.

@ifreund I wouldn’t say that the changes are that big. Most of this PR is tests, which I assume adds reassurance to a reviewer as opposed to it burdening them.

Especially the multiple changes to the semantics of the language made by this pull request should be split out into individual PRs so that they can be given proper consideration.

Could you elaborate and/or give examples on what you mean by “multiple changes to the semantics”? Is there anything aside from what is mentioned in “Changes to slice behaviour” that you are concerned about?

yamashitax · Jul 15 '24

Most of this PR is tests, which I assume adds reassurance to a reviewer as opposed to it burdening them.

Just for reference:

  • 18,070 new lines in test/cases/compile_errors/slice_compile_error_variants.zig
  • 9,831 new lines in test/cases/safety/slice_runtime_panic_variants.zig
  • 3,199 new lines in test/behavior/slice2.zig

Total is 31,100 lines! What remains is just:

4,018 lines added + 2,158 lines removed,

which is fully in line with the expectations IMHO.

BratishkaErik · Jul 15 '24

In the interest of not wasting your time, @amp-59, I am closing this now. @ifreund's comment is exactly right.

andrewrk · Jul 15 '24

In the interest of not wasting your time, @amp-59, I am closing this now. @ifreund's comment is exactly right.

@andrewrk I would say that by closing this PR, you have effectively wasted @amp-59’s time. If it is indeed as you say and that a PR of this size will always be closed, then I would implore you to make that apparent somewhere here in the repository.

It saddens me that such a valuable contribution to the zig-project has been closed so abruptly. Personally I would still ask to have my questions stated above be at least commented on and addressed.

While I am aware of your self-proclaimed “almighty ruler”-status and that the final decision is yours to make, I would expect more of a conversation before a PR of this value be closed.

I urge you to reconsider and to start a conversation.

yamashitax · Jul 15 '24 17:07

Also, @BratishkaErik:

Just for reference:

* 18,070 new lines in test/cases/compile_errors/slice_compile_error_variants.zig

* 9,831 new lines in test/cases/safety/slice_runtime_panic_variants.zig

* 3,199 new lines in test/behavior/slice2.zig

These are obviously generated files, but the tool used to generate them is not even linked anywhere, there is no explanation of how they were generated or even that they were, and they look to be significantly over-testing. Most of these tests do not appear valuable. Far from reassuring a reviewer, this amount of unexplained generated code is a huge red flag.

Total is 31,100 lines! What remains is just:

4,018 lines added + 2,158 lines removed,

which is fully in line with the expectations IMHO.

Even this would be a big PR, requiring careful review, if from a highly trusted core contributor. From a non-core contributor, particularly one with a non-ideal PR history, it is a non-starter.

mlugg · Jul 15 '24 17:07

I don't get it, the PR is split into 154 independently reviewable commits. What's the issue?

N00byEdge · Jul 15 '24 17:07

Also, @BratishkaErik:

Just for reference:

* 18,070 new lines in test/cases/compile_errors/slice_compile_error_variants.zig

* 9,831 new lines in test/cases/safety/slice_runtime_panic_variants.zig

* 3,199 new lines in test/behavior/slice2.zig

These are obviously generated files, but the tool used to generate them is not even linked anywhere, there is no explanation of how they were generated or even that they were, and they look to be significantly over-testing. Most of these tests do not appear valuable. Far from reassuring a reviewer, this amount of unexplained generated code is a huge red flag.

This would be a good point to raise, I feel. If the core maintainers feel that the amount of tests is unnecessary or wish an explanation of why there are so many, then wouldn’t it have been a good idea to start there?

Total is 31,100 lines! What remains is just:

4,018 lines added + 2,158 lines removed,

which is fully in line with the expectations IMHO.

Even this would be a big PR, requiring careful review, if from a highly trusted core contributor.

Not really sure what the distinction here would be between a “highly trusted” core contributor and a “regular” core contributor.

From a non-core contributor, particularly one with a non-ideal PR history, it is a non-starter.

You’ve completely lost me here. You start off by talking about core members reviewing this PR, but seem to end at a completely different place. Is “non-ideal PR history” in reference to core members who have made bad PRs? Or is that comment intended for OP?

It seems like you are saying that you want to discourage contributors who have not gotten >X% of their PRs accepted.

Would love a clarification here.

yamashitax · Jul 15 '24 17:07

I did not expect to see so much confusion. I will attempt to explain things patiently and clearly.

@yamashitax, you seem to have interpreted closing this PR as the final resting place for these code changes. In reality, the code is still there, and can be resubmitted, in a different form that is more amenable to being upstreamed.

I don't get it, the PR is split into 154 independently reviewable commits. What's the issue?

Firstly, they are not independently reviewable commits. If we take the first one alone, for example, it causes the build to fail because it deletes a function that is called from somewhere else. If you look at the last few, they are merge commits. This means the entire branch needs to be reviewed as a unit.
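This "commits do not build in isolation" point can be checked mechanically: walk the branch and run the build against each commit by itself. Below is a minimal shell sketch under stated assumptions; the throwaway repo and `sh -n` (a syntax check) are stand-ins for the real repository and build command, which for a Zig branch would be something like `zig build test`.

```shell
#!/bin/sh
# Sketch: verify whether every commit on a branch builds in isolation.
# The repo and the "build" (sh -n, syntax check only) are stand-ins.
set -eu
tmp=$(mktemp -d)
cd "$tmp"
git init -q repo
cd repo
git config user.email dev@example.com
git config user.name dev
echo 'echo ok' > build.sh
git add build.sh
git commit -qm 'commit 1: working build'
echo 'echo ok; fi' > build.sh            # stray `fi`: a syntax error
git add build.sh
git commit -qm 'commit 2: breaks in isolation'
# Walk the branch oldest-first, checking out each commit by itself.
for c in $(git rev-list --reverse HEAD); do
  git checkout -q "$c"
  if sh -n build.sh 2>/dev/null; then
    echo "$(git log -1 --format=%s) ... OK"
  else
    echo "$(git log -1 --format=%s) ... FAILS in isolation"
  fi
done
```

Any commit reported as failing means the branch cannot be reviewed commit-by-commit and must instead be reviewed as a unit, which is the situation described above.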

More importantly, this PR does too many unrelated things. It lists 4 issues that it solves. It does not need to solve 4 different issues at the same time, and the fact that it does makes reviewing more difficult.

Finally, when I say reviewing is difficult, I mean that I have to invest time inspecting the patch for quality, because Zig users expect a certain standard of quality when they use Zig. In the past, @amp-59 has threatened to abandon the patch in response to reviews. This kind of volatility means that time invested in a review has a high chance of being wasted. Similarly, an enormous patch is more likely to have problems that prevent it from being completed. If a patch is abandoned, it means all the effort that went into reviewing it was wasted.

I have plenty of things to invest my time into, for example the other 179 open pull requests, or my own subprojects within zig that I want to work on personally. So what @amp-59 needs to do, if they want to be taken seriously, is to open a minimal pull request that does exactly one thing, and nothing more. In other words, exactly what @ifreund already said.

andrewrk · Jul 15 '24 18:07

Damn, Andrew posted his comment just as I was about to hit send. Still, I think this comment provides some more insight, so I'll post it nonetheless.

@yamashitax

If it is indeed as you say and a PR of this size will always be closed, then I would implore you to make that apparent somewhere here in the repository.

I encourage you to find a single open-source project, literally anywhere, which would accept a 35k line PR from a third party. No project will do this, because it's simply too significant a task: PR review is a huge undertaking, particularly in a big project. I'll elaborate on this later in this comment.

This would be a good point to raise, I feel. If the core maintainers feel that the amount of tests is unnecessary or wish an explanation of why there are so many, then wouldn’t it have been a good idea to start there?

PRs like this erode faith in a contributor, and trying to have the contributor solve fundamental issues like this is a drain on reviewers' time. If that had been the only issue, perhaps we would have had a dialogue, but it's one of several issues in this PR, which have already been well-summarized.

It is up to contributors to provide all necessary context and information to make reviewing a PR a reasonable task.

Not really sure what the distinction here would be between a “highly trusted” core contributor and a “regular” core contributor.

Sorry, I worded that badly. The implication I tried (and failed) to make is that core contributors are highly trusted.

You’ve completely lost me here. You start off by talking about core members reviewing this PR, but seem to end at a completely different place.

I also worded this badly -- my apologies again. What I meant is that if a core contributor had created a PR of this size, it would still require careful review; and that if created by a contributor who is not on the core team (or has achieved a similar level of trust), it is not a reasonable size to review.

Is “non-ideal PR history” in reference to core members who have made bad PRs? Or is that comment intended for OP?

It seems like you are saying that you want to discourage contributors who have not gotten >X% of their PRs accepted.

That comment refers to OP here. I want to make it clear that I am not discouraging individuals who have had PRs rejected in the past from contributing to the Zig project. However, this history does mean that more care is required in review. Again, I'll address this in more depth in a moment.

@N00byEdge, what I am about to explain is also applicable to your comment.


PR review in a large project is a difficult and time-consuming job. Even considering core ("trusted") contributors reviewing the work of other core ("trusted") contributors, it requires pulling a large amount of context into your head from a large codebase, and using this knowledge to understand the wider implications of a change. Properly reviewing a 10 line PR from a third party can take half an hour if the piece of the compiler being touched is particularly important. Putting that level of effort into every review isn't reasonable for anyone to do, so to some extent, PR reviews are based on trust.

For a perfect example of this, consider the huge thread-safety changeset from @jacobly0 which was recently merged (#20528). If this change came from a third party of whom we had no knowledge, the core team would have much less trust in the correctness of any part of the change, so we would have needed to use our maximum level of rigor, reviewing every line with the utmost care. For a 15k line diff, this is simply infeasible: the man-hours it would take to perform this review are on par with the time it would take a trusted contributor to make the change themselves. If any issues are found, this raises legitimate doubts about the quality of the entire PR, and requires even more thorough review, potentially from contributors across the core team. So, if an untrusted contributor had opened that PR, it would have been immediately closed -- it is simply not worth the effort to review. However, because Jacob has proven on many occasions that he produces contributions of exceedingly high quality, this level of effort in review is not necessary, because we (really, Andrew) trust that the author has done their due diligence; effectively doing the heavy lifting for us. Naturally, review is still necessary, but it is the difference between spending multiple full work days on a detailed review, and spending an hour understanding the broad ideas implemented and (more generally) perhaps looking at any parts the author flags up as dubious.

When a 35k line PR comes along from an untrusted contributor, there is no reasonable way to review it. Reviewing this PR would take longer than it would take for a core team member to do the work themselves; ultimately, when we are trying to create and ship a high-quality product, that just isn't worth our time. In fact, the same holds for a 5k line PR.

The situation becomes worse when the contributor who opened the PR has a track record of poor-quality contributions. Without any evidence that their relevant skills and knowledge have improved, the PR must be combed through incredibly delicately, as the reviewer is ultimately forced to expect incorrect code. Such a contributor is absolutely still welcome to contribute to the project, and build up the trust necessary for larger changes over time, but the way to do that is with smaller changes which demonstrate improved knowledge of the codebase being modified; not with monolithic PRs which try to do a lot of things at once with very large diffs. I genuinely apologise for using you as an example here @amp-59 (please nobody try to dig into the examples I give here; it's a form of attention nobody wants), but OP here has several PRs which were rejected or delayed for reasons of poor-quality code or arguing with reviewers. This doesn't mean they can't contribute to the project; but it does mean that larger changes are no longer worth the effort of review until the trust necessary is built up.

Splitting a large PR into many commits can help reviewers, but the issue is that some contributors will create commits which do not work in isolation. This isn't an intrinsically bad thing, it's just a commit style; I do it sometimes, as do all other core contributors I think! However, when we can't guarantee that commits work by themselves, reviewing work commit-by-commit isn't as simple as it sounds: we have to consider the context of surrounding commits, and note that issues we flag up towards the start of a review may turn out to be fixed later on (by which point, given 154 commits, the reviewer has inevitably forgotten the relevant context for that fix).

Suppose we could somehow be sure that each commit exists entirely in isolation, performing an atomic improvement to the compiler. Then, merging these commits into one PR is unhelpful for all parties involved. For reviewers, what would be perhaps a dozen relatively simple reviews turns into one prolonged task. The bugs we encounter along the way turn into a tangled mess of review comments which must be considered as a whole, because the fixes that come through will be entangled. Meanwhile, the PR gathers dust as reviews bounce back and forth; we can't partially merge a PR, so parts which might be okay in isolation inevitably bitrot. The reviewers and the contributor lose sight of what the PR is achieving, and ultimately, one party or the other is forced to deem the PR beyond repair and close it, wasting the time of all parties involved.

If commits are truly independently reviewable, then they are individually PR-able. A pull request should ideally achieve exactly one thing. Sometimes, changes are entangled enough that it makes more sense to combine them (so the commits cannot truly be placed or reviewed in isolation). In such cases, it is sometimes more logical to leave the work to a trusted contributor, who we can be more confident in doing the work correctly, resulting in a faster PR turnaround and fewer wasted man-hours from all parties.

The closure of this PR should not discourage OP from splitting it up into manageable and reviewable isolated changes. This allows focused energy to be put towards merging changes correctly and in a timely manner. However, this PR as a whole is unreviewable. Explaining all of this every time a PR is closed would eat into core team members' time which could be spent writing code or reviewing PRs, so is not typically worthwhile. I have made an exception here due to the amount of people who seem confused about why this has been closed.

mlugg · Jul 15 '24 18:07