Odin icon indicating copy to clipboard operation
Odin copied to clipboard

Parapoly Types are type checked incorrectly due to race condition

Open Lperlind opened this issue 2 years ago • 1 comments

  • Operating System & Odin Version: Odin: dev-2023-04:b42bb5be OS: Windows 10 Home Basic (version: 21H2), build 19044.2728 CPU: AMD Ryzen 7 5800X 8-Core Processor RAM: 32711 MiB

The type checker will randomly fail with a parapoly type due to a race condition. An error will be reported that a field is missing. I can only reproduce this with parapoly types that are defined in separate procedures. Take the following minimal failure case:

package will_eventually_fail

Foo :: struct($N: int) {
    a: u32,
}

bar :: proc(foo: ^Foo($N)) {
    foo.a = 1
}

a :: proc() {
	foo: Foo(1)
	bar(&foo)
}

b :: proc() {
	foo: Foo(1)
	bar(&foo)
}

Running Odin check will pass majority of the time. Every so often (10-100 runs) I will get

will_eventually_fail.odin(8:5) 'foo' of type '^Foo(1)' has no field 'a'
        foo.a = 1
        ^~^

Which is incorrect. Now if I define Foo(1) in a and make b define Foo(2) instead then I never get the error (I have done 10k + runs).

package fine

Foo :: struct($N: int) {
    a: u32,
}

bar :: proc(foo: ^Foo($N)) {
    foo.a = 1
}

a :: proc() {
	foo: Foo(1)
	bar(&foo)
}

a :: proc() {
	foo: Foo(2) // <-- the only change
	bar(&foo)
}

My suspicion is that all the information to resolve Foo(1) is being done by one thread when another thread is checking it when the type information is half completed.

Lperlind avatar Apr 16 '23 10:04 Lperlind

Looking a little more into it, the struct fields array is empty. That is to say the struct type for Foo(1)

type->Struct.fields.count == 0

Lperlind avatar Apr 16 '23 11:04 Lperlind

The problem is that there is a concurrency problem with checking structure fields. Similar errors have been reported in Issues #2679, #2814, #2949, #3200, and #3291.

Some of the other reported Issues have examples that are easier to reproduce than this one. However, the problem can be triggered a lot more frequently by adding a small delay in the compiler right before checking fields with this patch:

diff --git a/src/check_type.cpp b/src/check_type.cpp
index 0b9042905..65dccc880 100644
--- a/src/check_type.cpp
+++ b/src/check_type.cpp
@@ -660,6 +660,13 @@ gb_internal void check_struct_type(CheckerContext *ctx, Type *struct_type, Ast *
            bool where_clause_ok = evaluate_where_clauses(ctx, node, ctx->scope, &st->where_clauses, true);
            gb_unused(where_clause_ok);
        }
+
+       {
+           const struct timespec delay = { 0, 10000000 };
+           struct timespec remain;
+           nanosleep(&delay, &remain);
+       }
+
        check_struct_fields(ctx, node, &struct_type->Struct.fields, &struct_type->Struct.tags, st->fields, min_field_count, struct_type, context);
        wait_signal_set(&struct_type->Struct.fields_wait_signal);
    }

The root cause is that the fields_wait_signal futex that is not functional and also the futex, once fixed, needs to be used in more locations. The forthcoming PR resolves that problem.

Unfortunately, fixing the futex reveals a design flaw that locks up the compiler in some circumstances due to a type check loop, so that situation should be evaluated before merging the PR. The design flaw is somewhat of a separate issue so I'm going to file another Issue for that and will add another comment here when I have the Issue number. That issue is difficult to resolve so I think @gingerBill will need to evaluate what to do about that.

rick-masters avatar Mar 24 '24 16:03 rick-masters

The new issue revealed by PR #3326 (which fixes this issue) is described in Issue #3327

rick-masters avatar Mar 24 '24 17:03 rick-masters