Odin
Odin copied to clipboard
Parapoly Types are type checked incorrectly due to race condition
- Operating System & Odin Version: Odin: dev-2023-04:b42bb5be OS: Windows 10 Home Basic (version: 21H2), build 19044.2728 CPU: AMD Ryzen 7 5800X 8-Core Processor RAM: 32711 MiB
The type checker will randomly fail with a parapoly type due to a race condition. An error will be reported that a field is missing. I can only reproduce this with parapoly types that are defined in separate procedures. Take the following minimal failure case:
package will_eventually_fail
Foo :: struct($N: int) {
a: u32,
}
bar :: proc(foo: ^Foo($N)) {
foo.a = 1
}
a :: proc() {
foo: Foo(1)
bar(&foo)
}
b :: proc() {
foo: Foo(1)
bar(&foo)
}
Running Odin check will pass majority of the time. Every so often (10-100 runs) I will get
will_eventually_fail.odin(8:5) 'foo' of type '^Foo(1)' has no field 'a'
foo.a = 1
^~^
Which is incorrect.
Now if I define Foo(1) in a and make b define Foo(2) instead then I never get the error (I have done 10k + runs).
package fine
Foo :: struct($N: int) {
a: u32,
}
bar :: proc(foo: ^Foo($N)) {
foo.a = 1
}
a :: proc() {
foo: Foo(1)
bar(&foo)
}
a :: proc() {
foo: Foo(2) // <-- the only change
bar(&foo)
}
My suspicion is that all the information to resolve Foo(1) is being done by one thread when another thread is checking it when the type information is half completed.
Looking a little more into it, the struct fields array is empty.
That is to say the struct type for Foo(1)
type->Struct.fields.count == 0
The problem is that there is a concurrency problem with checking structure fields. Similar errors have been reported in Issues #2679, #2814, #2949, #3200, and #3291.
Some of the other reported Issues have examples that are easier to reproduce than this one. However, the problem can be triggered a lot more frequently by adding a small delay in the compiler right before checking fields with this patch:
diff --git a/src/check_type.cpp b/src/check_type.cpp
index 0b9042905..65dccc880 100644
--- a/src/check_type.cpp
+++ b/src/check_type.cpp
@@ -660,6 +660,13 @@ gb_internal void check_struct_type(CheckerContext *ctx, Type *struct_type, Ast *
bool where_clause_ok = evaluate_where_clauses(ctx, node, ctx->scope, &st->where_clauses, true);
gb_unused(where_clause_ok);
}
+
+ {
+ const struct timespec delay = { 0, 10000000 };
+ struct timespec remain;
+ nanosleep(&delay, &remain);
+ }
+
check_struct_fields(ctx, node, &struct_type->Struct.fields, &struct_type->Struct.tags, st->fields, min_field_count, struct_type, context);
wait_signal_set(&struct_type->Struct.fields_wait_signal);
}
The root cause is that the fields_wait_signal futex that is not functional and also the futex, once fixed, needs to be used in more locations. The forthcoming PR resolves that problem.
Unfortunately, fixing the futex reveals a design flaw that locks up the compiler in some circumstances due to a type check loop, so that situation should be evaluated before merging the PR. The design flaw is somewhat of a separate issue so I'm going to file another Issue for that and will add another comment here when I have the Issue number. That issue is difficult to resolve so I think @gingerBill will need to evaluate what to do about that.
The new issue revealed by PR #3326 (which fixes this issue) is described in Issue #3327