ponyc
ponyc copied to clipboard
Potential compiler segfault in lookup.c
https://github.com/ponylang/ponyc/blob/58134d152503147e7980404990252136dd7b6cc6/src/libponyc/type/lookup.c#L56-L68
This code is a bit bogus. If opt can be NULL, the declaration typecheck_t* t = &opt->check;
could crash the compiler. In any case, it looks like the t declaration could be safely moved into the scope of the if-branch, where it is properly guarded against opt == NULL.
Thanks for reporting this one. This should be a straightforward PR for anyone who wants to take it.
Thanks for reporting this one. This should be a straightforward PR for anyone who wants to take it.
I am happy to submit a PR but wanted to not do it since there is a current PR (private types confusing error...) that involves this file, and I was worried that two edits of lookup.c are potentially floating around. If that is unfounded, I am happy to quickly create the PR
The variable t
is also used outside of the if
scope below:
https://github.com/ponylang/ponyc/blob/58134d152503147e7980404990252136dd7b6cc6/src/libponyc/type/lookup.c#L150-L167
I actually don't know if opt
is ever allowed to be NULL
here, this function in particular is very liberal when it comes to dereferencing things inside of it (see all the instances of opt->check.errors
).
Also, we have several pony_assert(opt != NULL);
spread through the compiler, but it could be that there's some context in which opt can be NULL
@ergl Should I close this?
I don't think this should be closed.
I think we should remove the opt != NULL and could reasonably add an assert that it isn't null to the function.
I don't think this should be closed.
I think we should remove the opt != NULL and could reasonably add an assert that it isn't null to the function.
I tried your suggestion exactly: https://github.com/ponylang/ponyc/commit/527864dd4e73d34df7d85cf1bb5e71ac5bafc020#diff-44ccf14c19a6015e168dc16ed55bd0fdcfd0bf88351427bbbe76449a9e257495L56-L64]
It does compile and work fine in Release mode but failed on all platforms in Debug mode with an assertion error.
It is unclear to me why, if opt is NULL in Debug mode during that call, the next line dereferencing opt hasn't crashed the current compiler in Debug mode
typecheck_t* t = &opt->check;
We discussed this during sync - the next step would be to use a debugger like lldb to catch the assert fail and figure out what code path is calling a NULL opt
, then fix it to not do that.
Here is the backtrace from the compiler built in Debug mode with this patch applied:
C:\ponyc\src\libponyc\type\lookup.c:60: lookup_nominal: Assertion `opt != NULL` failed.
Backtrace:
(ponyint_assert_fail) [00007FF61CCBD5A6]
(lookup_nominal) [00007FF61CC3AD7A]
(lookup_base) [00007FF61CC3A950]
(lookup_try) [00007FF61CC3A7CA]
(add_special) [00007FF61CC4C4D0]
(add_nominal) [00007FF61CC4D413]
(add_type) [00007FF61CC4A279]
(reachable_method) [00007FF61CC4A351]
(reach) [00007FF61CC49943]
(genexe) [00007FF61CC66064]
(codegen) [00007FF61CBDE8E7]
(generate_passes) [00007FF61CB6F37C]
(compile_package) [00007FF61CB633C1]
(main) [00007FF61CB63714]
(__scrt_common_main_seh) [00007FF61F32DD48]
(BaseThreadInitThunk) [00007FFAD69D7034]
(RtlUserThreadStart) [00007FFAD7DE2651]
@jemc I think I found the offending call/ code path
https://github.com/ponylang/ponyc/blob/932d5cd19e09411d07092112a886be80a12cfbc2/src/libponyc/reach/reach.c#L563-L567
I think the newly added assertion is tripped by the call to lookup_try
in add_special
in reach.c where opt is set NULL!
From the design of the function it is likely it was written with the possibility of opt==NULL
in mind, and removing the opt!=NULL checks and adding an assertion might not be the way forward.
However, if this call with opt NULL makes it way into lookup_nominal, why does it not currently crash the compiler (without the assertion) in either Debug or Release config when NULL is dereferend in the typecheck_t* t = &opt->check;
line?
https://github.com/ponylang/ponyc/blob/58134d152503147e7980404990252136dd7b6cc6/src/libponyc/type/lookup.c#L56-L68
@stefandd - We already have the opt
here in this scope (it's the last parameter received by add_special
) so in my opinion there's no reason to pass NULL
to lookup_try
- we should pass the real opt
.
As to the question of how it's not currently crashing the compiler, I don't know and couldn't say further on that without some investigation.
Unfortunately this definitely does seem more involved. In #4191, I tried first to only implement the suggestion (provide opt as parameter to the lookup_try
call in add_special
), but this failed to build, and then I tried to replace the other lookup_try(NULL, NULL,..
and lookup(NULL, NULL,..
calls in that module wherever opt
was actually in the function scope, and this now still does not build.
Unsure how to proceed with this.
I tried first to only implement the suggestion (provide opt as parameter to the lookup_try call in add_special), but this failed to build
Can you give more detail - maybe a compiler error output?
It fails with the newly inserted assertion opt!=NULL being tripped. There is one lookup(NULL, NULL, t->ast, name)
call left in reach.c
, in add_method_name
that could be causing this, but opt isn't in scope there, and I have no debugger access right now.
To make it work, we need to add opt
everywhere - if it's not in scope, we need to make sure it is in scope by adding it to the function signature and passing it in from the caller.
The compiler was written with the assumption that opt
is always non-null, and any code that is supplying null for it needs to be replaced with code that supplies non-null.
@jemc
I did that, replacing the remaining two calls to lookup..(NULL, ...)
by bringing opt into scope. Now the compiler actually crashes because lookup in genmatch.c returns NULL in eq_param_type
, crashing the AST_GET_CHILDREN line.
static ast_t* eq_param_type(compile_t* c, ast_t* pattern)
{
ast_t* pattern_type = deferred_reify(c->frame->reify, ast_type(pattern),
c->opt);
deferred_reification_t* fun = lookup(c->opt, pattern, pattern_type, c->str_eq);
AST_GET_CHILDREN(fun->ast, cap, id, typeparams, params, result, partial);
The cause for the returned NULL is in lookup_nominal
https://github.com/ponylang/ponyc/blame/24cf1efd9eab28d6ee2dbe8001919523c5a4864b/src/libponyc/type/lookup.c#L67-L77
Also, I know too little about the compiler to understand why ast_error(opt->check.errors, from, "can't lookup fields or methods " "on private types from other packages");
does not generate an error message at all ??? No textual output is generated and NULL is returned, causing the crash in eq_param_type
.
Overall, for me it currently looks as if certain checks in lookup_nominal are suppressed by passing NULL
for opt
into the function. The attempt to now move into scope opt
everywhere is likely breaking the assumption by whomever wrote these parts of the compiler that they can turn off certain checks by calling lookup functions with NULL.
why
ast_error
does not generate an error message at all ??? No textual output is generated
The ast_error
appends an error into the error list held at opt->check.errors
. It does not halt the compiler immediately nor print an error immediately - it simply adds an error to be printed at the end.
Overall, for me it currently looks as if certain checks in
lookup_nominal
are suppressed by passingNULL
foropt
into the function, and that the attempt to move into scopeopt
everywhere is breaking the assumption by whoever wrote these parts of the compiler, that they can turn off certain checks by calling lookup functions withNULL
.
I think your assessment is correct, but this was the wrong approach by whomever wrote this code - they should have probably passed in an extra flag argument called bool suppress_errors
or something like that. They definitely shouldn't have made the opt
argument nullable, because it's used for so much more than just accumulating errors, and because it's not meant to be nullable anywhere else.
It seems then that resolving this is a bit of a rewrite of certain parts of the compiler which is a task better left to one of the core developers.
To summarize the situation:
-
in
lookup_nominal
(code of the first post): if anassert(opt != NULL)
is inserted, it is tripped by quite a few calls of the stylelookup..(NULL, NULL, ..)
in reach.c and genmatch.c. This means that thetypecheck_t* t = &opt->check;
line inlookup_nominal
accesses invalid memory in such cases (since theNULL
object is not an opt struct). Despite that, it does currently not lead to any reported crashes. There are various(opt != NULL)
guards in that function that were likely inserted to allow such naughtylookup..(NULL, NULL, ..)
calls. -
a remedy seemed to be to bring
opt
into scope/and use it to replace alllookup..(NULL, NULL, ..)
calls withlookup..(opt, NULL, ..)
calls and to remove the(opt != NULL)
guards inlookup_nominal
. This however lead to more trouble (crashes) since the function was apparently written with suchopt == NULL
use in mind.
One way forward would simply be to slightly alter lookup_nominal
to prevent any invalid memory access by e.g. moving the declaration typecheck_t* t = &opt->check;
within the guarded if scopes.
Another way forward is to continue to make the changes as suggested by @jemc and @SeanTAllen which would however need significant modifications to lookup_nominal
in order to prevent spurious error messages to be generated (which were previously guarded against with the (opt != NULL)
guards.
Please advise on how to proceed!
My suggested approach is still the same as in my previous comment - that the code should be updated in whatever was is necessary to make it no longer assume that opt
can sometimes be null - probably by passing in a new flag argument indicating whether errors should be suppressed or not (rather than treating a null opt
as an indication to suppress errors).
Ok, I can do this but would have to assume that all guarding (opt != NULL)
in lookup_nominal
were meant to serve as an error suppression flag and can instead by replaced by !flagXXX
-- safe assumption?
Yes, I think so.