ponyc icon indicating copy to clipboard operation
ponyc copied to clipboard

Potential compiler segfault in lookup.c

Open stefandd opened this issue 2 years ago • 21 comments

https://github.com/ponylang/ponyc/blob/58134d152503147e7980404990252136dd7b6cc6/src/libponyc/type/lookup.c#L56-L68

This code is a bit bogus. If opt can be NULL, the declaration typecheck_t* t = &opt->check; could crash the compiler. In any case, it looks like the t declaration could be safely moved into the scope of the if-branch, where it is properly guarded against opt == NULL.

stefandd avatar Jun 23 '22 00:06 stefandd

Thanks for reporting this one. This should be a straightforward PR for anyone who wants to take it.

SeanTAllen avatar Jun 23 '22 00:06 SeanTAllen

Thanks for reporting this one. This should be a straightforward PR for anyone who wants to take it.

I am happy to submit a PR but wanted to not do it since there is a current PR (private types confusing error...) that involves this file, and I was worried that two edits of lookup.c are potentially floating around. If that is unfounded, I am happy to quickly create the PR

stefandd avatar Jun 23 '22 07:06 stefandd

The variable t is also used outside of the if scope below:

https://github.com/ponylang/ponyc/blob/58134d152503147e7980404990252136dd7b6cc6/src/libponyc/type/lookup.c#L150-L167

I actually don't know if opt is ever allowed to be NULL here, this function in particular is very liberal when it comes to dereferencing things inside of it (see all the instances of opt->check.errors).

Also, we have several pony_assert(opt != NULL); spread through the compiler, but it could be that there's some context in which opt can be NULL

ergl avatar Jun 23 '22 07:06 ergl

@ergl Should I close this?

stefandd avatar Jun 24 '22 14:06 stefandd

I don't think this should be closed.

I think we should remove the opt != NULL and could reasonably add an assert that it isn't null to the function.

SeanTAllen avatar Jun 28 '22 18:06 SeanTAllen

I don't think this should be closed.

I think we should remove the opt != NULL and could reasonably add an assert that it isn't null to the function.

I tried your suggestion exactly: https://github.com/ponylang/ponyc/commit/527864dd4e73d34df7d85cf1bb5e71ac5bafc020#diff-44ccf14c19a6015e168dc16ed55bd0fdcfd0bf88351427bbbe76449a9e257495L56-L64]

It does compile and work fine in Release mode but failed on all platforms in Debug mode with an assertion error.

It is unclear to me why, if opt is NULL in Debug mode during that call, the next line dereferencing opt hasn't crashed the current compiler in Debug mode

typecheck_t* t = &opt->check;

stefandd avatar Jul 27 '22 14:07 stefandd

We discussed this during sync - the next step would be to use a debugger like lldb to catch the assert fail and figure out what code path is calling a NULL opt, then fix it to not do that.

jemc avatar Aug 02 '22 18:08 jemc

Here is the backtrace from the compiler built in Debug mode with this patch applied:

C:\ponyc\src\libponyc\type\lookup.c:60: lookup_nominal: Assertion `opt != NULL` failed.

Backtrace:
  (ponyint_assert_fail) [00007FF61CCBD5A6]
  (lookup_nominal) [00007FF61CC3AD7A]
  (lookup_base) [00007FF61CC3A950]
  (lookup_try) [00007FF61CC3A7CA]
  (add_special) [00007FF61CC4C4D0]
  (add_nominal) [00007FF61CC4D413]
  (add_type) [00007FF61CC4A279]
  (reachable_method) [00007FF61CC4A351]
  (reach) [00007FF61CC49943]
  (genexe) [00007FF61CC66064]
  (codegen) [00007FF61CBDE8E7]
  (generate_passes) [00007FF61CB6F37C]
  (compile_package) [00007FF61CB633C1]
  (main) [00007FF61CB63714]
  (__scrt_common_main_seh) [00007FF61F32DD48]
  (BaseThreadInitThunk) [00007FFAD69D7034]
  (RtlUserThreadStart) [00007FFAD7DE2651]

stefandd avatar Aug 03 '22 10:08 stefandd

@jemc I think I found the offending call/ code path

https://github.com/ponylang/ponyc/blob/932d5cd19e09411d07092112a886be80a12cfbc2/src/libponyc/reach/reach.c#L563-L567 I think the newly added assertion is tripped by the call to lookup_try in add_special in reach.c where opt is set NULL!

From the design of the function it is likely it was written with the possibility of opt==NULL in mind, and removing the opt!=NULL checks and adding an assertion might not be the way forward.

However, if this call with opt NULL makes it way into lookup_nominal, why does it not currently crash the compiler (without the assertion) in either Debug or Release config when NULL is dereferend in the typecheck_t* t = &opt->check; line? https://github.com/ponylang/ponyc/blob/58134d152503147e7980404990252136dd7b6cc6/src/libponyc/type/lookup.c#L56-L68

stefandd avatar Aug 03 '22 10:08 stefandd

@stefandd - We already have the opt here in this scope (it's the last parameter received by add_special) so in my opinion there's no reason to pass NULL to lookup_try - we should pass the real opt.

jemc avatar Aug 26 '22 17:08 jemc

As to the question of how it's not currently crashing the compiler, I don't know and couldn't say further on that without some investigation.

jemc avatar Aug 26 '22 17:08 jemc

Unfortunately this definitely does seem more involved. In #4191, I tried first to only implement the suggestion (provide opt as parameter to the lookup_try call in add_special), but this failed to build, and then I tried to replace the other lookup_try(NULL, NULL,.. and lookup(NULL, NULL,.. calls in that module wherever opt was actually in the function scope, and this now still does not build.

Unsure how to proceed with this.

stefandd avatar Sep 18 '22 04:09 stefandd

I tried first to only implement the suggestion (provide opt as parameter to the lookup_try call in add_special), but this failed to build

Can you give more detail - maybe a compiler error output?

jemc avatar Sep 18 '22 04:09 jemc

It fails with the newly inserted assertion opt!=NULL being tripped. There is one lookup(NULL, NULL, t->ast, name) call left in reach.c, in add_method_name that could be causing this, but opt isn't in scope there, and I have no debugger access right now.

stefandd avatar Sep 18 '22 05:09 stefandd

To make it work, we need to add opt everywhere - if it's not in scope, we need to make sure it is in scope by adding it to the function signature and passing it in from the caller.

The compiler was written with the assumption that opt is always non-null, and any code that is supplying null for it needs to be replaced with code that supplies non-null.

jemc avatar Sep 18 '22 13:09 jemc

@jemc I did that, replacing the remaining two calls to lookup..(NULL, ...) by bringing opt into scope. Now the compiler actually crashes because lookup in genmatch.c returns NULL in eq_param_type, crashing the AST_GET_CHILDREN line.

static ast_t* eq_param_type(compile_t* c, ast_t* pattern)
{
  ast_t* pattern_type = deferred_reify(c->frame->reify, ast_type(pattern),
    c->opt);
  deferred_reification_t* fun = lookup(c->opt, pattern, pattern_type, c->str_eq);

  AST_GET_CHILDREN(fun->ast, cap, id, typeparams, params, result, partial);

The cause for the returned NULL is in lookup_nominal https://github.com/ponylang/ponyc/blame/24cf1efd9eab28d6ee2dbe8001919523c5a4864b/src/libponyc/type/lookup.c#L67-L77

Also, I know too little about the compiler to understand why ast_error(opt->check.errors, from, "can't lookup fields or methods " "on private types from other packages"); does not generate an error message at all ??? No textual output is generated and NULL is returned, causing the crash in eq_param_type.

Overall, for me it currently looks as if certain checks in lookup_nominal are suppressed by passing NULL for opt into the function. The attempt to now move into scope opt everywhere is likely breaking the assumption by whomever wrote these parts of the compiler that they can turn off certain checks by calling lookup functions with NULL.

stefandd avatar Sep 19 '22 11:09 stefandd

why ast_error does not generate an error message at all ??? No textual output is generated

The ast_error appends an error into the error list held at opt->check.errors. It does not halt the compiler immediately nor print an error immediately - it simply adds an error to be printed at the end.

Overall, for me it currently looks as if certain checks in lookup_nominal are suppressed by passing NULL for opt into the function, and that the attempt to move into scope opt everywhere is breaking the assumption by whoever wrote these parts of the compiler, that they can turn off certain checks by calling lookup functions with NULL.

I think your assessment is correct, but this was the wrong approach by whomever wrote this code - they should have probably passed in an extra flag argument called bool suppress_errors or something like that. They definitely shouldn't have made the opt argument nullable, because it's used for so much more than just accumulating errors, and because it's not meant to be nullable anywhere else.

jemc avatar Sep 19 '22 13:09 jemc

It seems then that resolving this is a bit of a rewrite of certain parts of the compiler which is a task better left to one of the core developers.

To summarize the situation:

  • in lookup_nominal (code of the first post): if an assert(opt != NULL) is inserted, it is tripped by quite a few calls of the style lookup..(NULL, NULL, ..) in reach.c and genmatch.c. This means that the typecheck_t* t = &opt->check; line in lookup_nominal accesses invalid memory in such cases (since the NULL object is not an opt struct). Despite that, it does currently not lead to any reported crashes. There are various (opt != NULL) guards in that function that were likely inserted to allow such naughty lookup..(NULL, NULL, ..) calls.

  • a remedy seemed to be to bring opt into scope/and use it to replace all lookup..(NULL, NULL, ..) calls with lookup..(opt, NULL, ..) calls and to remove the (opt != NULL) guards in lookup_nominal. This however lead to more trouble (crashes) since the function was apparently written with such opt == NULL use in mind.

One way forward would simply be to slightly alter lookup_nominal to prevent any invalid memory access by e.g. moving the declaration typecheck_t* t = &opt->check; within the guarded if scopes.

Another way forward is to continue to make the changes as suggested by @jemc and @SeanTAllen which would however need significant modifications to lookup_nominal in order to prevent spurious error messages to be generated (which were previously guarded against with the (opt != NULL) guards.

Please advise on how to proceed!

stefandd avatar Sep 20 '22 10:09 stefandd

My suggested approach is still the same as in my previous comment - that the code should be updated in whatever was is necessary to make it no longer assume that opt can sometimes be null - probably by passing in a new flag argument indicating whether errors should be suppressed or not (rather than treating a null opt as an indication to suppress errors).

jemc avatar Sep 20 '22 15:09 jemc

Ok, I can do this but would have to assume that all guarding (opt != NULL) in lookup_nominal were meant to serve as an error suppression flag and can instead by replaced by !flagXXX -- safe assumption?

stefandd avatar Sep 20 '22 15:09 stefandd

Yes, I think so.

jemc avatar Sep 20 '22 15:09 jemc