MCDC

GPU compilation error detection or troubleshooting

Open · ilhamv opened this issue 9 months ago · 6 comments

While searching for GPU-compatibility bugs in #300, I found that the bugs were actually quite trivial:

  • the use of default function arguments (Numba currently does not support these), and
  • the use of NumPy functions.

However, the resulting error messages do not clearly indicate what went wrong, so we had to hunt manually through the changed code, line by line, for possible causes.

For future debugging, it would be very helpful if we could detect these known issues during compilation and report them in the error message.

Alternatively, a troubleshooting list of frequently occurring bugs may be good enough.

What do you think? @braxtoncuneo @jpmorgan98 @clemekay

ilhamv, Mar 17 '25 20:03

If we are adding the trace decorator, we could also check for default arguments through function inspection and raise an error if one is found.
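A minimal sketch of that inspection check, assuming it runs when the decorator is applied. `reject_default_args` is a hypothetical standalone decorator (the trace decorator's actual interface isn't shown in this thread). It uses `inspect.signature` so that both positional and keyword-only defaults are caught, and it returns the function unchanged so that any JIT decorator applied afterward still sees the original object.

```python
import inspect


def reject_default_args(func):
    """Hypothetical decorator: raise at decoration time if `func` declares
    any parameter with a default value (positional or keyword-only)."""
    defaulted = [
        name
        for name, param in inspect.signature(func).parameters.items()
        if param.default is not inspect.Parameter.empty
    ]
    if defaulted:
        raise TypeError(
            f"{func.__qualname__} uses default argument(s) {defaulted}; "
            "default arguments are not supported for GPU compilation"
        )
    # Return the original function so downstream decorators are unaffected.
    return func


@reject_default_args
def move(position, distance):
    return position + distance
```

Stacked under a JIT decorator (for example `@njit` written above `@reject_default_args`), the check would fire when the module is imported, before Numba ever attempts compilation.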

If we want something that doesn't use decorators, we could inspect each function for the functions it calls and recursively search for any that use default arguments.

braxtoncuneo, Mar 17 '25 20:03

I think a list of known or frequently appearing bugs is a good idea, perhaps on the website? It's not exactly a robust solution, but at the very least it gives someone debugging a place to start.

@braxtoncuneo is there any downside to adding in the trace decorator? If not, I think that sounds like a nice solution.

Also, modularizing the code will hopefully make changes easier to track down.

clemekay, Mar 17 '25 22:03

In fact, a few weeks ago I asked Braxton what I should keep in mind or look out for while developing to make sure I'm not causing GPU issues. Maybe we should have a general page for "Seemingly innocuous things that nevertheless break GPU compatibility".

It could list things that developers wouldn't necessarily think to avoid, because they cause no problems on the CPU, but that we know will cause GPU issues, such as using some NumPy functions or passing objects outside of arrays.

It can then double as a place to start when trying to debug GPU issues!

clemekay, Mar 17 '25 22:03

If these bugs have been fixed, does that mean we are ready for v0.12.0?

jpmorgan98, Mar 18 '25 20:03

@jpmorgan98 As it stands, the bug-fix PR (#310) was merged yesterday, but the GPU regression tests for it failed.

clemekay, Mar 18 '25 22:03

Yes, the GitHub workflow run for the GPU regression test failed. Is that expected, @jpmorgan98? I have confirmed, though, that the GPU regression test passes when run from our OSU CI machine.

ilhamv, Mar 19 '25 02:03