Fix OpenGLCompute tests
Working towards fixing #5627. This fixes ~18 tests that were failing on master when using OpenGLCompute. Some tests merely needed their output updated when skipping, and others needed some trivial missing functionality.
Opening as draft since 4 tests are still failing, but hoping others can help look into those failures. Locally, these 4 fail:
- correctness_vector_cast (intermittently)
- performance_async_gpu (SEGFAULT)
- generator_aot_gpu_multi_context_threaded
- generator_aot_msan
So I'm not entirely sure what's going on, but while trying to debug some of these issues via a JIT test, it looks like our debug runtime isn't (reliably) used when jitting... i.e., if I insert something into halide_openglcompute_initialize_kernels() that will fail for DEBUG_RUNTIME, then run (say) correctness_newtons_method with HL_JIT_TARGET=host-openglcompute-debug, that failure won't occur; it seems that we aren't (always? ever?) using the debug runtime in this JIT config.
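A hedged sketch of the kind of probe being described (an assumption about how one might check this, not code from the PR), dropped near the top of halide_openglcompute_initialize_kernels():

```cpp
// Illustrative fragment only (assumed wording, not PR code).
// DEBUG_RUNTIME is defined only when the debug variant of the runtime is
// built, so any run that actually links the debug runtime should fail here.
#ifdef DEBUG_RUNTIME
    halide_error(user_context, "debug runtime reached initialize_kernels");
    return halide_error_code_generic_error;
#endif
```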
EDIT: see also https://github.com/halide/Halide/pull/6399
I know OpenGL is deprecated on OSX, but do we expect it to work at all? On Big Sur, I fail reliably with Could not load function pointer for glDispatchCompute. If this is something that may be fixable, great, but if we can't make OGLC work at all on recent OSX versions, perhaps we should fail more aggressively, e.g. when compiling for an OSX target, provide a stub runtime that fails instantly? Or a compile-time warning that you will not be going to space today?
I think we might want some kind of warning, but failing aggressively seems like overkill-- there's some 3rd party OpenGL(ES?) implementations available on macOS (see e.g. https://stackoverflow.com/questions/65802625/develop-using-opengl-4-x-on-osx-big-sur) that some dev may want to use.
Yeah, good point. (That said, it's clear that we can't support the 'stock' OGLC on OSX, since it stopped at GL4.1 but glDispatchCompute requires GL4.3.) What system(s) are you using for testing?
I'm using my personal Ubuntu 21.04 laptop, a Dell XPS13 from 2019, with Intel integrated graphics. I haven't tried using macOS, so my knowledge there is secondhand.
On my linux box, I don't get vector_gpu failures, but I do get correctness_math failures.
For performance_async_gpu, the crash seems to be because we call halide_buffer_copy with a src that has a null host and a null device_interface (which we don't properly check for):
halide_buffer_copy:
src buffer(0, 0x0, 0x0, 0, float32, {0, 800, 1}, {0, 800, 800}, {0, 1, 640000})
interface 0x0
dst buffer(0, 0x0, 0x7f1ad4185040, 0, float32, {0, 800, 1}, {0, 800, 800}, {0, 1, 640000})
Obviously the code should be more robust to this, but this is also a bad set of arguments...
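For concreteness, a minimal sketch of the kind of guard being suggested, written as a hypothetical wrapper (the name halide_buffer_copy_checked and the choice of error code are assumptions, not the actual fix):

```cpp
#include "HalideRuntime.h"

// Hypothetical wrapper sketch, not the real runtime change: reject a source
// buffer that has neither host data nor a device interface before handing it
// to halide_buffer_copy, so we fail cleanly instead of crashing later.
extern "C" int halide_buffer_copy_checked(void *user_context,
                                          halide_buffer_t *src,
                                          const halide_device_interface_t *dst_device_interface,
                                          halide_buffer_t *dst) {
    if (src == nullptr ||
        (src->host == nullptr && src->device_interface == nullptr)) {
        return halide_error_code_buffer_argument_is_null;  // nothing valid to copy from
    }
    return halide_buffer_copy(user_context, src, dst_device_interface, dst);
}
```

Even with such a check, the log above suggests the real bug is upstream: a src with no host pointer and no device allocation shouldn't reach halide_buffer_copy at all.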
but I do get correctness_math failures.
Can you post the output? I could probably fix it quickly (I had failures in that test, and thought I'd fixed them).
relatively_equal failed for (inf, 0) with relative error nan
For pow(0.00000000000000000000, -4.00000000000000000000) == inf from C and 0.00000000000000000000 from x86-64-linux-avx-avx2-avx512-avx512_skylake-f16c-fma-jit-openglcompute-sse41.
Gah, isn't this a domain error in C? I can fix the implementation in OpenGLCompute to return inf if pow(x, y) has x == 0 and y < 0 regardless.
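For reference, C's pow() treats a zero base with a negative exponent as a pole error and returns HUGE_VAL (here +inf), while GLSL leaves pow(x, y) undefined when x == 0 and y <= 0, which would explain the 0.0 result above. A minimal sketch of the proposed special case, as a plain scalar helper rather than the backend's actual generated code (the name pow_with_pole is made up):

```cpp
#include <cmath>
#include <limits>

// Hypothetical helper sketch: special-case a zero base with a negative
// exponent to +inf, matching the C library's pole-error behaviour that the
// correctness_math reference compares against.
float pow_with_pole(float x, float y) {
    if (x == 0.0f && y < 0.0f) {
        return std::numeric_limits<float>::infinity();
    }
    return std::pow(x, y);
}
```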
So after looking at various issues (see my other recent PRs), I think that the performance_async_gpu failure really boils down to the fact that the OGLC backend doesn't implement device_crop. (Maybe it's not possible to implement it with this backend? Not sure.)
I think generator_aot_gpu_multi_context_threaded is failing because the globals in the OGLC runtime are not at all threadsafe. Working on a fix.
EDIT: the branch srj/oglc-mutexed attempts to do a pretty simple job of this, but it still fails, with MapBufferRange failing for reasons that aren't clear -- not sure if OGLC-specific or just another race condition?
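For concreteness, roughly the shape such a mutexed approach would take (assumed names and structure, not the actual contents of srj/oglc-mutexed): every access to the backend's global program table goes through a single lock so concurrent Halide contexts can't race on it.

```cpp
#include "HalideRuntime.h"

#include <map>
#include <string>

namespace {
halide_mutex oglc_state_lock;  // zero-initialized static; guards global_programs
std::map<std::string, unsigned> *global_programs = nullptr;  // stand-in for the global kernel state
}  // namespace

// Insert a compiled program under the given key, or return the one already
// registered, while holding the lock for the entire lookup/insert.
unsigned lookup_or_insert_program(const std::string &key, unsigned program_id) {
    halide_mutex_lock(&oglc_state_lock);
    if (global_programs == nullptr) {
        global_programs = new std::map<std::string, unsigned>();
    }
    unsigned result = global_programs->emplace(key, program_id).first->second;
    halide_mutex_unlock(&oglc_state_lock);
    return result;
}
```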
EDIT #2: the branch srj/oglc-mutexed is still not right (running under TSAN shows there are still race conditions); it looks like GPUCompilationCache deals with this OK for other GPU backends, so perhaps a better solution is to adapt the OGLC backend to use that, instead of its home-grown globals; I'll leave that to Shoaib or someone else more familiar with this code.
Try running under TSAN on Linux to flush out more, e.g. HL_TARGET=host-opencl-debug-tsan make generator_aot_gpu_multi_context_threaded