Fix OpenGLCompute tests
Working towards fixing #5627. This fixes ~18 tests that were failing on master when using OpenGLCompute. Some tests merely needed their output updated when skipping, and others needed some trivial missing functionality.
Opening as draft since 4 tests are still failing, but hoping others can help look into those failures. Locally, these 4 fail:
- correctness_vector_cast (intermittently)
- performance_async_gpu (SEGFAULT)
- generator_aot_gpu_multi_context_threaded
- generator_aot_msan
So I'm not entirely sure what's going on, but while trying to debug some of these issues via a JIT test, it looks like our debug runtime isn't (reliably) used when jitting... i.e., if I insert something into halide_openglcompute_initialize_kernels() that will fail for DEBUG_RUNTIME, then run (say) correctness_newtons_method with HL_JIT_TARGET=host-openglcompute-debug, that failure won't occur; it seems that we aren't (always? ever?) using the debug runtime in this JIT config.
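A hedged sketch of the kind of probe being described (an assumption about how one might check this, not code from the PR), dropped near the top of halide_openglcompute_initialize_kernels():

```cpp
// Illustrative fragment only (assumed wording, not PR code).
// DEBUG_RUNTIME is defined only when the debug variant of the runtime is
// built, so any run that actually links the debug runtime should fail here.
#ifdef DEBUG_RUNTIME
    halide_error(user_context, "debug runtime reached initialize_kernels");
    return halide_error_code_generic_error;
#endif
```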
EDIT: see also https://github.com/halide/Halide/pull/6399
I know OpenGL is deprecated on OSX, but do we expect it to work at all? On Big Sur, I fail reliably with Could not load function pointer for glDispatchCompute. If this is something that may be fixable, great, but if we can't make OGLC work at all on recent OSX versions, perhaps we should fail more aggressively, e.g. when compiling for an OSX target, provide a stub runtime that fails instantly? Or a compile-time warning that you will not be going to space today?
I think we might want some kind of warning, but failing aggressively seems like overkill-- there's some 3rd party OpenGL(ES?) implementations available on macOS (see e.g. https://stackoverflow.com/questions/65802625/develop-using-opengl-4-x-on-osx-big-sur) that some dev may want to use.
Yeah, good point. (That said, it's clear that we can't support the 'stock' OGLC on OSX, since it stopped at GL4.1 but glDispatchCompute requires GL4.3.) What system(s) are you using for testing?
I'm using my personal Ubuntu 21.04 laptop, a Dell XPS13 from 2019, with Intel integrated graphics. I haven't tried using macOS, so my knowledge there is secondhand.
On my linux box, I don't get vector_gpu failures, but I do get correctness_math failures.
For performance_async_gpu, the crash seems to be because we call halide_buffer_copy with a src that has a null host and a null device_interface (which we don't properly check for):
halide_buffer_copy:
src buffer(0, 0x0, 0x0, 0, float32, {0, 800, 1}, {0, 800, 800}, {0, 1, 640000})
interface 0x0
dst buffer(0, 0x0, 0x7f1ad4185040, 0, float32, {0, 800, 1}, {0, 800, 800}, {0, 1, 640000})
Obviously the code should be more robust to this, but this is also a bad set of arguments...
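For concreteness, a minimal sketch of the kind of guard being suggested, written as a hypothetical wrapper (the name halide_buffer_copy_checked and the choice of error code are assumptions, not the actual fix):

```cpp
#include "HalideRuntime.h"

// Hypothetical wrapper sketch, not the real runtime change: reject a source
// buffer that has neither host data nor a device interface before handing it
// to halide_buffer_copy, so we fail cleanly instead of crashing later.
extern "C" int halide_buffer_copy_checked(void *user_context,
                                          halide_buffer_t *src,
                                          const halide_device_interface_t *dst_device_interface,
                                          halide_buffer_t *dst) {
    if (src == nullptr ||
        (src->host == nullptr && src->device_interface == nullptr)) {
        return halide_error_code_buffer_argument_is_null;  // nothing valid to copy from
    }
    return halide_buffer_copy(user_context, src, dst_device_interface, dst);
}
```

Even with such a check, the log above suggests the real bug is upstream: a src with no host pointer and no device allocation shouldn't reach halide_buffer_copy at all.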
but I do get correctness_math failures.
Can you post the output? I could probably fix it quickly (I had failures in that test, and thought I'd fixed them).
relatively_equal failed for (inf, 0) with relative error nan
For pow(0.00000000000000000000, -4.00000000000000000000) == inf from C and 0.00000000000000000000 from x86-64-linux-avx-avx2-avx512-avx512_skylake-f16c-fma-jit-openglcompute-sse41.
Gah, isn't this a domain error in C? I can fix the implementation in OpenGLCompute to return inf if pow(x, y) has x == 0 and y < 0 regardless.
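For reference, C's pow() treats a zero base with a negative exponent as a pole error and returns HUGE_VAL (here +inf), while GLSL leaves pow(x, y) undefined when x == 0 and y <= 0, which would explain the 0.0 result above. A minimal sketch of the proposed special case, as a plain scalar helper rather than the backend's actual generated code (the name pow_with_pole is made up):

```cpp
#include <cmath>
#include <limits>

// Hypothetical helper sketch: special-case a zero base with a negative
// exponent to +inf, matching the C library's pole-error behaviour that the
// correctness_math reference compares against.
float pow_with_pole(float x, float y) {
    if (x == 0.0f && y < 0.0f) {
        return std::numeric_limits<float>::infinity();
    }
    return std::pow(x, y);
}
```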
So after looking at various issues (see my other recent PRs), I think that the performance_async_gpu failure really boils down to the fact that the OGLC backend doesn't implement device_crop. (Maybe it's not possible to implement it with this backend? Not sure.)
I think generator_aot_gpu_multi_context_threaded is failing because the globals in the OGLC runtime are not at all threadsafe. Working on a fix.
EDIT: the branch srj/oglc-mutexed attempts to do a pretty simple job of this, but it still fails, with MapBufferRange failing for reasons that aren't clear -- not sure if OGLC-specific or just another race condition?
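For concreteness, roughly the shape such a mutexed approach would take (assumed names and structure, not the actual contents of srj/oglc-mutexed): every access to the backend's global program table goes through a single lock so concurrent Halide contexts can't race on it.

```cpp
#include "HalideRuntime.h"

#include <map>
#include <string>

namespace {
halide_mutex oglc_state_lock;  // zero-initialized static; guards global_programs
std::map<std::string, unsigned> *global_programs = nullptr;  // stand-in for the global kernel state
}  // namespace

// Insert a compiled program under the given key, or return the one already
// registered, while holding the lock for the entire lookup/insert.
unsigned lookup_or_insert_program(const std::string &key, unsigned program_id) {
    halide_mutex_lock(&oglc_state_lock);
    if (global_programs == nullptr) {
        global_programs = new std::map<std::string, unsigned>();
    }
    unsigned result = global_programs->emplace(key, program_id).first->second;
    halide_mutex_unlock(&oglc_state_lock);
    return result;
}
```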
EDIT #2: the branch srj/oglc-mutexed is still not right (running under TSAN shows there are still race conditions); it looks like GPUCompilationCache deals with this OK for other GPU backends, so perhaps a better solution is to adapt the OGLC backend to use that, instead of its home-grown globals; I'll leave that to Shoaib or someone else more familiar with this code.
Try running under TSAN on Linux to flush out more, e.g. HL_TARGET=host-opencl-debug-tsan make generator_aot_gpu_multi_context_threaded