Steven Johnson
So after looking at various issues (see my other recent PRs), I think that the `performance_async_gpu` failure really boils down to the fact that the OGLC backend doesn't implement device_crop....
I think `generator_aot_gpu_multi_context_threaded` is failing because the globals in the OGLC runtime are not at all threadsafe. Working on a fix. EDIT: the branch `srj/oglc-mutexed` attempts to do a pretty...
Try running with TSAN on Linux to flush out more issues, e.g. `HL_TARGET=host-opencl-debug-tsan make generator_aot_gpu_multi_context_threaded`
I like the copy-in-and-out mode idea. Should probably tie it in to ASAN mode so that testing happens "regularly" but in situations where we already expect degraded performance and memory...
Note that the test doesn't have any explicit vectorization -- are we getting autovectorization somehow?
bisect claims the injection point is:
```
commit 3131f808243abe3746280e016ab9459c14d9e53b
Author: Mogball
Date:   Fri Apr 15 17:52:34 2022 +0000

    [mlir] Refactor LICM into a utility

    LICM is refactored into a utility...
```
Monday Morning Review Ping -- where does this PR stand?
Huh, the Windows failures are indeed odd -- and the error messages unhelpful. Could it be an illegal instruction on those machines? WinBot1 and 2 are pretty old.
This is weird: if I run `correctness_simd_op_check` in `cmd.exe`, I get normal output and no failures. If, otoh, I run it in the "git bash" shell, I get no output...
I did a little debugging with WinDbg and eventually got a failure mode for simd_op_check that said:
```
HEAP[correctness_simd_op_check.exe]: HEAP: Free Heap block 00000144984FAFE0 modified at 00000144984FB058 after it was...
```