Incorrect GPU codegen with multiple different types stored in memory
The GPU code generated for the test below produces incorrect results on the OpenCL and Metal backends (and the WIP WebGPU backend), and may well do so on others too. The test is derived from `correctness/gpu_mixed_shared_mem_types`, but uses the same `MemoryType` for all of the buffers. Changing the memory type for all of the buffers from `Heap` to `GPUShared` also produces incorrect results, but alternating between `Heap` and `GPUShared` produces correct results (and this is what `gpu_mixed_shared_mem_types` does). Using `Stack` instead also fixes the issue, as does using the same data type for all three buffers.
Running it with the OpenCL backend going through Oclgrind reveals both out-of-bounds accesses and data races.
This may be related to #4967, but I'm not familiar enough with this stuff to say for sure.
```cpp
#include "Halide.h"
#include <stdio.h>

using namespace Halide;

int main(int argc, char **argv) {
    Var x("x"), xi("xi");

    Func out("out");
    Func a, b, c;

    // Three intermediate buffers with mixed element types (8, 16, 8 bits).
    a(x) = cast(UInt(8), x);
    b(x) = cast(UInt(16), x);
    c(x) = cast(UInt(8), x);

    // All three use the same memory type, which triggers the bad codegen.
    a.compute_at(out, x).store_in(MemoryType::Heap).gpu_threads(x);
    b.compute_at(out, x).store_in(MemoryType::Heap).gpu_threads(x);
    c.compute_at(out, x).store_in(MemoryType::Heap).gpu_threads(x);

    out(x) = cast(UInt(32), a(x) + b(x) + c(x));
    out.gpu_tile(x, xi, 1);

    Buffer<uint32_t> output = out.realize({10});

    // Each of a, b and c holds x, so every output element should be 3 * x.
    for (int x = 0; x < output.width(); x++) {
        uint32_t ref = 3 * x;
        if (output(x) != ref) {
            // %u rather than %d: both values are unsigned 32-bit.
            printf("FAILED: output(%d) = %u != %u\n", x, output(x), ref);
            return 1;
        }
    }

    printf("Success!\n");
    return 0;
}
```
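As a sanity check on the expected values, the `3 * x` reference in the test above can be reproduced on the CPU without Halide. The sketch below is not part of the original test, and `reference_out` is a hypothetical helper name. It assumes that Halide's implicit type matching widens the 8-bit operands to 16 bits before the additions, so the sum is computed in `uint16_t` and only then cast to `uint32_t`.

```cpp
#include <cstdint>

// Hypothetical helper (not in the original test): a CPU model of out(x).
// Assumption: the uint8 operands are widened to uint16 before the additions,
// and the uint16 sum is then cast to uint32.
static uint32_t reference_out(int x) {
    const uint8_t a = static_cast<uint8_t>(x);    // models a(x)
    const uint16_t b = static_cast<uint16_t>(x);  // models b(x)
    const uint8_t c = static_cast<uint8_t>(x);    // models c(x)

    // C++ promotes the operands to int, so truncate explicitly to 16 bits
    // to mirror the assumed uint16 arithmetic.
    const uint16_t sum = static_cast<uint16_t>(a + b + c);
    return static_cast<uint32_t>(sum);
}
```

Under that assumption, for the ten elements the test realizes (`x` from 0 to 9), none of the narrow types overflow and the model agrees with `3 * x`. So a mismatch in the GPU output cannot be explained by integer wraparound.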