Halide
GPU memory model is poorly documented. copy_to_device does nothing unless you set_host_dirty.
I was following the tutorials, and since my Windows PC doesn't have libJpeg or libPng, I decided to use OpenCV for image I/O. I then learned that in Halide, RGB buffers default to a planar layout, so I checked out test_interleaved.cpp and wrote two small helpers to convert between interleaved and planar buffers. These work fine in the CPU version, as in lesson_07.
However, when I got to lesson_12, the GPU part, the result of the GPU pipeline is all zeros.
More specifically, I tried two strategies:
- Add two Funcs to the original pipeline: one at the beginning that converts the interleaved input to planar, and one at the end that converts back to interleaved.
- Use two additional pipelines. The first converts the interleaved buffer to a planar one, with realize() called, and the planar buffer is then fed to the untouched original pipeline. A second extra pipeline converts the planar result back to an interleaved buffer, from which the corresponding cv::Mat is created.

Both strategies work fine on the CPU. On the GPU, however, the first one (two extra Funcs) is too slow, so I tried the second. Its result is all zeros.
I have tried copy_to_device() and copy_to_host() before and after the realize(), but it seems to make no difference.
The function I use to convert buffers is below:
```cpp
template <class T>
Halide::Buffer<T> interleaved2planar(Halide::Buffer<T> in) {
    Halide::ImageParam src(in.type(), 3);
    Halide::Func plane("plane");
    Halide::Var x, y, c;
    plane(x, y, c) = src(x, y, c);

    // Input is interleaved: x has stride 3, c has stride 1 and extent 3.
    src.dim(0).set_stride(3).dim(2).set_stride(1).set_bounds(0, 3);
    // Output is planar: x has stride 1, c has extent 3.
    plane.output_buffer().dim(0).set_stride(1).dim(2).set_extent(3);

    plane.reorder(c, x, y).unroll(c);
    // plane.vectorize(x, 64);

    src.set(in);
    plane.compile_jit();  // optional; realize() would JIT-compile anyway

    // Halide::Buffer allocates planar storage by default.
    auto out_buf = Halide::Buffer<T>(in.width(), in.height(), 3);
    plane.realize(out_buf);
    return out_buf;
}
```
And this is how I use it:
```cpp
// OpenCV loads images interleaved (and in BGR channel order).
auto mat = cv::imread(data_root + "images/rgb.png");
Buffer<uint8_t> interleaved_in =
    Buffer<uint8_t>::make_interleaved(mat.data, mat.cols, mat.rows, mat.channels());
Buffer<uint8_t> input1 = interleaved2planar(interleaved_in);
```
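For completeness, the reverse helper is essentially the same function with the stride constraints swapped. A sketch (the name planar2interleaved and the schedule are mine, mirroring the code above):
```cpp
template <class T>
Halide::Buffer<T> planar2interleaved(Halide::Buffer<T> in) {
    Halide::ImageParam src(in.type(), 3);
    Halide::Func inter("interleave");
    Halide::Var x, y, c;
    inter(x, y, c) = src(x, y, c);

    // Input is planar: x has stride 1, c has extent 3.
    src.dim(0).set_stride(1).dim(2).set_extent(3);
    // Output is interleaved: x has stride 3, c has stride 1 and extent 3.
    inter.output_buffer().dim(0).set_stride(3).dim(2).set_stride(1).set_bounds(0, 3);

    inter.reorder(c, x, y).unroll(c);
    src.set(in);

    // make_interleaved allocates a buffer with interleaved strides.
    auto out_buf = Halide::Buffer<T>::make_interleaved(in.width(), in.height(), 3);
    inter.realize(out_buf);
    return out_buf;
}
```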
Try set_host_dirty() on your input instead of copy_to_device()
Halide might not realize that the device allocation for interleaved_in doesn't contain the most recent copy of the data. copy_to_device does nothing if the buffer is not marked as being dirty on the host.
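Something like this (a sketch; `gpu_pipeline` stands in for your original GPU-scheduled pipeline):
```cpp
interleaved_in.set_host_dirty();  // host data is newer than any device copy
Buffer<uint8_t> input1 = interleaved2planar(interleaved_in);
input1.set_host_dirty();          // likewise for the freshly realized planar buffer

Buffer<uint8_t> output(input1.width(), input1.height(), 3);
gpu_pipeline.realize(output);     // the input's copy_to_device now actually copies
output.copy_to_host();            // bring the device results back before reading them
```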
Thanks, I'll try that.
Where can I find the usage of methods like set_host_dirty() and set_device_dirty()? I can't find anything in the comments/documentation.
Reopening as a missing-documentation bug.
Part of the problem here is that non-GPU Halide pipelines don't bother to set host dirty on the outputs. They should (it's cheap!)
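Until that's fixed, you can set the flag yourself after the CPU realize() in the conversion helper (a sketch against the code above):
```cpp
plane.realize(out_buf);
out_buf.set_host_dirty();  // CPU pipelines don't set this; without it, a later
                           // copy_to_device() is a no-op and the GPU reads zeros
return out_buf;
```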
I just spent two weeks trying to figure this out. I finally ended up here after stumbling through the Gitter chat and this Stack Overflow post. I agree the documentation on the GPU memory model should be improved; I had no idea that I had to do this.
Minor update: non-GPU pipelines now both set host dirty on the outputs and assert !device_dirty on the inputs.
See also #4600 for an instance of this issue.
Sorry to bump a nearly three-year-old thread, but I was wondering if there has been any progress on documenting the GPU memory model in either the Halide tutorials or the READMEs in the repo?
No, and it's really inexcusable on our part. We need to get someone to step up and make this happen.