Halide
Exception at interleaved -> interleaved processing
Here's some quite simple code that fails on the realize call with the error message Constraint violated: f0.stride.0 (3) == 1 (1)
Halide::Var x, y, c;
Halide::Func processor;
auto input = Halide::Buffer<uint8_t>::make_interleaved((uint8_t *)bitmap.bits(), bitmap.width(), bitmap.height(), 3);
auto output = Halide::Buffer<uint8_t>::make_interleaved(bitmap.width(), bitmap.height(), 3);
processor(x, y, c) = input(x, y, c);
processor.realize(output);
To me the code makes perfect sense as both input and output are of the same dimensions and layout.
I've tried "rendering" into the input buffer with processor.realize(input); and it fails with the same error message.
Changing output to the default (planar?) layout with auto output = Halide::Buffer<uint8_t>(bitmap.width(), bitmap.height(), 3); makes it work.
Is that a bug, or did I miss something obvious in the documentation regarding the data representation (layout)?
In case it matters: I'm running on Win10 x64 with LLVM 6.0, MSVC 2017, and the Halide "master" branch from 21 June 2018.
Take a look at http://halide-lang.org/tutorials/tutorial_lesson_16_rgb_generate.html
By default Halide assumes the first dimension (x in this case) is dense in memory (stride 1). This is so it can generate good dense vector loads. You can tell it otherwise using processor.output_buffer().dim(0).set_stride(3).
Yes, but I created the input buffer with make_interleaved, and it shows stride 3 for x and 3072 for y (the buffer width is 1024), which seems right. As the output buffer is set up the same way, why would Halide assume a different stride for x from the one defined for both the input and the target output?
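To make the stride figures above concrete, here is a minimal sketch in plain C++ (no Halide calls) of how element strides follow from the two layouts discussed in this thread. The names Strides, interleaved_strides, planar_strides and offset_of are hypothetical helpers introduced only for illustration; they are not part of the Halide API.

```cpp
#include <cassert>
#include <cstddef>

// Strides, in elements, for the dimensions (x, y, c) of an image buffer.
// Hypothetical helper type, not a Halide type.
struct Strides {
    std::ptrdiff_t x, y, c;
};

// Interleaved layout (RGBRGB...): the channels of one pixel are adjacent,
// so stepping in x skips over `channels` elements.
inline Strides interleaved_strides(int width, int height, int channels) {
    assert(width > 0 && height > 0 && channels > 0);
    Strides s;
    s.x = channels;
    s.y = static_cast<std::ptrdiff_t>(width) * channels;
    s.c = 1;
    return s;
}

// Planar layout (RRR...GGG...BBB...): x is dense (stride 1) and each
// channel occupies its own width * height plane.
inline Strides planar_strides(int width, int height, int channels) {
    assert(width > 0 && height > 0 && channels > 0);
    Strides s;
    s.x = 1;
    s.y = width;
    s.c = static_cast<std::ptrdiff_t>(width) * height;
    return s;
}

// Element offset of (x, y, c) from the start of the buffer.
inline std::ptrdiff_t offset_of(const Strides &s, int x, int y, int c) {
    return x * s.x + y * s.y + c * s.c;
}
```

For a 3-channel buffer of width 1024, interleaved_strides gives 3 for x and 3072 for y, matching the values reported above, while planar_strides gives 1 for x, which is the value the failing constraint f0.stride.0 == 1 expects.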
"processor" is compiled without knowledge of the output buffer you're going to use. You could call realize with any argument, and it would only compile once. I don't remember if this is also true of input buffers. I think when you use them directly perhaps Halide inspects the layout. If it were an ImageParam you'd have the same problem on the input as you do on the output.
Thanks for your help, it makes sense and with your suggested fix it works now.
It's still quite confusing for a newbie, and I'm sure a lot of programmers might fall into the same "trap". I read tutorial 16 before posting the issue, but it is also a bit confusing because of the differences between the JIT and AOT schemes.
May I suggest adding a page to the project wiki about buffer memory layouts and how to handle them with the Halide API? The most basic examples for JIT/AOT, in addition to tutorials 15 and 16, would be a big help to those just getting on board.
I encountered the same problem while learning Halide.
I was trying to process a Buffer created from a cv::Mat, which is interleaved.
After adding output_buffer().dim(0).set_stride(3) to the Func, the program runs normally.
My concern is: if the filter assumes the first dimension's stride is 1, does changing it to another value slow things down? Or is there a method (reorder, maybe) that can avoid the performance loss?
I found that for a 3-channel interleaved image, reorder(c, x, y) causes significantly slower processing than doing nothing, which doesn't seem to make sense. Why is following the actual layout (y>x>c) worse than (c>y>x)?
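To pin down what the two loop orders mean in terms of memory access, here is a minimal plain C++ sketch (no Halide calls) that lists the element offsets visited in an interleaved buffer for each nesting. The function names are hypothetical and introduced only for illustration. It only shows which order touches consecutive addresses; it does not by itself account for the slowdown described above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Offset of (x, y, c) in an interleaved buffer: stride c = 1,
// stride x = channels, stride y = width * channels.
inline std::ptrdiff_t interleaved_offset(int x, int y, int c,
                                         int width, int channels) {
    return c + static_cast<std::ptrdiff_t>(x) * channels +
           static_cast<std::ptrdiff_t>(y) * width * channels;
}

// Loop nest y > x > c (c innermost), i.e. the order written above as
// reorder(c, x, y).
inline std::vector<std::ptrdiff_t> visit_c_innermost(int width, int height,
                                                     int channels) {
    assert(width > 0 && height > 0 && channels > 0);
    std::vector<std::ptrdiff_t> visited;
    for (int y = 0; y < height; ++y)
        for (int x = 0; x < width; ++x)
            for (int c = 0; c < channels; ++c)
                visited.push_back(interleaved_offset(x, y, c, width, channels));
    return visited;
}

// Loop nest c > y > x (x innermost), the order used when nothing is reordered.
inline std::vector<std::ptrdiff_t> visit_x_innermost(int width, int height,
                                                     int channels) {
    assert(width > 0 && height > 0 && channels > 0);
    std::vector<std::ptrdiff_t> visited;
    for (int c = 0; c < channels; ++c)
        for (int y = 0; y < height; ++y)
            for (int x = 0; x < width; ++x)
                visited.push_back(interleaved_offset(x, y, c, width, channels));
    return visited;
}
```

For a tiny 2x1 image with 3 channels, the c-innermost nest visits offsets 0, 1, 2, 3, 4, 5, while the x-innermost nest visits 0, 3, 1, 4, 2, 5, so the first order is indeed the contiguous one for this layout.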