
Exception at interleaved -> interleaved processing

Open Gobra opened this issue 6 years ago • 6 comments

Here's some quite simple code that fails on the realize call with the error message Constraint violated: f0.stride.0 (3) == 1 (1)

Halide::Var x, y, c;
Halide::Func processor;
auto input = Halide::Buffer<uint8_t>::make_interleaved((uint8_t *)bitmap.bits(), bitmap.width(), bitmap.height(), 3);
auto output = Halide::Buffer<uint8_t>::make_interleaved(bitmap.width(), bitmap.height(), 3);

processor(x, y, c) = input(x, y, c);
processor.realize(output);

To me the code makes perfect sense, as both input and output have the same dimensions and layout. I've also tried "rendering" into the input buffer with processor.realize(input); and it fails with the same error message.

Changing output to the default (planar?) layout with auto output = Halide::Buffer<uint8_t>(bitmap.width(), bitmap.height(), 3); makes it work.

Is that a bug, or did I miss something obvious in the documentation regarding data representation (layout)?

In case it matters: I'm running on Win10 x64 with LLVM 6.0, MSVC2017, and the Halide "master" branch from 21 June 2018.

Gobra avatar Jun 23 '18 17:06 Gobra

Take a look at http://halide-lang.org/tutorials/tutorial_lesson_16_rgb_generate.html

By default Halide assumes the first dimension (x in this case) is dense in memory (stride 1). This is so it can generate good dense vector loads. You can tell it otherwise using processor.output_buffer().dim(0).set_stride(3).

abadams avatar Jun 23 '18 18:06 abadams

Yes, but I created the input buffer with make_interleaved; it shows stride 3 for x and 3072 for y (the buffer width is 1024), which seems right. As the output buffer is set up the same way, why would Halide assume a stride for x different from the one defined for both the input and the target output?
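For reference, the stride numbers quoted here can be reproduced with a little arithmetic. The following is a minimal sketch in plain C++ that does not use Halide at all; the names Strides, interleaved_strides, planar_strides and offset are hypothetical helpers made up for illustration, and strides are counted in elements rather than bytes.

```cpp
#include <cstddef>

// Element strides for a 3-D image indexed as (x, y, c).
// Hypothetical helper type, not part of Halide.
struct Strides {
    std::size_t x, y, c;
};

// Interleaved (RGBRGB...): channels are adjacent, so stepping in x skips
// `channels` elements and stepping in y skips a whole row of pixels.
constexpr Strides interleaved_strides(std::size_t width, std::size_t channels) {
    return {channels, width * channels, 1};
}

// Planar (RRR...GGG...BBB...): x is dense, each channel is a separate plane.
constexpr Strides planar_strides(std::size_t width, std::size_t height) {
    return {1, width, width * height};
}

// Linear element offset of (x, y, c) for a given set of strides.
constexpr std::size_t offset(const Strides &s, std::size_t x, std::size_t y,
                             std::size_t c) {
    return x * s.x + y * s.y + c * s.c;
}
```

With width 1024 and 3 channels, interleaved_strides gives 3 for x and 3072 for y, matching the values reported above, while the planar layout has stride 1 in x, which is what the default constraint expects.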

Gobra avatar Jun 23 '18 18:06 Gobra

"processor" is compiled without knowledge of the output buffer you're going to use. You could call realize with any argument, and it would only compile once. I don't remember if this is also true of input buffers. I think when you use them directly perhaps Halide inspects the layout. If it were an ImageParam you'd have the same problem on the input as you do on the output.
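As a rough mental model only, and not Halide's actual implementation: the stride assumption is fixed when the pipeline is built, and each buffer passed in later is merely checked against it. The sketch below is plain C++ with hypothetical names (CompiledPipeline, check_output); the message format is copied from the error quoted at the top of this thread.

```cpp
#include <string>

// Hypothetical stand-in for a compiled pipeline: the stride it assumes for
// dimension 0 of its output is decided once, before any buffer is seen.
struct CompiledPipeline {
    int assumed_stride0;
};

// Returns an empty string if the buffer's dimension-0 stride matches the
// assumption, otherwise a message modeled on the one quoted in this thread.
std::string check_output(const CompiledPipeline &p, int buffer_stride0) {
    if (buffer_stride0 == p.assumed_stride0) {
        return "";
    }
    return "Constraint violated: f0.stride.0 (" +
           std::to_string(buffer_stride0) + ") == " +
           std::to_string(p.assumed_stride0) + " (" +
           std::to_string(p.assumed_stride0) + ")";
}
```

In this toy model, a pipeline built with the default assumption of 1 rejects an interleaved buffer with stride 3, while one built with an assumed stride of 3 (the analogue of calling set_stride(3)) accepts it.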

abadams avatar Jun 23 '18 19:06 abadams

Thanks for your help, it makes sense and with your suggested fix it works now.

It's still quite confusing for a newbie, and I'm sure a lot of programmers might fall into the same "trap". I read tutorial 16 before posting the issue, but it is also a bit confusing due to the differences between the JIT and AOT schemes.

May I suggest adding a page to the project wiki about buffer memory layouts and ways to handle them with the Halide API? The most basic examples for JIT/AOT, in addition to tutorials 15 and 16, should be of big help to those just coming on board.

Gobra avatar Jun 23 '18 20:06 Gobra

I encountered the same problem while learning Halide. I was trying to process a Buffer created from a cv::Mat, which is interleaved. After adding output_buffer().dim(0).set_stride(3) to the Func, the program runs normally.

My concern is: if the filter assumes the first dimension's stride is 1, does changing it to another value slow things down? Or is there any method (like reorder, maybe) that can avoid the performance loss?

SuTanTank avatar Oct 19 '19 13:10 SuTanTank

I found that for a 3-channel interleaved image, reorder(c, x, y) causes significantly slower processing than doing nothing, which doesn't seem to make sense. Why is following the actual layout (y>x>c) worse than (c>y>x)?
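To make the two loop orders concrete: reorder lists variables from innermost to outermost, so reorder(c, x, y) puts c in the innermost loop, while the default for a Func defined over (x, y, c) keeps x innermost and c outermost. The sketch below writes both nestings as plain C++ loops over an interleaved buffer. It does not use Halide, the function names are hypothetical, it assumes the input holds exactly width * height * channels elements, and it makes no attempt to explain the measured slowdown.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Copy with c innermost (the nesting reorder(c, x, y) asks for).
// On an interleaved buffer the innermost loop walks adjacent elements.
std::vector<std::uint8_t> copy_c_innermost(const std::vector<std::uint8_t> &in,
                                           std::size_t width, std::size_t height,
                                           std::size_t channels) {
    std::vector<std::uint8_t> out(in.size());
    for (std::size_t y = 0; y < height; ++y) {
        for (std::size_t x = 0; x < width; ++x) {
            for (std::size_t c = 0; c < channels; ++c) {
                const std::size_t i = (y * width + x) * channels + c;
                out[i] = in[i];
            }
        }
    }
    return out;
}

// Copy with x innermost and c outermost (the default nesting).
// On an interleaved buffer the innermost loop steps by `channels` elements.
std::vector<std::uint8_t> copy_x_innermost(const std::vector<std::uint8_t> &in,
                                           std::size_t width, std::size_t height,
                                           std::size_t channels) {
    std::vector<std::uint8_t> out(in.size());
    for (std::size_t c = 0; c < channels; ++c) {
        for (std::size_t y = 0; y < height; ++y) {
            for (std::size_t x = 0; x < width; ++x) {
                const std::size_t i = (y * width + x) * channels + c;
                out[i] = in[i];
            }
        }
    }
    return out;
}
```

Both functions produce the same output; they differ only in the order in which memory is touched, and with c innermost the inner loop has just 3 iterations for an RGB image.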

SuTanTank avatar Oct 19 '19 18:10 SuTanTank