Steven Johnson comments

Results: 458 comments of Steven Johnson

Vectorizing interleaved RGBA/RGB images is too hard

> These days I think it usually works better to vectorize multiple vars than to fuse two vars and vectorize the fused result. But will that work reasonably well when...

Vectorizing interleaved RGBA/RGB images is too hard

> The thing that's making a mess in the first case is you're fusing an outer var with an inner var. The current order of the vars matters in fuse()...
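To illustrate why the argument order in `fuse()` matters, here is a minimal sketch of the index arithmetic in plain C++, with no Halide dependency. It assumes the usual `fuse(inner, outer, fused)` convention, in which the fused index walks the inner variable fastest. The helper names `fuse_index` and `split_index` are hypothetical and exist only for this illustration; loop minimums are taken to be zero.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helpers: they mirror the index arithmetic only,
// not Halide's actual API or implementation.
struct SplitIndex {
    std::size_t outer;
    std::size_t inner;
};

// Models fuse(inner, outer, fused): the fused index advances
// through `inner` fastest, then steps `outer`.
std::size_t fuse_index(std::size_t inner, std::size_t outer,
                       std::size_t inner_extent) {
    assert(inner < inner_extent);
    return outer * inner_extent + inner;
}

// Models split(fused, outer, inner, factor): recovers the
// outer/inner pair from a fused index.
SplitIndex split_index(std::size_t fused, std::size_t factor) {
    assert(factor > 0);
    return {fused / factor, fused % factor};
}
```

With 3 channels, fusing channel `c = 2` (inner) with pixel `x = 5` (outer) gives `fuse_index(2, 5, 3) == 17`, which is the element's position in an interleaved row. Swapping the roles, so that `x` is inner over an assumed row width of 8, gives `fuse_index(5, 2, 8) == 21`, which is a planar position instead. A vector lane range produced by splitting the fused index is therefore contiguous in interleaved memory only in the first ordering.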

Vectorizing interleaved RGBA/RGB images is too hard

Oooh, good idea. Unfortunately, this still fails:

    input_pixel.store_in(MemoryType::Register)
        .bound(c, 0, input_.channels())
        .bound_extent(c, input_.channels())
        .reorder(c, x, y)
        .reorder_storage(c, x, y)
        .fuse(c, x, cx)
        .split(cx, cxo, cxi, pixels_per_vec)
        .vectorize(cxi)
        .compute_at(luma, xo);
    input_pixel.specialize(input_.channels()...

Vectorizing interleaved RGBA/RGB images is too hard

Aha: moving the .specialize() calls higher up (e.g. to `luma`) does the trick. Awesome!

Add string names to all Vars so pseudo-code output is more human-readable.

Is this PR still active? Should it be closed?

Create helper script to run IWYU on Halide

I'm going to convert this to draft (but leave it open), since it seems unlikely to be of interest any time soon. (I'll harvest a couple of minor changes to...

Staging strided access to input buffer in a guardwithif generates a pointless if statement

Is this `if` statement likely to be optimized away by LLVM (i.e., can it be proven always true or false)?

simple strided access to input buffers generates terrible asm

So where does this stand -- is it affecting x86 only, or is it more general? Do we need a revert? An urgent fix?

Applying .split() to .fuse()'d vars produces strange results

Note: I accidentally omitted the stride specification for `input` that indicates it is dense: `input.dim(1).set_stride(input.dim(2).extent() * 3);` ...but adding that doesn't change either observed weirdness.
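As a rough illustration of what a "dense" stride means for an interleaved image, the following plain C++ sketch computes element offsets from a row stride. It does not use Halide. The `InterleavedLayout` struct, the `(x, y, c)` naming, and both helper functions are assumptions made for this example, and may not match the dimension order used in the issue.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical description of an interleaved image; sizes and
// strides are measured in elements, not bytes.
struct InterleavedLayout {
    std::size_t width;
    std::size_t height;
    std::size_t channels;
    std::size_t row_stride;  // elements between consecutive rows
};

// A layout is dense when rows are packed with no padding.
bool is_dense(const InterleavedLayout &l) {
    return l.row_stride == l.width * l.channels;
}

// Offset of channel c of pixel (x, y): channels are adjacent,
// pixels are `channels` apart, rows are `row_stride` apart.
std::size_t element_offset(const InterleavedLayout &l, std::size_t x,
                           std::size_t y, std::size_t c) {
    assert(x < l.width && y < l.height && c < l.channels);
    return y * l.row_stride + x * l.channels + c;
}
```

For a 4x2 RGB image, a row stride of 12 is dense, while a row stride of 16 leaves four padding elements per row. Telling the compiler that the row stride equals the width times the channel count is what lets it treat the whole image as one contiguous run.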

Add GPU autoscheduler

Step 1: please run `run-clang-format.sh` and `run-clang-tidy.sh` on these and fix the issues :-)