Steven Johnson

458 comments by Steven Johnson

> These days I think it usually works better to vectorize multiple vars than to fuse two vars and vectorize the fused result. But will that work reasonably well when...

> The thing that's making a mess in the first case is you're fusing an outer var with an inner var. The current order of the vars matters in fuse()...

Oooh, good idea. Unfortunately, this still fails:

```cpp
input_pixel.store_in(MemoryType::Register)
    .bound(c, 0, input_.channels())
    .bound_extent(c, input_.channels())
    .reorder(c, x, y)
    .reorder_storage(c, x, y)
    .fuse(c, x, cx)
    .split(cx, cxo, cxi, pixels_per_vec)
    .vectorize(cxi)
    .compute_at(luma, xo);
input_pixel.specialize(input_.channels()...
```

Aha: moving the `.specialize()` calls higher up (e.g., to `luma`) does the trick. Awesome!

I'm going to convert this to draft (but leave it open), since it seems unlikely to be of interest any time soon. (I'll harvest a couple of minor changes to...

Is this `if` statement likely to be optimized away by LLVM (i.e., can it be proven always true or always false)?

So where does this stand? Is it affecting only x86, or is it more general? Do we need a revert? An urgent fix?

Note: I accidentally omitted the stride specification for `input` that indicates it is dense: `input.dim(1).set_stride(input.dim(2).extent() * 3);` But adding it doesn't change either of the two odd behaviors I observed.

Step 1: please run `run-clang-format.sh` and `run-clang-tidy.sh` on these and fix the issues :-)