Jason Ansel

Results: 199 comments by Jason Ansel

Thanks! That is a neat trick with the size==1 rdom. I'll try mapping load masks to that.

Thanks, that works, though on GPU I get warnings:

```
Warning: Unhandled intrinsic call unsafe_promise_clamped
Warning: Unhandled intrinsic call unsafe_promise_clamped
Warning: Unhandled intrinsic call unsafe_promise_clamped
Warning: Unhandled intrinsic call unsafe_promise_clamped
...
```

Using that method of masking loads causes different autoscheduler issues in `aten.resize()` kernels:

repro.py

```py
import halide as hl

@hl.generator(name="kernel")
class Kernel:
    in_ptr0 = hl.InputBuffer(hl.Float(32), 1)
    out_ptr0 = hl.OutputBuffer(hl.Float(32), 1)
    ...
```

Here is another (larger) example of this same error:

```py
import halide as hl
from math import inf, nan

@hl.generator(name="kernel")
class Kernel:
    in_ptr0 = hl.InputBuffer(hl.Float(32), 2)
    in_ptr1 = hl.InputBuffer(hl.Int(64), 1)
    ...
```

Run with both `debug` target and `HL_DEBUG_CODEGEN=1`:

```
$ HL_DEBUG_CODEGEN=1 CUDA_LAUNCH_BLOCKING=1 python test/inductor/test_halide.py -k test_pow3_cuda
Failed to load binary:python
JIT compiling shared runtime for x86-64-linux-avx-avx2-avx512-avx512_cannonlake-avx512_skylake-cuda-f16c-fma-jit-sse41
JIT compiling cuda for x86-64-linux-avx-avx2-avx512-avx512_cannonlake-avx512_skylake-cuda-f16c-fma-jit-sse41
...
```