Jason Ansel comments

Results: 199 comments of Jason Ansel

Autoschedulers fail on indirect loads

Thanks! That is a neat trick with the size==1 rdom. I'll try mapping load masks to that.

Autoschedulers fail on indirect loads

Thanks, that works, though on GPU I get warnings:

```
Warning: Unhandled intrinsic call unsafe_promise_clamped
Warning: Unhandled intrinsic call unsafe_promise_clamped
Warning: Unhandled intrinsic call unsafe_promise_clamped
Warning: Unhandled intrinsic call unsafe_promise_clamped
...
```

Autoschedulers fail on indirect loads

Using that method of masking loads causes different autoscheduler issues in `aten.resize()` kernels:

repro.py
```py
import halide as hl

@hl.generator(name="kernel")
class Kernel:
    in_ptr0 = hl.InputBuffer(hl.Float(32), 1)
    out_ptr0 = hl.OutputBuffer(hl.Float(32), 1)
    ...
```

Anderson2021 autoscheduler fails with: Condition failed: at_or_inside_block()

Here is another (larger) example of this same error:

```py
import halide as hl
from math import inf, nan

@hl.generator(name="kernel")
class Kernel:
    in_ptr0 = hl.InputBuffer(hl.Float(32), 2)
    in_ptr1 = hl.InputBuffer(hl.Int(64), 1)
    ...
```

[inductor] Mark static numels as tl.constexpr

@pytorchbot merge

[inductor] Refactor MutableBox to make IRNode typing easier

@pytorchbot merge

[inductor] Refactor MutableBox to make IRNode typing easier

@pytorchbot merge

[inductor] Move fusion heuristics to V.choices

@pytorchbot rebase

[inductor] Move fusion heuristics to V.choices

@pytorchbot merge

CUDA error: CUDA_ERROR_ILLEGAL_ADDRESS cuLaunchKernel failed

Run with both the `debug` target and `HL_DEBUG_CODEGEN=1`:

```
$ HL_DEBUG_CODEGEN=1 CUDA_LAUNCH_BLOCKING=1 python test/inductor/test_halide.py -k test_pow3_cuda
Failed to load binary:python
JIT compiling shared runtime for x86-64-linux-avx-avx2-avx512-avx512_cannonlake-avx512_skylake-cuda-f16c-fma-jit-sse41
JIT compiling cuda for x86-64-linux-avx-avx2-avx512-avx512_cannonlake-avx512_skylake-cuda-f16c-fma-jit-sse41
...
```
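For anyone who wants to capture that output to a file instead of the terminal, the same invocation can be sketched from Python. This is only an illustration: the helper names and the `halide_debug.log` path are hypothetical, and it sets just the two environment variables shown in the command above. It does not enable the `debug` target, since the comment does not show how that was configured.

```python
import os
import subprocess
import sys


def build_debug_env(base_env=None):
    """Copy an environment and set the two variables used in the command above."""
    env = dict(os.environ if base_env is None else base_env)
    env["HL_DEBUG_CODEGEN"] = "1"      # verbose Halide codegen output
    env["CUDA_LAUNCH_BLOCKING"] = "1"  # make CUDA kernel launches synchronous
    return env


def build_command(test_name="test_pow3_cuda"):
    """Build the argv list for the test run; the script path is taken from the comment."""
    return [sys.executable, "test/inductor/test_halide.py", "-k", test_name]


def run_debug_test(test_name="test_pow3_cuda", log_path="halide_debug.log"):
    """Run the test and write stdout and stderr to log_path (a hypothetical file name)."""
    with open(log_path, "w") as log:
        result = subprocess.run(
            build_command(test_name),
            env=build_debug_env(),
            stdout=log,
            stderr=subprocess.STDOUT,
            check=False,  # the test is expected to fail; we only want the log
        )
    return result.returncode
```

Calling `run_debug_test()` from the root of a PyTorch checkout would reproduce the run and leave the full log in one file, which is easier to attach to an issue than a truncated terminal paste.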