Steven S. Lyubomirsky

Results: 113 comments by Steven S. Lyubomirsky

It seems the segfault is happening due to parsing [this line](https://github.com/apache/tvm/blob/2ca8f3131e07e78527da48eb768a224b6ce164eb/tests/python/tir-transform/test_tir_transform_inject_rolling_buffer.py#L232) and others like it. I'm not sure what my changes have to do with it at all (i.e., why...

No clue why `tests/python/tir-transform/test_tir_transform_force_narrow_index_to_i32.py::test_thread_axis2` is failing. There is no well-formedness error there, but I get a complaint about dtypes not matching (for the loop iterator `i0_i1_i2_i3_fused_2`). Not sure why it...

Note that `tests/python/tir-transform/test_tir_transform_hoist_if.py::test_hoisting_block_scope_4` and `test_tir_transform_force_narrow_index_to_i32.py::test_thread_axis2` also fail on mainline, so we might have to fix real (unrelated) bugs there or disable the tests. The same is likely true of `tests/python/tir-transform/test_transform_default_gpu_schedule.py::test_add_on_metal`.

https://github.com/apache/tvm/pull/16682 fixes the remaining issue in `test_tir_transform_hoist_if.py`.

Another failing test, presumably unrelated to my changes: `tests/python/tir-transform/test_tir_transform_inject_ptx_async_copy.py::test_vectorize_cp_async_in_if_then_else`. It complains that the var `data_im2col_reindex_shared_dyn` is undefined, which appears wrong, since it is defined with an `alloc_buffer`.

I've determined that the above failure was introduced by commit `ff0b99c5ce4371ec966cd4fa07ae36351faf2a5e`. In particular, reordering `MergeSharedMemoryAllocations` in [src/driver/driver_api.cc](https://github.com/apache/tvm/commit/ff0b99c5ce4371ec966cd4fa07ae36351faf2a5e#diff-6e7a38d1bddf565ab5096fbd3a85f39c4b6002e75fbc1cea73d783d3a803a086) triggers it; reverting those changes fixes it. However, I don't want to reverse the...

Thanks for the notes. I should point out that `HoistIfThenElse` does not affect the failing test case, as I tested those changes separately. Perhaps we should discuss with @jinhongyii, the author...

Hm, the only remaining failure has to do with doc generation, as the doc generator finds multiple definitions of Target. One results from my previous changes (the `import *` in...

@tqchen A question about implementing one of the relatively simple cases, an in-place operation where the result is smaller than the input. I discussed with @MasterJH5574 and we weren't entirely...

Alternative approach suggested by @tqchen and @psrivas2: Consider dataflow blocks only. This would have the advantage of avoiding a whole-program analysis for liveness and aliases and would be a large...
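To illustrate why restricting the analysis to dataflow blocks is simpler, here is a minimal sketch in plain Python. It is not TVM code: the `Binding` tuples, `last_use_indices`, and `inplace_candidates` are hypothetical names invented for this example, and aliasing is ignored entirely. The assumption is that a dataflow block is a straight-line sequence of bindings, so liveness reduces to finding each variable's last use in a single forward pass.

```python
from typing import Dict, List, Tuple

# Hypothetical stand-in for the bindings of one dataflow block:
# (defined_var, [vars read by the bound expression]). Not TVM's real IR.
Binding = Tuple[str, List[str]]


def last_use_indices(bindings: List[Binding], live_out: List[str]) -> Dict[str, int]:
    """Map each variable to the index of the last binding that reads it.

    Variables in `live_out` are assumed to be needed after the block,
    so they are marked as live past the final binding.
    """
    last_use: Dict[str, int] = {}
    for idx, (_, used) in enumerate(bindings):
        for var in used:
            last_use[var] = idx
    for var in live_out:
        last_use[var] = len(bindings)
    return last_use


def inplace_candidates(bindings: List[Binding], live_out: List[str]) -> List[Tuple[int, str]]:
    """Return (binding index, var) pairs where `var` is read for the last time.

    Such a variable's storage could in principle be reused for the result
    of that binding. Aliasing and shape/size checks are deliberately omitted.
    """
    last_use = last_use_indices(bindings, live_out)
    candidates: List[Tuple[int, str]] = []
    for idx, (_, used) in enumerate(bindings):
        for var in dict.fromkeys(used):  # de-duplicate while keeping order
            if last_use[var] == idx:
                candidates.append((idx, var))
    return candidates


# Example block: a = f(x); b = g(a, x); c = h(b), with c and x live afterwards.
example = [("a", ["x"]), ("b", ["a", "x"]), ("c", ["b"])]
print(inplace_candidates(example, live_out=["c", "x"]))
```

Because the block has no control flow, one linear pass is enough. A whole-program version would also need to handle branches, function calls, and aliases across block boundaries.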