tvm [Unity][Parser] Check well-formedness in the parser

As discussed in several TVM Open Development meetings, we were permitting invalid programs to be parsed by not checking well-formedness during parsing. This PR makes a small change to the parser to check well-formedness in both Relax and TIR proactively, though this required correcting bugs in tests. Those bugs should not have been there in the first place!

I had held off on making this PR until this PR on MLC-LLM, which ensured that there would not be invalid constructs in models constructed by MLC-LLM.

One issue is that the well-formed checker does not create the most readable error messages. If the normalizer's error reporting could also be used in the well-formed checker, that would be a useful change to make as well.

Feb 14 '24 05:02 slyubomirsky

@Lunderberg thank you for the review. I've pushed a change to enable a well-formed check for individual Relax functions (this exposed some bugs in our tests!) and PrimFuncs, which can also be disabled with the check_well_formed argument, but it led to some TIR well-formedness failures. I am not sure what could be wrong with most of them so I'd appreciate a look if you have time once the CI finishes. Oddly, setting check_well_formed to false on those did not correct the errors, so I'm not sure what the issue was.

Feb 14 '24 22:02 slyubomirsky

I've determined that the segfault in the tests comes from tests/python/tir-transform/test_transform_inject_rolling_buffer.py. According to the backtrace, this comes from checking the StructInfo for a global variable, but oddly, it does not appear to come from checking well-formedness. Any idea what could be the issue? @Lunderberg

Feb 15 '24 21:02 slyubomirsky

It seems the segfault is happening due to parsing this line and others like it. I'm not sure what my changes have to do with it at all (i.e., why it hasn't happened before). If I remove the line, I get a failure due to tensor_2 being undefined, so I would appreciate advice for how to fix this test case. (Edit: I just added tensor_2 as an argument. Unclear why the line was there in the first place.)

Feb 23 '24 21:02 slyubomirsky

No clue why tests/python/tir-transform/test_tir_transform_force_narrow_index_to_i32.py::test_thread_axis2 is failing. There is no well-formed error there, but I get a complaint about dtypes not matching (for the loop iterator i0_i1_i2_i3_fused_2). Not sure why it wouldn't have failed before.

Same with tests/python/tir-transform/test_tir_transform_hoist_if.py::test_hoisting_block_scope_4

Feb 27 '24 02:02 slyubomirsky

Note that tests/python/tir-transform/test_tir_transform_hoist_if.py::test_hoisting_block_scope_4 and test_tir_transform_force_narrow_index_to_i32.py::test_thread_axis2 also fail on mainline, so we might have to fix real (unrelated) bugs there or disable the tests. The same is likely true of tests/python/tir-transform/test_transform_default_gpu_schedule.py::test_add_on_metal.

Feb 27 '24 20:02 slyubomirsky

I did a bisect on test_tir_transform_force_narrow_index_to_i32.py::test_thread_axis2, and it's been broken for quite some time. This merge commit from main into unity broke it, and it has been broken ever since. There's some itervar-related changes in src/script/ir_builder/tir/ir.cc that I suspect to be the cause. They remove an unconditional usage of int32 dtype, which could be previously been assumed.

Feb 27 '24 22:02 Lunderberg

@tvm-bot rerun

Mar 04 '24 02:03 yongwww

https://github.com/apache/tvm/pull/16682 fixes the remaining issue in test_tir_transform_hoist_if.py.

Mar 06 '24 22:03 slyubomirsky

Another failing test, presumably unrelated to my changes: tests/python/tir-transform/test_tir_transform_inject_ptx_async_copy.py::test_vectorize_cp_async_in_if_then_else. It complains that the var data_im2col_reindex_shared_dyn is undefined, which appears wrong, since it is defined with an alloc_buffer.

Mar 06 '24 23:03 slyubomirsky

I've determined that the above failure was introduced by commit ff0b99c5ce4371ec966cd4fa07ae36351faf2a5e. In particular, reordering MergeSharedMemoryAllocations in src/driver/driver_api.cc triggers it--reverting those changes fixes it. However, I don't want to reverse the reordering because there is a comment on it that implies it was done for a reason, so I will see if there is a bug in that pass.

Mar 12 '24 23:03 slyubomirsky

Taking a look at that commit, there's a couple of things that stand out to me.

Related to MergeSharedMemoryAllocations
- The MergeSharedMemoryAllocations is a copy/paste of StorageRewrite with modifications. The StorageRewrite pass is applied earlier and already handled merging of shared memory allocations, so I'm not sure why the separate pass is needed.
  - It looks like there's some different descriptions of StorageRewrite. This 2021 comment states that StorageRewrite cannot merge allocations of different dtypes. However, this 2018 PR added memory-sharing of allocations of different dtypes for StorageRewrite. This may be related to static vs dynamic shapes.
- The StorageRewrite pass only merged allocations that were within the same attr::thread_extent annotation, and would be part of the same kernel. Assuming that MergeSharedMemoryAllocations kept this behavior from StorageRewrite, moving it after SplitHostDevice shouldn't be necessary.
- I'd have to dig into it to be sure, but using MergeSharedMemoryAllocations after ThreadSync("warp") seems like it could cause an issue. Applying MergeSharedMemoryAllocations (merge two buffers into one) feels like it would undo the benefit of ThreadSync("warp") (synchronize to avoid contention of a single buffer).
Related to HoistIfThenElse
- The comment about requiring HoistIfThenElse to be before UnrollLoop doesn't make sense to me. Any condition that could be statically proven for all indices prior to UnrollLoop could also be statically proven for a specific loop index. The end result of UnrollLoop and HoistIfThenElse should be the same regardless of the ordering.
  - Slight caveat: Unless the HoistIfThenElse causes a different decision to be made by the automatic unroll heuristics in UnrollLoop. However, those are disabled by default, and are not enabled by the unit test.
- Moving HoistIfThenElse changed it's ordering relative to user-specified passes in the "tir.add_lower_pass" configuration. While I don't know of any usage of this functionality, it's a bit worrying that it occurred without discussion.
- The new function attribute "tir.HoistIfThenElseExprWithBlock" overrides the user's configuration for "tir.HoistIfThenElse". If the user had selected a stronger hoisting to be performed (e.g. hoisting tir::IfThenElse statements), it would no longer be applied.
- The configuration used when the new function attribute "tir.HoistIfThenElseExprWithBlock" is set doesn't make sense for this pass's location in the lowering flow. The HoistIfThenElse pass occurs after ConvertBlocksToOpaque has removed all block variables, so the HoistedConditionals::kUsingBlockVar flag shouldn't be required.
Related to unit tests
- The unit tests added in test_gpu_low_batch_gemv.py only exercise the meta-schedule functionality, and do not exercise any of the changes made to driver_api.cc or to HoistIfThenElse.

Mar 13 '24 12:03 Lunderberg

Thanks for the notes. I will note that HoistIfThenElse does not affect the failing test case, as I tested those changes separately. Perhaps we should discuss with @jinhongyii, the author of that PR?

Mar 13 '24 21:03 slyubomirsky

Hm, the only remaining failure has to do with doc generation, as the doc generator finds multiple definitions of Target. One results from my previous changes (the import * in python/tvm/tir/transform/__init__.py will re-export the import of target), but I am not sure what is causing the others. I will see if this is entirely due to the re-import.

Mar 21 '24 00:03 slyubomirsky

Unfortunately we find that the pr caused an outage of the MLC compilation, seems to relates to MergeSharedMemoryAllocations location change.

We should also carve out a UT from failed TIR which could potentially helps future regressions.

opened a #16769 as a temp measure to get things back for now. We can redo the PR ideally not changing the MergeSharedMemoryAllocations, or confirm that the carved out tir cases can pass

cc @slyubomirsky @jinhongyii @yongwww

Mar 22 '24 16:03 tqchen

tvm tvm copied to clipboard

[Unity][Parser] Check well-formedness in the parser

tvm
tvm copied to clipboard