tvm
tvm copied to clipboard
[Unity][Parser] Check well-formedness in the parser
As discussed in several TVM Open Development meetings, we were permitting invalid programs to be parsed by not checking well-formedness during parsing. This PR makes a small change to the parser to check well-formedness in both Relax and TIR proactively, though this required correcting bugs in tests. Those bugs should not have been there in the first place!
I had held off on making this PR until this PR on MLC-LLM, which ensured that there would not be invalid constructs in models constructed by MLC-LLM.
One issue is that the well-formed checker does not create the most readable error messages. If the normalizer's error reporting could also be used in the well-formed checker, that would be a useful change to make as well.
@Lunderberg thank you for the review. I've pushed a change to enable a well-formed check for individual Relax functions (this exposed some bugs in our tests!) and PrimFuncs, which can also be disabled with the check_well_formed
argument, but it led to some TIR well-formedness failures. I am not sure what could be wrong with most of them so I'd appreciate a look if you have time once the CI finishes. Oddly, setting check_well_formed
to false on those did not correct the errors, so I'm not sure what the issue was.
I've determined that the segfault in the tests comes from tests/python/tir-transform/test_transform_inject_rolling_buffer.py
. According to the backtrace, this comes from checking the StructInfo for a global variable, but oddly, it does not appear to come from checking well-formedness. Any idea what could be the issue? @Lunderberg
It seems the segfault is happening due to parsing this line and others like it. I'm not sure what my changes have to do with it at all (i.e., why it hasn't happened before). If I remove the line, I get a failure due to tensor_2
being undefined, so I would appreciate advice for how to fix this test case. (Edit: I just added tensor_2
as an argument. Unclear why the line was there in the first place.)
No clue why tests/python/tir-transform/test_tir_transform_force_narrow_index_to_i32.py::test_thread_axis2
is failing. There is no well-formed error there, but I get a complaint about dtypes not matching (for the loop iterator i0_i1_i2_i3_fused_2
). Not sure why it wouldn't have failed before.
Same with tests/python/tir-transform/test_tir_transform_hoist_if.py::test_hoisting_block_scope_4
Note that tests/python/tir-transform/test_tir_transform_hoist_if.py::test_hoisting_block_scope_4
and test_tir_transform_force_narrow_index_to_i32.py::test_thread_axis2
also fail on mainline, so we might have to fix real (unrelated) bugs there or disable the tests. The same is likely true of tests/python/tir-transform/test_transform_default_gpu_schedule.py::test_add_on_metal
.
I did a bisect on test_tir_transform_force_narrow_index_to_i32.py::test_thread_axis2
, and it's been broken for quite some time. This merge commit from main
into unity
broke it, and it has been broken ever since. There's some itervar-related changes in src/script/ir_builder/tir/ir.cc
that I suspect to be the cause. They remove an unconditional usage of int32
dtype, which could be previously been assumed.
@tvm-bot rerun
https://github.com/apache/tvm/pull/16682 fixes the remaining issue in test_tir_transform_hoist_if.py
.
Another failing test, presumably unrelated to my changes: tests/python/tir-transform/test_tir_transform_inject_ptx_async_copy.py::test_vectorize_cp_async_in_if_then_else
. It complains that the var data_im2col_reindex_shared_dyn
is undefined, which appears wrong, since it is defined with an alloc_buffer
.
I've determined that the above failure was introduced by commit ff0b99c5ce4371ec966cd4fa07ae36351faf2a5e
. In particular, reordering MergeSharedMemoryAllocations
in src/driver/driver_api.cc triggers it--reverting those changes fixes it. However, I don't want to reverse the reordering because there is a comment on it that implies it was done for a reason, so I will see if there is a bug in that pass.
Taking a look at that commit, there's a couple of things that stand out to me.
-
Related to
MergeSharedMemoryAllocations
- The
MergeSharedMemoryAllocations
is a copy/paste ofStorageRewrite
with modifications. TheStorageRewrite
pass is applied earlier and already handled merging of shared memory allocations, so I'm not sure why the separate pass is needed.- It looks like there's some different descriptions of
StorageRewrite
. This 2021 comment states thatStorageRewrite
cannot merge allocations of different dtypes. However, this 2018 PR added memory-sharing of allocations of different dtypes forStorageRewrite
. This may be related to static vs dynamic shapes.
- It looks like there's some different descriptions of
- The
StorageRewrite
pass only merged allocations that were within the sameattr::thread_extent
annotation, and would be part of the same kernel. Assuming thatMergeSharedMemoryAllocations
kept this behavior fromStorageRewrite
, moving it afterSplitHostDevice
shouldn't be necessary. - I'd have to dig into it to be sure, but using
MergeSharedMemoryAllocations
afterThreadSync("warp")
seems like it could cause an issue. ApplyingMergeSharedMemoryAllocations
(merge two buffers into one) feels like it would undo the benefit ofThreadSync("warp")
(synchronize to avoid contention of a single buffer).
- The
-
Related to
HoistIfThenElse
- The comment about requiring
HoistIfThenElse
to be beforeUnrollLoop
doesn't make sense to me. Any condition that could be statically proven for all indices prior toUnrollLoop
could also be statically proven for a specific loop index. The end result ofUnrollLoop
andHoistIfThenElse
should be the same regardless of the ordering.- Slight caveat: Unless the
HoistIfThenElse
causes a different decision to be made by the automatic unroll heuristics inUnrollLoop
. However, those are disabled by default, and are not enabled by the unit test.
- Slight caveat: Unless the
- Moving
HoistIfThenElse
changed it's ordering relative to user-specified passes in the"tir.add_lower_pass"
configuration. While I don't know of any usage of this functionality, it's a bit worrying that it occurred without discussion. - The new function attribute
"tir.HoistIfThenElseExprWithBlock"
overrides the user's configuration for"tir.HoistIfThenElse"
. If the user had selected a stronger hoisting to be performed (e.g. hoistingtir::IfThenElse
statements), it would no longer be applied. - The configuration used when the new function attribute
"tir.HoistIfThenElseExprWithBlock"
is set doesn't make sense for this pass's location in the lowering flow. TheHoistIfThenElse
pass occurs afterConvertBlocksToOpaque
has removed all block variables, so theHoistedConditionals::kUsingBlockVar
flag shouldn't be required.
- The comment about requiring
-
Related to unit tests
- The unit tests added in
test_gpu_low_batch_gemv.py
only exercise the meta-schedule functionality, and do not exercise any of the changes made todriver_api.cc
or toHoistIfThenElse
.
- The unit tests added in
Thanks for the notes. I will note that HoistIfThenElse
does not affect the failing test case, as I tested those changes separately. Perhaps we should discuss with @jinhongyii, the author of that PR?
Hm, the only remaining failure has to do with doc generation, as the doc generator finds multiple definitions of Target. One results from my previous changes (the import *
in python/tvm/tir/transform/__init__.py
will re-export the import of target), but I am not sure what is causing the others. I will see if this is entirely due to the re-import.
Unfortunately we find that the pr caused an outage of the MLC compilation, seems to relates to MergeSharedMemoryAllocations
location change.
We should also carve out a UT from failed TIR which could potentially helps future regressions.
opened a #16769 as a temp measure to get things back for now. We can redo the PR ideally not changing the MergeSharedMemoryAllocations
, or confirm that the carved out tir cases can pass
cc @slyubomirsky @jinhongyii @yongwww