tvm
tvm copied to clipboard
[BugFix] Fix Crash Cases Caused by "__tvm_meta__ = None"
Fix https://github.com/apache/tvm/issues/16633
The broken tests are below items of "tests/scripts/task_python_unittest.sh"
"tir-transform" "tir-usmp" The crashed cases: tests/python/tir-transform/test_tir_transform_inject_rolling_buffer.py tests/python/tir-usmp/test_tir_usmp_algo.py tests/python/tir-usmp/test_tir_usmp_analysis_extract_bufferinfo.py tests/python/tir-usmp/test_tir_usmp_transform_convert_pool_allocations_to_offsets.py tests/python/tir-usmp/test_tir_usmp_utils.py
other failed cases: tests/python/tir-transform/test_tir_transform_pointer_value_type_rewrite.py::TestRewriteToShuffle::test_compare tests/python/tir-transform/test_tir_transform_force_narrow_index_to_i32.py::test_thread_axis2 tests/python/tir-transform/test_tir_transform_hoist_if.py::test_hoisting_block_scope_4 tests/python/tir-transform/test_transform_default_gpu_schedule.py::test_add_on_metal
Run these test cases in local with this patch, below 3 test cases still will be failed because of TIR structure isn't equal, @Hzfengsy @tqchen can you help to see them? Thanks.
tir-transform: FAILED tests/python/tir-transform/test_tir_transform_force_narrow_index_to_i32.py::test_thread_axis2 FAILED tests/python/tir-transform/test_tir_transform_hoist_if.py::test_hoisting_block_scope_4 FAILED tests/python/tir-transform/test_transform_default_gpu_schedule.py::test_add_on_metal
The reason of why the crashed test cases won't be caught by CI is the pytest exist code is 5, it means no tests is collected, in file "tests/scripts/setup-pytest-env.sh" we haven't treat the exist code of 5 as error.
I found another test item "tests/python/usmp" have the same exist code 5, because now there isn't the directory "tests/python/usmp", so pytest can't collect any test cases, I think "usmp" is a typo from https://github.com/apache/tvm/pull/16110, so I delete it.
cc @tqchen
I had this issue come up in https://github.com/apache/tvm/pull/16569. I think these tests probably were not being executed before. I am also pretty sure it's safe to just remove those lines, as __tvm_meta__ is not actually read anywhere.
In my PR, I added a check in the parser to make sure that decl_function would only be called with function definitions, which raises a readable error message and prevents a segfault.
Run these test cases in local with this patch, below 3 test cases still will be failed because of TIR structure isn't equal, @Hzfengsy @tqchen can you help to see them? Thanks.
tir-transform: FAILED tests/python/tir-transform/test_tir_transform_force_narrow_index_to_i32.py::test_thread_axis2 FAILED tests/python/tir-transform/test_tir_transform_hoist_if.py::test_hoisting_block_scope_4 FAILED tests/python/tir-transform/test_transform_default_gpu_schedule.py::test_add_on_metal
I am also having this error in my PR :joy: For test_tir_transform_force_narrow_index_to_i32, I think it's a real bug (the thread binding in the loop is not being updated). Don't know what's going on with the other two.
I've figured out, thanks to @Lunderberg's comments in my PR, that the failure in tests/python/tir-transform/test_tir_transform_force_narrow_index_to_i32.py::test_thread_axis2 is due to line 354 in scr/script/ir_builder/tir/ir.cc. If you change
IterVar iter_var(Range(nullptr), Var("iter", dtype), IterVarType::kThreadIndex, thread);
to
IterVar iter_var(Range(nullptr), Var("iter", DataType::Int(32)), IterVarType::kThreadIndex, thread);
(what it was before the merge commit that Eric pointed out), the test will pass. However, I'm not sure we want to manually fix the data type like that.
This probably means that we should update the index narrowing pass to deal with the thread index itervar accordingly.
I did a git bisect for test_tir_transform_hoist_if.py: eb15d04c3bff76062e26d5647fb8af0323de1bed is the first bad commit for tests/python/tir-transform/test_tir_transform_hoist_if.py::test_hoisting_block_scope_4. I will investigate to see if there's a fix hinted in there.
Bisect for test_transform_default_gpu_schedule.py::test_add_on_metal: First failing commit is 1c35c392648e4336fc5e00ab91abb37af997cd59. It looks to have made changes to test cases, but I'm not sure what exactly is causing the failure. Now that there are commits isolated, it should make it easier to debug.
Fix for the add_on_metal bug: You need to update IsScheduledOnGPU() in default_gpu_schedule.cc to also include Metal in its definition of GPU.
bool IsScheduledOnGPU(const BaseFunc& func) {
// the target from context.
tvm::Target target = tvm::Target::Current();
// the Target in kTarget attribute of PrimFunc
Optional<tvm::Target> func_target = func->attrs.GetAttr<tvm::Target>(tvm::attr::kTarget);
if (func_target.defined()) {
target = func_target.value();
}
if (target.defined()) {
int dev_type = target->GetTargetDeviceType();
// Change is here. I don't know if web GPU is quite correct
if (!(dev_type == kDLCUDA || dev_type == kDLMetal || dev_type == kDLROCM ||
dev_type == kDLWebGPU)) {
return false;
}
}
return true;
}
Note that this issue was mentioned in the code review for the PR. I will include this fix in my PR.
You need to update
IsScheduledOnGPU()indefault_gpu_schedule.ccto also include Metal in its definition of GPU.
Should we instead check whether the target tags include "gpu"? That would avoid needing to list out each target kind individually.
I'm not sure all of them actually contain the string "gpu." I will double check that.
Edit: Indeed, Metal is simply called "Metal"
These bugs should all be fixed now that #16569 was merged. I recommend you close unless there's a change in here that wasn't covered there.
https://github.com/apache/tvm/pull/16770 The sage continues...
#16770 The sage continues...
Hi Steven, I'm little confused. Does these three cases been fixed?
I tried main with commit: 240326, b2204ae6 and update status as below. It seems still two cases failed:
- still failed: tests/python/tir-transform/test_tir_transform_force_narrow_index_to_i32.py::test_thread_axis2
- passed: tests/python/tir-transform/test_tir_transform_hoist_if.py::test_hoisting_block_scope_4
- still failed: tests/python/tir-transform/test_transform_default_gpu_schedule.py::test_add_on_metal
#16770 The sage continues...
Hi Steven, I'm little confused. Does these three cases been fixed?
I tried main with commit: 240326, b2204ae and update status as below. It seems still two cases failed:
- still failed: tests/python/tir-transform/test_tir_transform_force_narrow_index_to_i32.py::test_thread_axis2
- passed: tests/python/tir-transform/test_tir_transform_hoist_if.py::test_hoisting_block_scope_4
- still failed: tests/python/tir-transform/test_transform_default_gpu_schedule.py::test_add_on_metal
Test with tvm.main.20240407.81a850693, 3 tests all passed.