oneflow
RuntimeError: Check failed: (nd_sbp.has_value()) == (this->has_nd_sbp_symbol_id()) (0 vs 1)
Summary
RuntimeError: Check failed: (nd_sbp.has_value()) == (this->has_nd_sbp_symbol_id()) (0 vs 1)
File "/home/xuxiaoyu/dev/oneflow/oneflow/core/functional/impl/global_cast.cpp", line 526, in operator()
MetaInfoConsistencyCheck(parallel_desc, sbp_parallels, grad_sbp_parallels, 1, check_meta)
File "/home/xuxiaoyu/dev/oneflow/oneflow/core/framework/consistency_check.cpp", line 253, in MetaInfoConsistencyCheck
MetaInfoConsistencyCheck(placement, nd_sbp, grad_nd_sbp, debug_level, force_check)
File "/home/xuxiaoyu/dev/oneflow/oneflow/core/framework/consistency_check.cpp", line 231, in MetaInfoConsistencyCheck
MetaInfoConsistencyCheckUtil(placement, nd_sbp, grad_nd_sbp)
File "/home/xuxiaoyu/dev/oneflow/oneflow/core/framework/consistency_check.cpp", line 201, in MetaInfoConsistencyCheckUtil
ctx->Check()
File "/home/xuxiaoyu/dev/oneflow/oneflow/core/framework/consistency_check.cpp", line 147, in Check
flat_meta_info_consistency_->Check(placement_, nd_sbp_, grad_nd_sbp_)
File "/home/xuxiaoyu/dev/oneflow/oneflow/core/framework/consistency_check.cpp", line 86, in Check
Error Type: oneflow.ErrorProto.check_failed_error
System Information
- OneFlow version (run python3 -m oneflow --doctor): 0.8
When running with a global tensor, this check error is raised if some ranks have the environment variable ONEFLOW_DEBUG_MODE=1 while other ranks have ONEFLOW_DEBUG_MODE=0.
Setting ONEFLOW_DEBUG_MODE to the same value on every rank fixes this check error.
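To track down which ranks disagree, a small helper along these lines may be useful. This is only a sketch: the function names are hypothetical and not part of OneFlow, and it assumes you have already collected each rank's ONEFLOW_DEBUG_MODE value yourself (for example by printing local_debug_mode() from every process). It compares raw strings, so an unset variable and "0" are reported as different values; whether OneFlow treats them as equivalent is not covered here.

```python
import os
from collections import defaultdict

ENV_NAME = "ONEFLOW_DEBUG_MODE"


def local_debug_mode(environ=os.environ):
    """Return this process's ONEFLOW_DEBUG_MODE value, or None if it is unset."""
    return environ.get(ENV_NAME)


def group_ranks_by_value(values_by_rank):
    """Group rank ids by the value each rank reported.

    values_by_rank: dict mapping rank id -> value (str or None),
    collected by the user from every process (hypothetical input).
    """
    groups = defaultdict(list)
    for rank in sorted(values_by_rank):
        groups[values_by_rank[rank]].append(rank)
    return dict(groups)


def is_consistent(values_by_rank):
    """True when every rank reported the same value."""
    return len(group_ranks_by_value(values_by_rank)) <= 1


if __name__ == "__main__":
    # Example values as they might be collected from a 4-rank job.
    collected = {0: "1", 1: "0", 2: "1", 3: "0"}
    if not is_consistent(collected):
        for value, ranks in group_ranks_by_value(collected).items():
            print(f"{ENV_NAME}={value!r} on ranks {ranks}")
```

If the helper reports more than one group, export the same ONEFLOW_DEBUG_MODE value in the launch environment of every rank before starting the job.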