Can't print XLA tensors or call `cpu()`.
## 🐛 Bug
Recently I've been seeing this error whenever I try to run the following with `PJRT_DEVICE=CPU` (it works fine if I use CUDA):
```python
>>> x = torch.rand(5, device="xla")
>>> x.cpu()
```

```
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
F0000 00:00:1745001283.205982 796 concurrent_vector.h:70] Check failed: index < state.size (65534 vs. 10)
*** Check failure stack trace: ***
@ 0x76d22da0e78d absl::lts_20230802::log_internal::LogMessage::PrepareToDie()
@ 0x76d22da0e7fd absl::lts_20230802::log_internal::LogMessage::SendToLog()
@ 0x76d22da0e280 absl::lts_20230802::log_internal::LogMessage::Flush()
@ 0x76d22da0eacc absl::lts_20230802::log_internal::LogMessageFatal::~LogMessageFatal()
@ 0x76d218baadf4 tsl::internal::ConcurrentVector<>::operator[]()
@ 0x76d218baa1c8 tsl::AsyncValue::GetTypeInfo()
@ 0x76d218baa703 tsl::AsyncValue::Destroy()
@ 0x76d218baa3e7 tsl::AsyncValue::DropRef()
@ 0x76d218baa085 tsl::AsyncValue::DropRef()
@ 0x76d218baaeeb tsl::RCReference<>::~RCReference()
@ 0x76d2194eb69a tsl::AsyncValueRef<>::~AsyncValueRef()
@ 0x76d21c3e3f81 xla::cpu::ThunkExecutor::ExecuteSequential()
@ 0x76d21c3e3737 xla::cpu::ThunkExecutor::Execute()
@ 0x76d2194da10b xla::TfrtCpuExecutable::ExecuteHelper()
@ 0x76d2194dd607 xla::TfrtCpuExecutable::ExecuteSharded()
@ 0x76d2194a5751 xla::PjRtLoadedExecutable::ExecuteSharded()
@ 0x76d21949e58b torch_xla::runtime::PjRtComputationClient::ExecuteComputation()
@ 0x76d218eeeda6 torch_xla::XLAGraphExecutor::ScheduleSyncTensorsGraph()::{lambda()#1}::operator()()
@ 0x76d218ef6ab2 std::__invoke_impl<>()
@ 0x76d218ef6321 std::__invoke_r<>()
@ 0x76d218ef5b8d std::_Function_handler<>::_M_invoke()
@ 0x76d3e7711c1c std::function<>::operator()()
@ 0x76d3d5c8a0c9 torch::lazy::MultiWait::Complete()
@ 0x76d3d5c89e26 torch::lazy::MultiWait::Completer()::{lambda()#1}::operator()()
@ 0x76d3d5c8a85e std::__invoke_impl<>()
@ 0x76d3d5c8a615 std::__invoke_r<>()
@ 0x76d3d5c8a41b std::_Function_handler<>::_M_invoke()
@ 0x76d218a3c70e std::function<>::operator()()
@ 0x76d22d71c0ca tsl::thread::EigenEnvironment::ExecuteTask()
@ 0x76d22d71cef2 Eigen::ThreadPoolTempl<>::WorkerLoop()
@ 0x76d22d71c378 Eigen::ThreadPoolTempl<>::ThreadPoolTempl()::{lambda()#1}::operator()()
@ 0x76d22d71f06a std::__invoke_impl<>()
@ 0x76d22d71ea02 std::__invoke_r<>()
@ 0x76d22d71da7d std::_Function_handler<>::_M_invoke()
@ 0x76d218a3c70e std::function<>::operator()()
@ 0x76d22d71be7b tsl::thread::EigenEnvironment::CreateThread()::{lambda()#1}::operator()()
@ 0x76d22d71f42b std::__invoke_impl<>()
@ 0x76d22d71f408 std::__invoke<>()
@ 0x76d22d71f3e5 std::invoke<>()
@ 0x76d22d71f3a6 absl::lts_20230802::internal_any_invocable::InvokeR<>()
@ 0x76d22d71f1ad absl::lts_20230802::internal_any_invocable::RemoteInvoker<>()
@ 0x76d2194ef8ed absl::lts_20230802::internal_any_invocable::Impl<>::operator()()
@ 0x76d22d6fa5ae tsl::(anonymous namespace)::PThread::ThreadFn()
@ 0x76d3e9cfcea7 start_thread
Aborted (core dumped)
```
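For completeness, here is the repro as a standalone script (a minimal sketch; the only thing added over the REPL session above is setting `PJRT_DEVICE` explicitly before importing `torch_xla`):

```python
# Minimal standalone repro; same steps as the REPL session above.
import os

# Select the CPU PJRT client before torch_xla is imported.
os.environ["PJRT_DEVICE"] = "CPU"

import torch
import torch_xla  # importing registers the "xla" device

x = torch.rand(5, device="xla")

# .cpu() forces the pending graph to execute and copies the result to host;
# the abort above happens at this point.
print(x.cpu())
```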
## Environment

- Reproducible on XLA backend [CPU/TPU/CUDA]: CPU
- torch_xla version: `0bb4f6f01931fba78b18505d0414a85ae51b8171`
@tengyifei @lsy323 @qihqi @bhavya01 Any ideas what might be happening here?
I am also seeing this now:
```
#8 0x00007ffcf27eca28 in tsl::AsyncValue::GetTypeInfo (this=0x55555bf9e9c0) at external/xla/xla/tsl/concurrency/async_value.h:475
(gdb) p *this
$1 = {static kUnknownTypeId = 0, refcount_ = {<std::__atomic_base<unsigned int>> = {static _S_alignment = 4, _M_i = 1}, static is_always_lock_free = true}, kind_ = tsl::AsyncValue::Kind::kConcrete, has_vtable_ = false, is_refcounted_ = false, type_id_ = 65535, waiters_and_state_ = {static _S_min_alignment = 8, static _S_alignment = 8, _M_i = {static kStateMask = 3, static kPointerMask = 18446744073709551612, value = 2}, static is_always_lock_free = <optimized out>}, static kDataOffset = 64, static total_allocated_async_values_ = {<std::__atomic_base<unsigned long>> = {static _S_alignment = 8, _M_i = 12}, static is_always_lock_free = true}}
```
Maybe it's one of the async values being dereferenced at https://github.com/openxla/xla/blob/86b2f51f8000326813fd9742aaac6bd1868cc19b/xla/pjrt/cpu/cpu_client.cc#L1446. Note the `type_id_ = 65535` in the dump, which is consistent with the out-of-range index in the check failure above (`65534 vs. 10`).
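If it helps narrow this down, here is a sketch of a split repro (assuming `xm.mark_step()` still triggers graph execution on this build): running the graph without any device-to-host copy should show whether the bad async value already appears in the `ExecuteComputation` path from the stack trace, or only once `.cpu()` is called.

```python
import os
os.environ["PJRT_DEVICE"] = "CPU"

import torch
import torch_xla
import torch_xla.core.xla_model as xm

x = torch.rand(5, device="xla")

# Force the pending graph to execute without a device-to-host copy.
# If the abort already happens here, the failure is in the
# ExecuteComputation / ThunkExecutor path from the stack trace above.
xm.mark_step()

# Otherwise the transfer below is what triggers it.
print(x.cpu())
```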
This seems like a relatively serious issue, since it can be triggered by printing any tensor, which hinders CPU development.
Hey @ysiraichi, do we have a path forward on this one? It would be great to be able to use CPU locally in the container.
Not really. I could not reproduce it on CI (#9048), and I didn't have much time to investigate it myself.
It seems the problem occurs when building with `DEBUG=1`.