GH-43130: [C++][ArrowFlight] Crash due to UCS thread mode
When mode is UCS_THREAD_MODE_SERIALIZED, UCX crash due to mpool corruption.
This happens when buffer is deallocated on a different thread. In such case two threads access UCX memory pool simultaneously.
See discussion on UCX forum: https://github.com/openucx/ucx/discussions/9987
- GitHub Issue: #43130
Thanks! However, this code is scheduled for deprecation/removal soon in favor of the Disassociated IPC proposal
This is not a MINOR change. See https://github.com/apache/arrow/blob/main/CONTRIBUTING.md#Minor-Fixes for our MINOR definition. If we need this change, could you open a new issue for this?
Thanks! However, this code is scheduled for deprecation/removal soon in favor of the Disassociated IPC proposal
@lidavidm - Could you please refer me to the "Disassociated IPC" proposal?
Does it mean UCX support is planned to be removed? What is the timeline for this?
If we need this change, could you open a new issue for this?
@kou - Sure, opened https://github.com/apache/arrow/issues/43130
https://arrow.apache.org/docs/dev/format/DissociatedIPC.html
@zeroshade what was the timeline?
https://arrow.apache.org/docs/dev/format/DissociatedIPC.html
Looks like the reference implementation for Disassociated IPC might suffer from the same issue:
https://github.com/apache/arrow-experiments/blob/188c4e5ff4bda08319d4520e380d736c36b9ee48/dissociated-ipc/ucx_client.cc#L55-L60
arrow::Result<std::unique_ptr<utils::Connection>> UcxClient::CreateConn() {
ucp_worker_params_t worker_params;
std::memset(&worker_params, 0, sizeof(worker_params));
worker_params.field_mask =
UCP_WORKER_PARAM_FIELD_THREAD_MODE | UCP_WORKER_PARAM_FIELD_FLAGS;
worker_params.thread_mode = UCS_THREAD_MODE_SERIALIZED;
It was partially based on this experiment so I'm not surprised.
I think for now I'm OK merging this, but please be aware it will go away in the near future.
I think for now I'm OK merging this, but please be aware it will go away in the near future.
Sure, thanks. Consider fixing it also on arrow-experiments.
As per the deprecation, see this email: https://lists.apache.org/thread/g89x2y6pvlq6gyf0d1jnxfl2onsrkyt8 I plan to work on a PR removing support soon.
:warning: GitHub issue #43130 has been automatically assigned in GitHub to PR creator.
After merging your PR, Conbench analyzed the 4 benchmarking runs that have been run so far on merge-commit 5a28e180f0a078d0b1a970310d5c03ea6a351ffe.
There were no benchmark performance regressions. 🎉
The full Conbench report has more details. It also includes information about 28 possible false positives for unstable benchmarks that are known to sometimes produce them.