Broken streaming of vector of enum with underlying type other than int
Check duplicate issues.
- [x] Checked for duplicates
Description
I need help understanding an issue we see when running on Linux on ARM and reading a file that was serialised on x86. Note that this platform is peculiar in that plain char (without a signedness specifier) is unsigned rather than signed (the signedness of char is implementation-defined in the standard).
This is important because mPadSubset, which you will see below, is an enum PadSubset : char. When running under valgrind, the issue appears as dumped below.
What puzzles me and what I think is the culprit of the segmentation fault is the line:
[1965517:tpc-tracker]: i= 2, mPadSubset type= 23, offset= 56, len=2, method=0 [optimized]
as I would have expected it to be len=1. Can you explain to me what is going on?
[1965517:tpc-tracker]: ====>Rebuilding TStreamerInfo for class: o2::tpc::CalDet<o2::tpc::PadFlags>, version: 1
[1965517:tpc-tracker]: Creating StreamerInfo for class: o2::tpc::CalDet<o2::tpc::PadFlags>, version: 2
[1965517:tpc-tracker]:
[1965517:tpc-tracker]: StreamerInfo for class: o2::tpc::CalDet<o2::tpc::PadFlags>, version=2, checksum=0x93700773
[1965517:tpc-tracker]: string mName offset= 0 type=300 ,stl=365, ctype=365, name of the object
[1965517:tpc-tracker]: vector<o2::tpc::CalArray<o2::tpc::PadFlags> > mData offset= 32 type=300 ,stl=1, ctype=61, internal CalArrays
[1965517:tpc-tracker]: o2::tpc::PadSubset mPadSubset offset= 56 type= 3 Pad subset granularity
[1965517:tpc-tracker]: i= 0, mName type=300, offset= 0, len=1, method=0
[1965517:tpc-tracker]: i= 1, mData type=300, offset= 32, len=1, method=0
[1965517:tpc-tracker]: i= 2, mPadSubset type= 3, offset= 56, len=1, method=0
[1965517:tpc-tracker]:
[1965517:tpc-tracker]: StreamerInfo for class: o2::tpc::CalDet<o2::tpc::PadFlags>, version=1, checksum=0x93700773
[1965517:tpc-tracker]: string mName offset= 0 type=300 ,stl=365, ctype=365, name of the object
[1965517:tpc-tracker]: vector<o2::tpc::CalArray<o2::tpc::PadFlags> > mData offset= 32 type=300 ,stl=1, ctype=61, internal CalArrays
[1965517:tpc-tracker]: o2::tpc::PadSubset mPadSubset offset= 56 type= 3 Pad subset granularity
[1965517:tpc-tracker]: i= 0, mName type=300, offset= 0, len=1, method=0
[1965517:tpc-tracker]: i= 1, mData type=300, offset= 32, len=1, method=0
[1965517:tpc-tracker]: i= 2, mPadSubset type= 3, offset= 56, len=1, method=0
[1965517:tpc-tracker]:
[1965517:tpc-tracker]: ====>Rebuilding TStreamerInfo for class: o2::tpc::CalArray<o2::tpc::PadFlags>, version: 1
[1965517:tpc-tracker]:
[1965517:tpc-tracker]: StreamerInfo for class: o2::tpc::CalArray<o2::tpc::PadFlags>, version=1, checksum=0xb03d18c2
[1965517:tpc-tracker]: string mName offset= 0 type=300 ,stl=365, ctype=365,
[1965517:tpc-tracker]: vector<o2::tpc::PadFlags> mData offset= 32 type=300 ,stl=1, ctype=3, calibration data
[1965517:tpc-tracker]: o2::tpc::PadSubset mPadSubset offset= 56 type= 3 Subset type
[1965517:tpc-tracker]: int mPadSubsetNumber offset= 60 type= 3 Number of the pad subset, e.g. ROC 0 is IROC A00
[1965517:tpc-tracker]: i= 0, mName type=300, offset= 0, len=1, method=0
[1965517:tpc-tracker]: i= 1, mData type=300, offset= 32, len=1, method=0
[1965517:tpc-tracker]: i= 2, mPadSubset type= 23, offset= 56, len=2, method=0 [optimized]
[1965517:tpc-tracker]: ==1965517== Invalid write of size 1
[1965517:tpc-tracker]: ==1965517== at 0xF36E7A0: frombuf (Bytes.h:313)
[1965517:tpc-tracker]: ==1965517== by 0xF36E7A0: frombuf (Bytes.h:442)
[1965517:tpc-tracker]: ==1965517== by 0xF36E7A0: ReadFastArray (TBufferFile.cxx:1338)
[1965517:tpc-tracker]: ==1965517== by 0xF36E7A0: TBufferFile::ReadFastArray(int*, int) (TBufferFile.cxx:1327)
[1965517:tpc-tracker]: ==1965517== by 0xF3E580B: void TGenCollectionStreamer::ReadBufferVectorPrimitives<int>(TBuffer&, void*, TClass const*) (TGenCollectionStreamer.cxx:1183)
[1965517:tpc-tracker]: ==1965517== by 0xF36EC7B: Streamer (TClass.h:614)
[1965517:tpc-tracker]: ==1965517== by 0xF36EC7B: TBufferFile::ReadFastArray(void*, TClass const*, int, TMemberStreamer*, TClass const*) (TBufferFile.cxx:1616)
[1965517:tpc-tracker]: ==1965517== by 0xF58C84B: int TStreamerInfo::ReadBuffer<char**>(TBuffer&, char** const&, TStreamerInfo::TCompInfo* const*, int, int, int, int, int) (TStreamerInfoReadBuffer.cxx:1297)
[1965517:tpc-tracker]: ==1965517== by 0xF45B81F: TStreamerInfoActions::VectorLooper::GenericRead(TBuffer&, void*, void const*, TStreamerInfoActions::TLoopConfiguration const*, TStreamerInfoActions::TConfiguration const*) (TStreamerInfoActions.cxx:1883)
[1965517:tpc-tracker]: ==1965517== by 0xF36DAAB: operator() (TStreamerInfoActions.h:131)
[1965517:tpc-tracker]: ==1965517== by 0xF36DAAB: TBufferFile::ApplySequence(TStreamerInfoActions::TActionSequence const&, void*, void*) (TBufferFile.cxx:3736)
[1965517:tpc-tracker]: ==1965517== by 0xF482A0F: TStreamerInfoActions::ReadSTLMemberWiseSameClass(TBuffer&, void*, TStreamerInfoActions::TConfiguration const*, short) (TStreamerInfoActions.cxx:1155)
[1965517:tpc-tracker]: ==1965517== by 0xF482C4F: int TStreamerInfoActions::ReadSTL<&TStreamerInfoActions::ReadSTLMemberWiseSameClass, &TStreamerInfoActions::ReadSTLObjectWiseFastArray>(TBuffer&, void*, TStreamerInfoActions::TConfiguration const*) (TStreamerInfoActions.cxx:1405)
[1965517:tpc-tracker]: ==1965517== by 0xF36DE4B: operator() (TStreamerInfoActions.h:123)
[1965517:tpc-tracker]: ==1965517== by 0xF36DE4B: ApplySequence (TBufferFile.cxx:3670)
[1965517:tpc-tracker]: ==1965517== by 0xF36DE4B: TBufferFile::ApplySequence(TStreamerInfoActions::TActionSequence const&, void*) (TBufferFile.cxx:3661)
[1965517:tpc-tracker]: ==1965517== by 0xF376CEB: TBufferFile::ReadClassBuffer(TClass const*, void*, TClass const*) (TBufferFile.cxx:3598)
[1965517:tpc-tracker]: ==1965517== by 0xF3F4633: Streamer (TClass.h:614)
[1965517:tpc-tracker]: ==1965517== by 0xF3F4633: TKey::ReadObjectAny(TClass const*) (TKey.cxx:1120)
[1965517:tpc-tracker]: ==1965517== by 0xF3B82E3: TDirectoryFile::GetObjectChecked(char const*, TClass const*) (TDirectoryFile.cxx:1111)
[1965517:tpc-tracker]: ==1965517== Address 0x153fbb80 is 0 bytes after a block of size 1,440 alloc'd
[1965517:tpc-tracker]: ==1965517== at 0x4868908: operator new(unsigned long) (vg_replace_malloc.c:483)
[1965517:tpc-tracker]: ==1965517== by 0x60E5D1F: allocate (new_allocator.h:137)
[1965517:tpc-tracker]: ==1965517== by 0x60E5D1F: allocate (allocator.h:188)
[1965517:tpc-tracker]: ==1965517== by 0x60E5D1F: allocate (alloc_traits.h:464)
[1965517:tpc-tracker]: ==1965517== by 0x60E5D1F: _M_allocate (stl_vector.h:378)
[1965517:tpc-tracker]: ==1965517== by 0x60E5D1F: _M_allocate (stl_vector.h:375)
[1965517:tpc-tracker]: ==1965517== by 0x60E5D1F: std::vector<o2::tpc::PadFlags, std::allocator<o2::tpc::PadFlags> >::_M_default_append(unsigned long) (vector.tcc:650)
[1965517:tpc-tracker]: ==1965517== by 0xF3E5797: void TGenCollectionStreamer::ReadBufferVectorPrimitives<int>(TBuffer&, void*, TClass const*) (TGenCollectionStreamer.cxx:1176)
[1965517:tpc-tracker]: ==1965517== by 0xF36EC7B: Streamer (TClass.h:614)
[1965517:tpc-tracker]: ==1965517== by 0xF36EC7B: TBufferFile::ReadFastArray(void*, TClass const*, int, TMemberStreamer*, TClass const*) (TBufferFile.cxx:1616)
[1965517:tpc-tracker]: ==1965517== by 0xF58C84B: int TStreamerInfo::ReadBuffer<char**>(TBuffer&, char** const&, TStreamerInfo::TCompInfo* const*, int, int, int, int, int) (TStreamerInfoReadBuffer.cxx:1297)
[1965517:tpc-tracker]: ==1965517== by 0xF45B81F: TStreamerInfoActions::VectorLooper::GenericRead(TBuffer&, void*, void const*, TStreamerInfoActions::TLoopConfiguration const*, TStreamerInfoActions::TConfiguration const*) (TStreamerInfoActions.cxx:1883)
[1965517:tpc-tracker]: ==1965517== by 0xF36DAAB: operator() (TStreamerInfoActions.h:131)
[1965517:tpc-tracker]: ==1965517== by 0xF36DAAB: TBufferFile::ApplySequence(TStreamerInfoActions::TActionSequence const&, void*, void*) (TBufferFile.cxx:3736)
[1965517:tpc-tracker]: ==1965517== by 0xF482A0F: TStreamerInfoActions::ReadSTLMemberWiseSameClass(TBuffer&, void*, TStreamerInfoActions::TConfiguration const*, short) (TStreamerInfoActions.cxx:1155)
[1965517:tpc-tracker]: ==1965517== by 0xF482C4F: int TStreamerInfoActions::ReadSTL<&TStreamerInfoActions::ReadSTLMemberWiseSameClass, &TStreamerInfoActions::ReadSTLObjectWiseFastArray>(TBuffer&, void*, TStreamerInfoActions::TConfiguration const*) (TStreamerInfoActions.cxx:1405)
[1965517:tpc-tracker]: ==1965517== by 0xF36DE4B: operator() (TStreamerInfoActions.h:123)
[1965517:tpc-tracker]: ==1965517== by 0xF36DE4B: ApplySequence (TBufferFile.cxx:3670)
[1965517:tpc-tracker]: ==1965517== by 0xF36DE4B: TBufferFile::ApplySequence(TStreamerInfoActions::TActionSequence const&, void*) (TBufferFile.cxx:3661)
[1965517:tpc-tracker]: ==1965517== by 0xF376CEB: TBufferFile::ReadClassBuffer(TClass const*, void*, TClass const*) (TBufferFile.cxx:3598)
[1965517:tpc-tracker]: ==1965517== by 0xF3F4633: Streamer (TClass.h:614)
[1965517:tpc-tracker]: ==1965517== by 0xF3F4633: TKey::ReadObjectAny(TClass const*) (TKey.cxx:1120)
[1965517:tpc-tracker]: ==1965517==
[1965517:tpc-tracker]: ==1965517== Invalid write of size 1
[1965517:tpc-tracker]: ==1965517== at 0xF36E7AC: frombuf (Bytes.h:314)
[1965517:tpc-tracker]: ==1965517== by 0xF36E7AC: frombuf (Bytes.h:442)
[1965517:tpc-tracker]: ==1965517== by 0xF36E7AC: ReadFastArray (TBufferFile.cxx:1338)
[1965517:tpc-tracker]: ==1965517== by 0xF36E7AC: TBufferFile::ReadFastArray(int*, int) (TBufferFile.cxx:1327)
[1965517:tpc-tracker]: ==1965517== by 0xF3E580B: void TGenCollectionStreamer::ReadBufferVectorPrimitives<int>(TBuffer&, void*, TClass const*) (TGenCollectionStreamer.cxx:1183)
[1965517:tpc-tracker]: ==1965517== by 0xF36EC7B: Streamer (TClass.h:614)
[1965517:tpc-tracker]: ==1965517== by 0xF36EC7B: TBufferFile::ReadFastArray(void*, TClass const*, int, TMemberStreamer*, TClass const*) (TBufferFile.cxx:1616)
[1965517:tpc-tracker]: ==1965517== by 0xF58C84B: int TStreamerInfo::ReadBuffer<char**>(TBuffer&, char** const&, TStreamerInfo::TCompInfo* const*, int, int, int, int, int) (TStreamerInfoReadBuffer.cxx:1297)
[1965517:tpc-tracker]: ==1965517== by 0xF45B81F: TStreamerInfoActions::VectorLooper::GenericRead(TBuffer&, void*, void const*, TStreamerInfoActions::TLoopConfiguration const*, TStreamerInfoActions::TConfiguration const*) (TStreamerInfoActions.cxx:1883)
[1965517:tpc-tracker]: ==1965517== by 0xF36DAAB: operator() (TStreamerInfoActions.h:131)
[1965517:tpc-tracker]: ==1965517== by 0xF36DAAB: TBufferFile::ApplySequence(TStreamerInfoActions::TActionSequence const&, void*, void*) (TBufferFile.cxx:3736)
[1965517:tpc-tracker]: ==1965517== by 0xF482A0F: TStreamerInfoActions::ReadSTLMemberWiseSameClass(TBuffer&, void*, TStreamerInfoActions::TConfiguration const*, short) (TStreamerInfoActions.cxx:1155)
[1965517:tpc-tracker]: ==1965517== by 0xF482C4F: int TStreamerInfoActions::ReadSTL<&TStreamerInfoActions::ReadSTLMemberWiseSameClass, &TStreamerInfoActions::ReadSTLObjectWiseFastArray>(TBuffer&, void*, TStreamerInfoActions::TConfiguration const*) (TStreamerInfoActions.cxx:1405)
[1965517:tpc-tracker]: ==1965517== by 0xF36DE4B: operator() (TStreamerInfoActions.h:123)
[1965517:tpc-tracker]: ==1965517== by 0xF36DE4B: ApplySequence (TBufferFile.cxx:3670)
[1965517:tpc-tracker]: ==1965517== by 0xF36DE4B: TBufferFile::ApplySequence(TStreamerInfoActions::TActionSequence const&, void*) (TBufferFile.cxx:3661)
[1965517:tpc-tracker]: ==1965517== by 0xF376CEB: TBufferFile::ReadClassBuffer(TClass const*, void*, TClass const*) (TBufferFile.cxx:3598)
[1965517:tpc-tracker]: ==1965517== by 0xF3F4633: Streamer (TClass.h:614)
[1965517:tpc-tracker]: ==1965517== by 0xF3F4633: TKey::ReadObjectAny(TClass const*) (TKey.cxx:1120)
[1965517:tpc-tracker]: ==1965517== by 0xF3B82E3: TDirectoryFile::GetObjectChecked(char const*, TClass const*) (TDirectoryFile.cxx:1111)
[1965517:tpc-tracker]: ==1965517== Address 0x153fbb81 is 1 bytes after a block of size 1,440 alloc'd
[1965517:tpc-tracker]: ==1965517== at 0x4868908: operator new(unsigned long) (vg_replace_malloc.c:483)
[1965517:tpc-tracker]: ==1965517== by 0x60E5D1F: allocate (new_allocator.h:137)
[1965517:tpc-tracker]: ==1965517== by 0x60E5D1F: allocate (allocator.h:188)
[1965517:tpc-tracker]: ==1965517== by 0x60E5D1F: allocate (alloc_traits.h:464)
[1965517:tpc-tracker]: ==1965517== by 0x60E5D1F: _M_allocate (stl_vector.h:378)
[1965517:tpc-tracker]: ==1965517== by 0x60E5D1F: _M_allocate (stl_vector.h:375)
[1965517:tpc-tracker]: ==1965517== by 0x60E5D1F: std::vector<o2::tpc::PadFlags, std::allocator<o2::tpc::PadFlags> >::_M_default_append(unsigned long) (vector.tcc:650)
[1965517:tpc-tracker]: ==1965517== by 0xF3E5797: void TGenCollectionStreamer::ReadBufferVectorPrimitives<int>(TBuffer&, void*, TClass const*) (TGenCollectionStreamer.cxx:1176)
[1965517:tpc-tracker]: ==1965517== by 0xF36EC7B: Streamer (TClass.h:614)
Reproducer
I do not have one which does not involve running ALICE reconstruction on ARM.
ROOT version
6.32.02.
Installation method
aliBuild
Operating system
ALMA Linux 9 on ARM64 (Ampere Altra)
Additional context
No response
Can you give us a bit more information? What would be useful, if possible:
- The stacktrace from the segfault
- A description on how to set up the corresponding ALICE environment so that we can look at the dictionaries and headers
- The ROOT file that caused the crash
Is it confirmed that the same data serialized on ARM does not cause a crash?
For the file:
https://cernbox.cern.ch/s/MXkLwJLm61rckhj
I cannot confirm if the same data serialised on ARM does not cause a crash.
[1064949:tpc-tracker]: /root/src/sw/slc9_aarch64/O2/dev-local1/lib/libO2Framework.so: handle_crash(int)
[1064949:tpc-tracker]: linux-vdso.so.1: ?? ??:0
[1064949:tpc-tracker]: /root/src/sw/slc9_aarch64/ROOT/v6-32-02-alice1-1/lib/libRIO.so.6.32: TBufferFile::ReadFastArray(int*, int)
[1064949:tpc-tracker]: /root/src/sw/slc9_aarch64/ROOT/v6-32-02-alice1-1/lib/libRIO.so.6.32: void TGenCollectionStreamer::ReadBufferVectorPrimitives<int>(TBuffer&, void*, TClass const*)
[1064949:tpc-tracker]: /root/src/sw/slc9_aarch64/ROOT/v6-32-02-alice1-1/lib/libRIO.so.6.32: TBufferFile::ReadFastArray(void*, TClass const*, int, TMemberStreamer*, TClass const*)
[1064949:tpc-tracker]: /root/src/sw/slc9_aarch64/ROOT/v6-32-02-alice1-1/lib/libRIO.so.6.32: int TStreamerInfo::ReadBuffer<char**>(TBuffer&, char** const&, TStreamerInfo::TCompInfo* const*, int, int, int, int, int)
[1064949:tpc-tracker]: /root/src/sw/slc9_aarch64/ROOT/v6-32-02-alice1-1/lib/libRIO.so.6.32: TStreamerInfoActions::VectorLooper::GenericRead(TBuffer&, void*, void const*, TStreamerInfoActions::TLoopConfiguration const*, TStreamerInfoActions::TConfiguration const*)
[1064949:tpc-tracker]: /root/src/sw/slc9_aarch64/ROOT/v6-32-02-alice1-1/lib/libRIO.so.6.32: TBufferFile::ApplySequence(TStreamerInfoActions::TActionSequence const&, void*, void*)
[1064949:tpc-tracker]: /root/src/sw/slc9_aarch64/ROOT/v6-32-02-alice1-1/lib/libRIO.so.6.32: TStreamerInfoActions::ReadSTLMemberWiseSameClass(TBuffer&, void*, TStreamerInfoActions::TConfiguration const*, short)
[1064949:tpc-tracker]: /root/src/sw/slc9_aarch64/ROOT/v6-32-02-alice1-1/lib/libRIO.so.6.32: int TStreamerInfoActions::ReadSTL<&TStreamerInfoActions::ReadSTLMemberWiseSameClass, &TStreamerInfoActions::ReadSTLObjectWiseFastArray>(TBuffer&, void*, TStreamerInfoActions::TConfiguration const*)
[1064949:tpc-tracker]: /root/src/sw/slc9_aarch64/ROOT/v6-32-02-alice1-1/lib/libRIO.so.6.32: TBufferFile::ApplySequence(TStreamerInfoActions::TActionSequence const&, void*)
[1064949:tpc-tracker]: /root/src/sw/slc9_aarch64/ROOT/v6-32-02-alice1-1/lib/libRIO.so.6.32: TBufferFile::ReadClassBuffer(TClass const*, void*, TClass const*)
[1064949:tpc-tracker]: /root/src/sw/slc9_aarch64/ROOT/v6-32-02-alice1-1/lib/libRIO.so.6.32: TKey::ReadObjectAny(TClass const*)
[1064949:tpc-tracker]: /root/src/sw/slc9_aarch64/ROOT/v6-32-02-alice1-1/lib/libRIO.so.6.32: TDirectoryFile::GetObjectChecked(char const*, TClass const*)
[1064949:tpc-tracker]: /root/src/sw/slc9_aarch64/O2/dev-local1/lib/libO2Framework.so: o2::framework::DataRefUtils::decodeCCDB(o2::framework::DataRef const&, std::type_info const&)
[1064949:tpc-tracker]: /root/src/sw/slc9_aarch64/O2/dev-local1/lib/libO2GPUWorkflow.so: decltype(auto) o2::framework::InputRecord::get<o2::tpc::CalDet<o2::tpc::PadFlags>*, char const*>(char const*, int) const
[1064949:tpc-tracker]: /root/src/sw/slc9_aarch64/O2/dev-local1/lib/libO2GPUWorkflow.so: bool o2::gpu::GPURecoWorkflowSpec::fetchCalibsCCDBTPC<o2::gpu::GPUCalibObjectsTemplate<o2::gpu::ConstPtr> >(o2::framework::ProcessingContext&, o2::gpu::GPUCalibObjectsTemplate<o2::gpu::ConstPtr>&, o2::gpu::GPURecoWorkflowSpec::calibObjectStruct&)
[1064949:tpc-tracker]: /root/src/sw/slc9_aarch64/O2/dev-local1/lib/libO2GPUWorkflow.so: o2::gpu::GPURecoWorkflowSpec::doCalibUpdates(o2::framework::ProcessingContext&, o2::gpu::GPURecoWorkflowSpec::calibObjectStruct&)
[1064949:tpc-tracker]: /root/src/sw/slc9_aarch64/O2/dev-local1/lib/libO2GPUWorkflow.so: o2::gpu::GPURecoWorkflowSpec::run(o2::framework::ProcessingContext&)
[1064949:tpc-tracker]: /root/src/sw/slc9_aarch64/O2/dev-local1/lib/libO2Framework.so: ?? ??:0
[1064949:tpc-tracker]: /root/src/sw/slc9_aarch64/O2/dev-local1/lib/libO2Framework.so: o2::framework::DataProcessingDevice::tryDispatchComputation(o2::framework::ServiceRegistryRef, std::vector<o2::framework::DataRelayer::RecordAction, std::allocator<o2::framework::DataRelayer::RecordAction> >&)
[1064949:tpc-tracker]: /root/src/sw/slc9_aarch64/O2/dev-local1/lib/libO2Framework.so: o2::framework::DataProcessingDevice::doRun(o2::framework::ServiceRegistryRef)
[1064949:tpc-tracker]: /root/src/sw/slc9_aarch64/O2/dev-local1/lib/libO2Framework.so: o2::framework::run_callback(uv_work_s*)
[1064949:tpc-tracker]: /root/src/sw/slc9_aarch64/O2/dev-local1/lib/libO2Framework.so: o2::framework::DataProcessingDevice::Run()
[1064949:tpc-tracker]: /root/src/sw/slc9_aarch64/FairMQ/v1.8.4-2/lib/libfairmq.so.1.8.4: fair::mq::Device::RunWrapper()
[1064949:tpc-tracker]: /root/src/sw/slc9_aarch64/FairMQ/v1.8.4-2/lib/libfairmq.so.1.8.4: boost::detail::function::void_function_obj_invoker1<std::function<void (fair::mq::State)>, void, fair::mq::State>::invoke(boost::detail::function::function_buffer&, fair::mq::State)
[1064949:tpc-tracker]: /root/src/sw/slc9_aarch64/FairMQ/v1.8.4-2/lib/libfairmq.so.1.8.4: boost::signals2::detail::signal_impl<void (fair::mq::State), boost::signals2::optional_last_value<void>, int, std::less<int>, boost::function<void (fair::mq::State)>, boost::function<void (boost::signals2::connection const&, fair::mq::State)>, boost::signals2::mutex>::operator()(fair::mq::State)
is one of the stacktraces. It actually dies in different ways; most likely there is some memory corruption going on...
For the ALICE environment, the easiest is probably sitting together. It's on a custom machine in my private area.
Thanks. I'm not at CERN today but getting started with the information.
(Side note: MakeProject does not reconstruct the enums with the correct underlying type)
Another stacktrace which seems to be related to this is:
[1500611:internal-dpl-ccdb-backend]: Executable is /root/src/sw/slc9_aarch64/O2/dev-local1/bin/o2-tpc-reco-workflow
[1500611:internal-dpl-ccdb-backend]: linux-vdso.so.1: ?? ??:0
[1500611:internal-dpl-ccdb-backend]: [0xfff3cae9b014]: ?? ??:0
[1500611:internal-dpl-ccdb-backend]: [0xfff3cae9d7f0]: ?? ??:0
[1500611:internal-dpl-ccdb-backend]: /root/src/sw/slc9_aarch64/ROOT/v6-32-02-alice1-1/lib/libCling.so: ?? ??:0
[1500611:internal-dpl-ccdb-backend]: /root/src/sw/slc9_aarch64/ROOT/v6-32-02-alice1-1/lib/libCling.so: ?? ??:0
[1500611:internal-dpl-ccdb-backend]: /root/src/sw/slc9_aarch64/ROOT/v6-32-02-alice1-1/lib/libCling.so: ?? ??:0
[1500611:internal-dpl-ccdb-backend]: /root/src/sw/slc9_aarch64/ROOT/v6-32-02-alice1-1/lib/libCling.so: ?? ??:0
[1500611:internal-dpl-ccdb-backend]: /root/src/sw/slc9_aarch64/ROOT/v6-32-02-alice1-1/lib/libCling.so: ?? ??:0
[1500611:internal-dpl-ccdb-backend]: /root/src/sw/slc9_aarch64/ROOT/v6-32-02-alice1-1/lib/libCling.so: ?? ??:0
[1500611:internal-dpl-ccdb-backend]: /root/src/sw/slc9_aarch64/ROOT/v6-32-02-alice1-1/lib/libCling.so: ?? ??:0
[1500611:internal-dpl-ccdb-backend]: /root/src/sw/slc9_aarch64/ROOT/v6-32-02-alice1-1/lib/libCling.so: TCling::AutoParseImplRecurse(char const*, bool)
[1500611:internal-dpl-ccdb-backend]: /root/src/sw/slc9_aarch64/ROOT/v6-32-02-alice1-1/lib/libCling.so: TCling::AutoParse(char const*)
[1500611:internal-dpl-ccdb-backend]: /root/src/sw/slc9_aarch64/ROOT/v6-32-02-alice1-1/lib/libCling.so: TClingLookupHelper__AutoParse(char const*)
[1500611:internal-dpl-ccdb-backend]: /root/src/sw/slc9_aarch64/ROOT/v6-32-02-alice1-1/lib/libCling.so: ROOT::TMetaUtils::TClingLookupHelper::GetPartiallyDesugaredNameWithScopeHandling(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&, bool)
[1500611:internal-dpl-ccdb-backend]: /root/src/sw/slc9_aarch64/ROOT/v6-32-02-alice1-1/lib/libCore.so.6.32: TClassEdit::GetNormalizedName(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&, std::basic_string_view<char, std::char_traits<char> >)
[1500611:internal-dpl-ccdb-backend]: /root/src/sw/slc9_aarch64/ROOT/v6-32-02-alice1-1/lib/libCore.so.6.32: TClass::GetClass(char const*, bool, bool, unsigned long, unsigned long)
[1500611:internal-dpl-ccdb-backend]: /root/src/sw/slc9_aarch64/ROOT/v6-32-02-alice1-1/lib/libRIO.so.6.32: TStreamerInfo::BuildCheck(TFile*, bool)
[1500611:internal-dpl-ccdb-backend]: /root/src/sw/slc9_aarch64/ROOT/v6-32-02-alice1-1/lib/libRIO.so.6.32: TFile::ReadStreamerInfo()
[1500611:internal-dpl-ccdb-backend]: /root/src/sw/slc9_aarch64/ROOT/v6-32-02-alice1-1/lib/libRIO.so.6.32: TFile::Init(bool)
[1500611:internal-dpl-ccdb-backend]: /root/src/sw/slc9_aarch64/ROOT/v6-32-02-alice1-1/lib/libRIO.so.6.32: TMemFile::TMemFile(char const*, char*, long long, char const*, char const*, int, long long)
[1500611:internal-dpl-ccdb-backend]: /root/src/sw/slc9_aarch64/O2/dev-local1/lib/libO2CCDB.so: o2::ccdb::CcdbApi::loadFileToMemory(std::vector<char, boost::container::pmr::polymorphic_allocator<char> >&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > >*) const
[1500611:internal-dpl-ccdb-backend]: /root/src/sw/slc9_aarch64/O2/dev-local1/lib/libO2CCDB.so: o2::ccdb::CcdbApi::getFromSnapshot(bool, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, long, std::map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > >&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&, std::vector<char, boost::container::pmr::polymorphic_allocator<char> >&, int&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) const
[1500611:internal-dpl-ccdb-backend]: /root/src/sw/slc9_aarch64/O2/dev-local1/lib/libO2CCDB.so: o2::ccdb::CcdbApi::navigateSourcesAndLoadFile(o2::ccdb::CcdbApi::RequestContext&, int&, unsigned long*) const
[1500611:internal-dpl-ccdb-backend]: /root/src/sw/slc9_aarch64/O2/dev-local1/lib/libO2CCDB.so: o2::ccdb::CcdbApi::vectoredLoadFileToMemory(std::vector<o2::ccdb::CcdbApi::RequestContext, std::allocator<o2::ccdb::CcdbApi::RequestContext> >&) const
[1500611:internal-dpl-ccdb-backend]: /root/src/sw/slc9_aarch64/O2/dev-local1/lib/libO2CCDB.so: o2::ccdb::CcdbApi::loadFileToMemory(std::vector<char, boost::container::pmr::polymorphic_allocator<char> >&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::bas$
Interestingly enough, the actual array returned by backtrace can be decoded by GDB to:
$4 = {0xffffac196fb0 <handle_crash(int)+48>, 0xffffb2f727f0 <__kernel_rt_sigreturn>, 0xfff3ea6f5014, 0xfff3ea6f77f0,
0xffff9e97b198 <(anonymous namespace)::GenericLLVMIRPlatformSupport::initialize(llvm::orc::JITDylib&)+2392>,
0xffff9d4b0de0 <cling::IncrementalExecutor::runStaticInitializersOnce(cling::Transaction&)+272>, 0xffff9d435f78 <cling::Interpreter::executeTransaction(cling::Transaction&)+40>,
0xffff9d4c0e30 <cling::IncrementalParser::commitTransaction(llvm::PointerIntPair<cling::Transaction*, 2u, cling::IncrementalParser::EParseResult, llvm::PointerLikeTypeTraits<cling::Transaction*>, llvm::PointerIntPairInfo<cling::Transaction*, 2u, llvm::PointerLikeTypeTraits<cling::Transaction*> > >&, bool)+768>,
0xffff9d4c398c <cling::IncrementalParser::Compile(llvm::StringRef, cling::CompilationOptions const&)+108>,
0xffff9d433d80 <cling::Interpreter::parseForModule(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)+176>, 0xffff9d36b5f8
<ExecAutoParse(char const*, Bool_t, cling::Interpreter*)+568>, 0xffff9d36cf48 <TCling::AutoParseImplRecurse(char const*, bool)+1400>, 0xffff9d374de4 <TCling::AutoParse(char const*)+340>,
0xffff9d355204 <TClingLookupHelper__AutoParse(char const*)+36>, 0xffff9d2c8b44
<ROOT::TMetaUtils::TClingLookupHelper::GetPartiallyDesugaredNameWithScopeHandling(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&, bool)+116>, 0xffffa7acf42c
<TClassEdit::GetNormalizedName(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&, std::basic_string_view<char, std::char_traits<char> >)+540>, 0xffffa7aeab58
<TClass::GetClass(char const*, bool, bool, unsigned long, unsigned long)+1144>, 0xffffa7f852b4 <TStreamerInfo::BuildCheck(TFile*, bool)+148>, 0xffffa7f4751c <TFile::ReadStreamerInfo()+700>,
0xffffa7f4fc40 <TFile::Init(bool)+1056>, 0xffffa7f74a60 <TMemFile::TMemFile(char const*, char*, long long, char const*, char const*, int, long long)+268>, 0xffffac4515b4
<o2::ccdb::CcdbApi::loadFileToMemory(std::vector<char, boost::container::pmr::polymorphic_allocator<char> >&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > >*) const+900>,
0xffffac451f68 <o2::ccdb::CcdbApi::getFromSnapshot(bool, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, long, std::map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > >&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&, std::vector<char, boost::container::pmr::polymorphic_allocator<char> >&, int&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) const+936>,
0xffffac452100 <o2::ccdb::CcdbApi::navigateSourcesAndLoadFile(o2::ccdb::CcdbApi::RequestContext&, int&, unsigned long*) const+192>,
0xffffac4524d0 <o2::ccdb::CcdbApi::vectoredLoadFileToMemory(std::vector<o2::ccdb::CcdbApi::RequestContext, std::allocator<o2::ccdb::CcdbApi::RequestContext> >&) const+240>,
Some more points gathered during a debug session:
- The problem appears only on ARM/Linux, not on ARM/Mac
- The streamer info output
[1965517:tpc-tracker]: i= 2, mPadSubset type= 23, offset= 56, len=2, method=0 [optimized]
does not seem to indicate a problem because the same list of streamer elements also contains the expected
o2::tpc::PadSubset mPadSubset offset= 56 type= 3 Subset type
- If the class o2::tpc::CalArray<o2::tpc::PadFlags> is added to the dictionaries (Linkdef), the stacktrace changes and the crash becomes reproducible. In this case, there is an error writing beyond vector boundaries.
- The next step is to try to reproduce the crash with a debug build of ROOT
Further debugging revealed a deeper issue that seems to surface on ARM/Linux only by chance:
Writing or reading a vector of enums goes through the collection proxy. The collection proxy will use WriteFastArray / ReadFastArray of kInt_t, neglecting the actual underlying type of the enum. At some point in the read/write chain, this causes memory reads/writes beyond the limits of a memory array.
I think the cause is https://github.com/root-project/root/blob/master/io/io/src/TGenCollectionProxy.cxx#L404 (and similar lines further down), which hard-code the enum's underlying type as int.
When fixing, I think we need to take care of what happens to files already written out with the wrong enum width.
Do I understand correctly this affects only scoped enums within a vector? Can I simply fix it on my side by moving to enum class Foo : int {}?
Although: I'm not exactly sure if already existing files that were serialized with a shorter enum correctly read back. I think yes, but that needs to be tested.
> Although: I'm not exactly sure if already existing files that were serialized with a shorter enum correctly read back. I think yes, but that needs to be tested.
This I can try on my side.
I'm attaching a minimal reproducer.
minimalTestVectorOfEnums.tar.gz
This test returns (wrongly)
Size of PadFlags: 2
Enum underlying type: 12
mFlags size before writing: 2
mFlags size after reading: 4
0 0 23824 0
With a patch to TGenCollectionProxy::Value, the result is correct:
Size of PadFlags: 2
Enum underlying type: 12
mFlags size before writing: 2
mFlags size after reading: 2
0 0
I think the next steps should be discussed with @pcanal. In particular:
- What about the cases when we only have an emulated enum? With this patch in place, we cannot just assume anymore that this will be an int on disk.
- In general, how do we correctly handle vectors of enums with underlying types different than int that are on disk, before and after the patch?
AFAICT, neither TTree nor RNTuple I/O are affected by this issue.
> [1965517:tpc-tracker]: i= 2, mPadSubset type= 23, offset= 56, len=2, method=0 [optimized]
> as I would have expected it to be len=1. Can you explain to me what is going on?
If the next data member (which should not be listed right after it) is of the same type, TStreamerInfo will collate them (note the optimized part).
We should be able to fix the usage in regular I/O and in TTree (which is also broken) when a dictionary is used. Proper support in bare ROOT might be harder: the underlying size information is harder to find and in some cases (e.g. a top-level vector of enums) might not be available (yet?).
> In general, how do we correctly handle vectors of enums with underlying types different than int that are on disk, before and after the patch?
With dictionaries, it seems to work fine (for embedded vectors, probably not for standalone vectors) because the TStreamerInfo of the containing class records the underlying type and thus knows when a conversion is needed. The corollary is that a class version number must be updated (to allow schema evolution) if one of the enum types it uses changes its underlying type.
For the record, as you might have seen in https://github.com/AliceO2Group/AliceO2/pull/13464, simply changing the types breaks reading back old files (i.e. two shorts are read into an int). Could you comment on when you expect to have a fix for this on your side that applies to 6.32.2, and whether it will allow old code to still read new data (and vice versa, new code to read old data)?
Side note for the record: the original valgrind report and crash happen in the case where the vector<EnumType> is itself held in a vector (of CalArray) held in an object (CalDet).
I have a workaround that solves the problem for the case in the minimal reproducer; it revolves around setting a read rule for the vector of enums:
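The "two shorts are read into an int" effect can be illustrated outside ROOT. The sketch below is only an illustration under the assumption of 16-bit enum values and a 32-bit int; `readAsInt` and `splitBack` are hypothetical helper names, not ROOT functions.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Reinterpret two adjacent 16-bit values as a single 32-bit value,
// the way a reader expecting int-sized elements would consume them.
inline std::uint32_t readAsInt(const std::uint16_t (&pair)[2])
{
   std::uint32_t merged = 0;
   std::memcpy(&merged, pair, sizeof merged); // memcpy avoids aliasing problems
   return merged;
}

// Undo the merge: split one 32-bit value back into two 16-bit values.
inline void splitBack(std::uint32_t merged, std::uint16_t (&pair)[2])
{
   std::memcpy(pair, &merged, sizeof merged);
}
```

The merged value matches neither of the two original values, which is why code that simply switches the enum to int sees garbage in old files; the information is still there and can be split back, provided the reader knows the original element size.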
template <typename E>
void LoadEnumCollection(/* const */ std::vector<E> &onfile, std::vector<E> &enums)
{
   // Number of enum-sized slots per int-sized on-file value.
   constexpr size_t delta = sizeof(int)/sizeof(E);
   const size_t nvalues = onfile.size() / delta;
   onfile.resize(nvalues);
   std::swap(onfile, enums);
}
#pragma read sourceClass="Event" checksums="[0xa2558fd6]" targetClass="Event" source="std::vector<PadFlags> mFlags" target="mFlags" code="{ LoadEnumCollection(onfile.mFlags, mFlags); }"
However, it does not work yet for the actual/original problem :(. (In the minimal reproducer the container ends up with double the size it should have but there is no overwrite/crash, while in the original the container ends up with the right size but with an overwrite and thus a crash.)
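The helper used by the read rule above can be exercised without ROOT. The sketch below copies its logic and applies it to a hand-built vector; the enum `DemoFlags` and its enumerators are hypothetical, and a 32-bit int is assumed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical 16-bit enum standing in for the one in the reproducer.
enum class DemoFlags : std::uint16_t { kNone = 0, kDead = 1, kJunk = 0x5d10 };

static_assert(sizeof(int) == 4, "this sketch assumes a 32-bit int");

// Same logic as the helper in the read rule, copied here for a standalone check.
template <typename E>
void LoadEnumCollection(std::vector<E> &onfile, std::vector<E> &enums)
{
   constexpr std::size_t delta = sizeof(int) / sizeof(E); // 2 for a 16-bit enum
   const std::size_t nvalues = onfile.size() / delta;     // drop the surplus tail
   onfile.resize(nvalues);
   std::swap(onfile, enums);
}
```

Called on a four-element `onfile` vector whose last two entries are junk, it leaves the target with the first two entries only, which corresponds to the 4 → 2 correction needed in the minimal reproducer.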
The following custom Streamer works around the issue:
template <typename Flags>
inline void CalArray<Flags>::Streamer(TBuffer &R__b)
{
   // Stream an object of class CalArray<PadFlags>.
   if (R__b.IsReading()) {
      UInt_t R__s, R__c;
      Version_t R__v = R__b.ReadVersion(&R__s, &R__c);
      if (R__v <= 3) {
         {
            UInt_t start, count;
            Version_t vers = R__b.ReadVersion(&start, &count);
            std::vector<int> R__stl;
            R__stl.clear();
            int R__n;
            R__b >> R__n;
            R__stl.reserve(R__n);
            for (int R__i = 0; R__i < R__n; R__i++) {
               Int_t readtemp;
               R__b >> readtemp;
               R__stl.push_back(readtemp);
            }
            R__b.CheckByteCount(start, count, "stl collection of enums");
            mFlags.clear();
            auto data = reinterpret_cast<unsigned short*>(R__stl.data());
            constexpr size_t delta = sizeof(int)/sizeof(Flags);
            for(int i = 0; i < R__n; ++i)
               mFlags.push_back(static_cast<PadFlags>( data[i] ));
         }
         int tmp;
         R__b >> tmp;
         mPadSubset = static_cast<PadSubset>(tmp);
         R__b.CheckByteCount(R__s, R__c, CalArray::IsA());
      } else {
         R__b.ReadClassBuffer(CalArray<Flags>::Class(),this, R__v, R__s, R__c);
      }
   } else {
      R__b.WriteClassBuffer(CalArray<Flags>::Class(),this);
   }
}
[Call to ReadClassBuffer was corrected to add missing parameters]
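The decoding step in the middle of that streamer (read N int-sized values, then take the first N enum-sized values from the same memory) can be sketched on its own. This is a standalone illustration, not ROOT code: `decodeEnums` and `StoredFlag` are hypothetical names, and it assumes, as the loop over data[i] in the streamer suggests, that the real values sit at the start of the int buffer. It uses memcpy rather than reinterpret_cast to sidestep aliasing concerns.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical 16-bit enum standing in for the stored flag type.
enum class StoredFlag : std::uint16_t { kNone = 0, kDead = 1, kNoisy = 2 };

// Recover raw.size() enum values from a buffer that was read as int-sized
// elements. Only the leading raw.size() * sizeof(E) bytes are used; the
// remainder of the buffer is ignored.
template <typename E>
std::vector<E> decodeEnums(const std::vector<std::int32_t> &raw)
{
   static_assert(sizeof(E) <= sizeof(std::int32_t), "enum must not be wider than the on-file element");
   std::vector<E> out(raw.size());
   if (!raw.empty()) {
      std::memcpy(out.data(), raw.data(), out.size() * sizeof(E));
   }
   return out;
}
```

For a 16-bit enum, a buffer of four 32-bit values yields four enum values taken from its first eight bytes; the trailing eight bytes never reach the output.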
Any followup to the bug itself? Will we have a fix in ROOT which avoids a custom streamer?
Yes. https://github.com/root-project/root/pull/17009 solves the problem, and files produced with those changes can be written and read without any customization. Reading files that were written prior to those changes and that contain enums with a non-default size (and thus were incorrectly written) will require explicit customization: the data layout in the file depends on what the enum size was at the time of writing, and this information is not recorded in the file, so manual intervention is needed.
@ktf The PR was merged into master. Please let us know if you encounter any (new) problem.
Hi @pcanal, @jblomer, @dpiparo,
It appears this issue is closed, but wasn't yet added to a project. Please add upcoming versions that will include the fix, or 'not applicable' otherwise.
Sincerely, :robot: