ADIOS2
Invalid read in SST writer with WAN backend and data volume larger than 2GB
Describe the bug
I'm trying to use the WAN backend of SST to stream data on a system where no compatible libfabric backend is available. Up to 2GB per step, everything runs fine; beyond that, I get a segfault in the writer.
Valgrind output:
==373784== Invalid read of size 1
==373784== at 0x9029A51: INT_CMwrite_raw_notify (cm.c:3122)
==373784== by 0x90298F3: INT_CMwrite_raw (cm.c:3093)
==373784== by 0x902B0C1: INT_CMwrite_attr (cm.c:3323)
==373784== by 0x9027861: INT_CMwrite (cm.c:2733)
==373784== by 0x903DA6B: CMwrite (cm_interface.c:629)
==373784== by 0x80CD7A3: SendSpeculativePreloadMsgs (evpath_dp.c:1299)
==373784== by 0x80CD3B4: EvpathWSReaderRegisterTimestep (evpath_dp.c:1211)
==373784== by 0x80D5AF6: SendTimestepEntryToSingleReader (cp_writer.c:1161)
==373784== by 0x80D5BA4: SendTimestepEntryToReaders (cp_writer.c:1181)
==373784== by 0x80D88E8: SstInternalProvideTimestep (cp_writer.c:2320)
==373784== by 0x80D8CFD: SstProvideTimestep (cp_writer.c:2409)
==373784== by 0x801EEA0: adios2::core::engine::SstWriter::EndStep() (SstWriter.cpp:360)
==373784== Address 0xffffffffd9c9d040 is not stack'd, malloc'd or (recently) free'd
==373784==
==373784==
==373784== Process terminating with default action of signal 11 (SIGSEGV)
==373784== at 0x5D94019: raise (raise.c:46)
==373784== by 0x5D940BF: ??? (in /usr/lib/x86_64-linux-gnu/libc-2.31.so)
==373784== by 0x9029A50: INT_CMwrite_raw_notify (cm.c:3122)
==373784== by 0x90298F3: INT_CMwrite_raw (cm.c:3093)
==373784== by 0x902B0C1: INT_CMwrite_attr (cm.c:3323)
==373784== by 0x9027861: INT_CMwrite (cm.c:2733)
==373784== by 0x903DA6B: CMwrite (cm_interface.c:629)
==373784== by 0x80CD7A3: SendSpeculativePreloadMsgs (evpath_dp.c:1299)
==373784== by 0x80CD3B4: EvpathWSReaderRegisterTimestep (evpath_dp.c:1211)
==373784== by 0x80D5AF6: SendTimestepEntryToSingleReader (cp_writer.c:1161)
==373784== by 0x80D5BA4: SendTimestepEntryToReaders (cp_writer.c:1181)
==373784== by 0x80D88E8: SstInternalProvideTimestep (cp_writer.c:2320)
GDB finds the segfault at the same line:
3119│ for (i=0; i < vec_count; i++) {
3120│ count += full_vec[i].iov_len - start;
3121│ for (j=start; j< full_vec[i].iov_len; j++) {
3122│ checksum += ((unsigned char*)full_vec[i].iov_base)[j]; // this crashes
3123│ }
3124│ start = 0;
3125│ }
The error seems to occur independently of the chosen WANDataTransport.
To Reproduce
- Use a data producer in ADIOS2 that writes more than 2GB per step
- Use the SST engine with WAN backend
- Connect any reader. In my tests, opening the stream is enough to trigger the crash, without loading any data. The crash only occurs once a reader connects.
If you need an ADIOS2-only reproducer, I can set one up; for now I'm seeing this issue via openPMD.
Expected behavior
Ideally, no crash. The SST documentation does not mention a 2GB limit, but maybe it has one?
If it does, what is the recommended way to set up a streaming workflow with moderate data volume in a non-HPC environment? The current use case is a lab environment for the exchange of laser images.
Desktop (please complete the following information):
- OS/Platform: nvidia/cuda:11.6.0-devel-ubuntu20.04 Singularity container on a Debian bullseye/sid machine
- Build: ADIOS 2.8.0, built with g++ 11.1.0 in Debug mode
Additional context
I remember seeing this issue with earlier ADIOS2 releases as well, but did not investigate. So I assume that the exact build type is not very relevant.
Following up
Was the issue fixed? Please report back.
Interesting... My first thought was lurking 32-bit length values (much of this code was written with control, not data, in mind), but offhand I'm not sure how that turns into a segfault here. Then again, that checksum calculation should only be done for tiny (<10K) messages, so something is clearly going bad. Let me see if I can reproduce. Shouldn't be too hard to sort.
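To illustrate the "lurking 32-bit length" hypothesis, here is a minimal sketch of what happens when a 2 GiB byte count passes through a signed 32-bit field and is later widened again. The helper names `storeLength32` and `widenToOffset` are hypothetical and are not taken from EVPath or ADIOS2; the sketch only shows that the resulting value has all upper 32 bits set, which is at least consistent with the faulting address `0xffffffffd9c9d040` in the Valgrind output. It assumes a 64-bit platform.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: store a byte count in a signed 32-bit field, as code
// written for small control messages might do. The narrowing conversion is
// implementation-defined before C++20 and modular from C++20 on; mainstream
// compilers wrap around in both cases.
std::int32_t storeLength32(std::size_t length)
{
    return static_cast<std::int32_t>(length);
}

// Widen the stored value back to a 64-bit offset. A negative 32-bit value is
// sign-extended, so the upper 32 bits of the result are all ones.
std::uint64_t widenToOffset(std::int32_t stored)
{
    return static_cast<std::uint64_t>(static_cast<std::int64_t>(stored));
}
```

With these helpers, a length of `2ull * 1024 * 1024 * 1023` survives the round trip unchanged, while `2ull * 1024 * 1024 * 1024` comes back as `0xffffffff80000000`.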
I was unable to duplicate this when I tried. EVPath has had a couple of tweaks since then. @franzpoeschel, can you maybe try again? If it still fails, I'll probably need your setup so I can recreate it.
Thank you for looking into this! I still see the crashes with v2.8.1. I will try to create a minimal example.
I can reproduce this with a minimal ADIOS2 example:
Writer:
#include <adios2.h>
#include <cstddef>
#include <numeric>
#include <string>
#include <vector>

int main(int argsc, char **argsv)
{
    std::string engine_type = "sst";
    std::string datatransport = "WAN";
    // Optional first argument overrides the SST data transport.
    if (argsc > 1)
    {
        datatransport = argsv[1];
    }
    adios2::ADIOS adios;
    adios2::IO IO = adios.DeclareIO("IO");
    IO.SetParameter("DataTransport", datatransport);
    IO.SetEngine(engine_type);
    adios2::Engine engine = IO.Open("stream", adios2::Mode::Write);
    using datatype = double;
    // Exactly 2 GiB of payload per step.
    constexpr size_t vecLength = 2ull * 1024 * 1024 * 1024 / sizeof(double);
    std::vector<datatype> streamData(vecLength);
    std::iota(streamData.begin(), streamData.end(), 0.);
    auto variable = IO.DefineVariable<datatype>(
        "var", {vecLength}, {0}, {vecLength}, /* constantDims = */ true);
    for (unsigned step = 0; step < 10; ++step)
    {
        engine.BeginStep();
        engine.Put(variable, streamData.data());
        engine.EndStep();
    }
    engine.Close();
}
Reader:
#include <adios2.h>
#include <iostream>
#include <stdexcept>
#include <string>
#include <vector>

int main(int argsc, char **argsv)
{
    using datatype = double;
    std::string engine_type = "sst";
    std::string datatransport = "WAN";
    // Optional first argument overrides the SST data transport.
    if (argsc > 1)
    {
        datatransport = argsv[1];
    }
    adios2::ADIOS adios;
    adios2::IO IO = adios.DeclareIO("IO");
    IO.SetParameter("DataTransport", datatransport);
    IO.SetEngine(engine_type);
    adios2::Engine engine = IO.Open("stream", adios2::Mode::Read);
    std::vector<datatype> streamData;
    unsigned currentStep = 0;
    // Reads one step into streamData and prints the step counter.
    auto loopbody = [&engine, &streamData, &currentStep](
                        adios2::Variable<datatype> &variable) {
        engine.Get(variable, streamData.data());
        engine.EndStep();
        std::cout << currentStep++ << std::endl;
    };
    engine.BeginStep();
    auto variable = IO.InquireVariable<datatype>("var");
    if (!variable)
    {
        throw std::runtime_error("[Reader] Failed inquiring variable");
    }
    streamData.resize(variable.Shape()[0]);
    loopbody(variable);
    while (engine.BeginStep() == adios2::StepStatus::OK)
    {
        loopbody(variable);
    }
    engine.Close();
}
CMakeLists.txt:
cmake_minimum_required(VERSION 3.12.0)
project(adios_stream)
find_package(ADIOS2 REQUIRED)
add_executable(stream_write stream_write.cpp)
add_executable(stream_read stream_read.cpp)
target_link_libraries(stream_write PRIVATE adios2::cxx11)
target_link_libraries(stream_read PRIVATE adios2::cxx11)
Changing one of the 1024 to a 1023 will make things work completely fine, but the above configuration will not work.
The faulty behavior depends on the ADIOS2 version:
- vecLength = 2ull * 1024 * 1024 * 1023 / sizeof(double) works in both v2.8.2 and v2.7.1
- 2ull * 1024 * 1024 * 1024 will crash in v2.7.1 and v2.8.0 with the segfault at the checksum calculation as in the entry post, but will hang in v2.8.2 (1)
- 4ull * 1024 * 1024 * 1024 will crash in v2.7.1, v2.8.0 and v2.8.2, this time with a different error on the reading side, identical in all three versions (2)
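As a quick sanity check on the three payload sizes above, the following sketch tests at compile time which of them fit into a signed and an unsigned 32-bit integer. The names `kWorks`, `kTwoGiB`, `kFourGiB`, `fitsInt32` and `fitsUint32` are made up for this illustration. That the three cases fall on different sides of these boundaries is only suggestive; it does not prove where the failures originate.

```cpp
#include <cstdint>
#include <limits>

// Payload sizes in bytes for the three cases listed above.
constexpr std::uint64_t kWorks = 2ull * 1024 * 1024 * 1023;   // 2145386496
constexpr std::uint64_t kTwoGiB = 2ull * 1024 * 1024 * 1024;  // 2^31
constexpr std::uint64_t kFourGiB = 4ull * 1024 * 1024 * 1024; // 2^32

// True if n can be stored in a signed 32-bit integer.
constexpr bool fitsInt32(std::uint64_t n)
{
    return n <= static_cast<std::uint64_t>(
                    std::numeric_limits<std::int32_t>::max());
}

// True if n can be stored in an unsigned 32-bit integer.
constexpr bool fitsUint32(std::uint64_t n)
{
    return n <= std::numeric_limits<std::uint32_t>::max();
}

static_assert(fitsInt32(kWorks), "working case fits in int32");
static_assert(!fitsInt32(kTwoGiB) && fitsUint32(kTwoGiB),
              "2 GiB overflows int32 but still fits in uint32");
static_assert(!fitsUint32(kFourGiB), "4 GiB overflows uint32 as well");
```

The working size stays just below the signed 32-bit limit, the 2 GiB case crosses only the signed limit, and the 4 GiB case crosses the unsigned limit too.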
(1) Hangup backtrace of the writer:
(gdb) backtrace
#0 futex_wait_cancelable (private=<optimized out>, expected=0, futex_word=0x55555557b930) at ../sysdeps/nptl/futex-internal.h:183
#1 __pthread_cond_wait_common (abstime=0x0, clockid=0, mutex=0x55555557b8e0, cond=0x55555557b908) at pthread_cond_wait.c:508
#2 __pthread_cond_wait (cond=0x55555557b908, mutex=0x55555557b8e0) at pthread_cond_wait.c:647
#3 0x00007ffff75097ff in SstWriterClose (Stream=0x55555557b820) at /home/franzpoeschel/git-repos/ADIOS2/source/adios2/toolkit/sst/cp/cp_writer.c:1578
#4 0x00007ffff7451a7a in adios2::core::engine::SstWriter::DoClose (this=0x55555557b6e0, transportIndex=-1) at /home/franzpoeschel/git-repos/ADIOS2/source/adios2/engine/sst/SstWriter.cpp:404
#5 0x00007ffff6f45b2b in adios2::core::Engine::Close (this=0x55555557b6e0, transportIndex=-1) at /home/franzpoeschel/git-repos/ADIOS2/source/adios2/core/Engine.cpp:70
#6 0x00007ffff7e2524b in adios2::Engine::Close (this=0x7fffffff4d88, transportIndex=-1) at /home/franzpoeschel/git-repos/ADIOS2/bindings/CXX11/adios2/cxx11/Engine.cpp:115
#7 0x0000555555559b70 in main ()
Hangup backtrace of the reader:
(gdb) backtrace
#0 futex_wait_cancelable (private=<optimized out>, expected=0, futex_word=0x5555555f3ef0) at ../sysdeps/nptl/futex-internal.h:183
#1 __pthread_cond_wait_common (abstime=0x0, clockid=0, mutex=0x55555557c600, cond=0x5555555f3ec8) at pthread_cond_wait.c:508
#2 __pthread_cond_wait (cond=0x5555555f3ec8, mutex=0x55555557c600) at pthread_cond_wait.c:647
#3 0x00007ffff5b911b7 in INT_CMCondition_wait (cm=0x55555557c590, condition=3) at /home/franzpoeschel/git-repos/ADIOS2/thirdparty/EVPath/EVPath/cm_control.c:299
#4 0x00007ffff5b9e69e in CMCondition_wait (cm=0x55555557c590, condition=3) at /home/franzpoeschel/singularity_build/ADIOS2_build/thirdparty/EVPath/EVPath/cm_interface.c:85
#5 0x00007ffff74ffcaf in EvpathWaitForCompletion (Svcs=0x7ffff77980c0 <Svcs>, Handle_v=0x5555555f3e60) at /home/franzpoeschel/git-repos/ADIOS2/source/adios2/toolkit/sst/dp/evpath_dp.c:1105
#6 0x00007ffff7505ee9 in SstWaitForCompletion (Stream=0x55555557a860, handle=0x5555555f3e60) at /home/franzpoeschel/git-repos/ADIOS2/source/adios2/toolkit/sst/cp/cp_reader.c:2296
#7 0x00007ffff7432ce5 in adios2::core::engine::SstReader::PerformGets (this=0x55555557a6e0) at /home/franzpoeschel/git-repos/ADIOS2/source/adios2/engine/sst/SstReader.cpp:715
#8 0x00007ffff7427ff8 in adios2::core::engine::SstReader::EndStep (this=0x55555557a6e0) at /home/franzpoeschel/git-repos/ADIOS2/source/adios2/engine/sst/SstReader.cpp:477
#9 0x00007ffff7e25085 in adios2::Engine::EndStep (this=0x7fffffff4df0) at /home/franzpoeschel/git-repos/ADIOS2/bindings/CXX11/adios2/cxx11/Engine.cpp:103
#10 0x0000555555558768 in main::{lambda(adios2::Variable<double>&)#1}::operator()(adios2::Variable<double>&) const ()
#11 0x0000555555558b84 in main ()
(2) Crash behavior: Segfault on reader:
547│ size_t Stride = Size / 8;
548│ unsigned long Print = 0;
549│ if (!Page)
550│ return 0;
551│ for (int i = 0; i < 8; i++)
552│ {
553│ size_t Index = Start + Stride * i;
554│ unsigned char Component = 0;
555│ while ((Page[Index] == 0) && (Index < (Size - 1)))
556│ {
557│ Component++;
558│ Index++;
559│ }
560│ Component += (unsigned char)Page[Index];
561│ Print |= (((unsigned long)Component) << (8 * i));
562│ }
/home/franzpoeschel/git-repos/ADIOS2/source/adios2/toolkit/sst/dp/evpath_dp.c
0x00007ffff74fe7e6 in writeBlockFingerprint (Page=0x7ffff0005d30 "\224", Size=4294967679) at /home/franzpoeschel/git-repos/ADIOS2/source/adios2/toolkit/sst/dp/evpath_dp.c:555 [60/60]
(gdb) p Index
$1 = 268435479
(gdb) p Size
$2 = 4294967679
(gdb) p Component
$3 = 0 '\000'
(gdb) p Page
$4 = 0x7ffff0005d30 "\224"
The writer only prints a warning after this: Writer 0 (0x55555557b820): Got an unexpected connection close event
The environment is an nvidia/cuda:11.6.0-devel-ubuntu20.04 Singularity container with a g++ 11.1.0 ADIOS2 Debug build.
The issues are reproducible for v2.8.2 on my local system (openSUSE Leap 15.4), so they seem to not be entirely system-dependent. Setting IO.SetParameter("QueueLimit", "1"); helps reproduce the hangup on a system with limited RAM.
Didn't try any other ADIOS2 versions locally.
Thanks. Let me see what I can do. (I'm currently isolating with an active CoViD infection, so I'm not exactly on top of my game, but I'm not completely non-functional.)
As said before, this is not urgent. Wishing you a quick recovery!
Just a thought while I'm poking at this: the example reader code is subtly wrong. For streaming engines, variables get wiped (at least potentially) between steps, so you have to call InquireVariable again inside the loop. (There are many more problems with supporting >2GB data blocks inside BP5 than this, but this did cause undefined results, because the variable (and the start/count blocks it contained) was freed on the next BeginStep() and held random values.)
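A reader loop that follows this advice could be sketched as below. The function `readAllSteps` is hypothetical and is written as a template so that it compiles without the ADIOS2 headers. With ADIOS2, `Engine` would be `adios2::Engine`, `stepOk` would be `adios2::StepStatus::OK`, and `inquire` would be a callable that returns `IO.InquireVariable<double>("var")`. The point of the sketch is only that the variable handle is looked up again after every `BeginStep()`.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Reads every remaining step of a stream into buffer and returns the number
// of steps read. The variable handle is re-inquired in each step instead of
// being reused, because a handle from an earlier step may be stale.
template <typename T, typename Engine, typename Status, typename Inquire>
std::size_t readAllSteps(Engine &engine, Status stepOk, Inquire inquire,
                         std::vector<T> &buffer)
{
    std::size_t steps = 0;
    while (engine.BeginStep() == stepOk)
    {
        // Fresh lookup inside the loop, after BeginStep().
        auto variable = inquire();
        if (!variable)
        {
            throw std::runtime_error("[Reader] Failed inquiring variable");
        }
        buffer.resize(variable.Shape()[0]);
        engine.Get(variable, buffer.data());
        engine.EndStep();
        ++steps;
    }
    return steps;
}
```

In the reproducer above, this would replace both the initial `BeginStep()`/`InquireVariable` pair and the `while` loop, for example as `readAllSteps(engine, adios2::StepStatus::OK, [&IO] { return IO.InquireVariable<double>("var"); }, streamData)`.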
Ah, thanks for the hint. I should check if our implementation in openPMD does this correctly, then.
EDIT: Yep, we do InquireVariable before every dataset read
Looking at this in the background. Part of the problem is that there's actually a Linux limitation too. Even on 64-bit systems, a single I/O operation is limited to MAX_RW_COUNT bytes, where #define MAX_RW_COUNT (INT_MAX & PAGE_MASK), which works out to 0x7ffff000 (2,147,479,552). So we can pass 64-bit sizes all the way through the system, but then get hung up when the lowest levels of EVPath network handling (actually the cmsockets transport) fail because they were written on the assumption that a single writev() or read() call can send or receive an entire message.
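For illustration, the usual way to live with this limit on a blocking descriptor is to loop over `write()` and cap each call at the maximum transfer size. The sketch below is not EVPath's code and ignores the async write support mentioned in this thread. The names `kPageSize`, `kMaxRwCount` and `writeAll` are hypothetical, and a 4 KiB page size is assumed when reproducing the `0x7ffff000` figure.

```cpp
#include <algorithm>
#include <cerrno>
#include <climits>
#include <cstddef>
#include <unistd.h>

// Assumption: 4 KiB pages, so PAGE_MASK corresponds to ~(kPageSize - 1).
constexpr long kPageSize = 4096;
constexpr std::size_t kMaxRwCount =
    static_cast<std::size_t>(INT_MAX & ~(kPageSize - 1));
static_assert(kMaxRwCount == 0x7ffff000, "matches the value quoted above");

// Writes size bytes from data to a blocking file descriptor, splitting the
// transfer into calls of at most kMaxRwCount bytes and continuing after
// short writes. Returns false on error.
bool writeAll(int fd, const void *data, std::size_t size)
{
    const char *cursor = static_cast<const char *>(data);
    while (size > 0)
    {
        const std::size_t chunk = std::min(size, kMaxRwCount);
        const ssize_t written = ::write(fd, cursor, chunk);
        if (written < 0)
        {
            if (errno == EINTR)
            {
                continue; // interrupted before any data was written, retry
            }
            return false;
        }
        if (written == 0)
        {
            return false; // no progress, give up instead of spinning
        }
        cursor += written;
        size -= static_cast<std::size_t>(written);
    }
    return true;
}
```

The receiving side needs the matching loop around `read()`, since a single call may likewise return fewer bytes than requested.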
The upshot is that I have mods that I'll commit, but they don't yet completely solve this problem because of the MAX_RW_COUNT issue. Doing the final fix is a bit complicated because of how it interacts with existing support for async write() operations in EVPath.
I already got the impression that going beyond 2GB will run into limitations at many different corners, given the variety of errors that I had. So, part of this issue is probably the question whether such large values are supposed to be supported by the WAN backend at all, or if I should look into other engines such as Dataman. If you think this hurdle can be overcome, then that's good news to me however, since our support for SST is definitely more mature than that for other engines.
I think this can be overcome. It's just going to take a bit of experimentation. The straightforward approach I tried when I first discovered it just had some rather disastrous results, so I need to sort through alternatives...