ADIOS2
SegFault in BP5Serializer::CollectFinalShapeValues()
ADIOS2 (2.9.0 and latest) segfaults in BP5Serializer::CollectFinalShapeValues() when using SST in combination with SENSEI (latest).
[jwb0021:23855:0:23855] Caught signal 11 (Segmentation fault: address not mapped to object at address (nil))
[jwb0021:23854:0:23854] Caught signal 11 (Segmentation fault: address not mapped to object at address (nil))
==== backtrace (tid: 12759) ====
0 0x0000000000012cf0 __funlockfile() :0
1 0x00000000000d0057 __memmove_avx_unaligned_erms() :0
2 0x00000000005a71c6 adios2::format::BP5Serializer::CollectFinalShapeValues() ???:0
3 0x00000000005a90e4 adios2::format::BP5Serializer::CloseTimestep() ???:0
4 0x00000000006827ff adios2::core::engine::SstWriter::EndStep() ???:0
5 0x000000000002939a adios2_end_step() ???:0
6 0x00000000001257fc sensei::ADIOS2AnalysisAdaptor::WriteTimestep() ???:0
7 0x000000000012bab6 sensei::ADIOS2AnalysisAdaptor::Execute() ???:0
8 0x000000000000c6c3 sensei::ConfigurableAnalysis::Execute() ???:0
9 0x000000000030cb2b sensei_bridge_update() ???:0
I assume this happens at THIS code line.
Thanks. Probably we need more detail from the variable declarations, any calls to SetShape, etc. If the variable is classified as a GlobalArray, the Shape value presumably was set at some point. The question is what happened to it? Was it reset somehow? Was the variable destroyed? Maybe you can point me to the source somewhere?
Hi @eisenhauer,
thank you for your reply. I have created a small demo -> adios2_segfaultExample.tar.gz showing the segfault based on this SENSEI-miniapp from HERE
You can find
- stdout+stderr of the simulation code, which shows the segfault: sensei_oscillator.segfault/srun-7939151.sim
- slurm script: sensei_oscillator.segfault/oscillator.slurm (2 nodes are running the simulation and 1 node the SENSEI endpoint)
- build-configs for ADIOS2 and SENSEI: configs/ (I tried to reduce the number of "special" settings)
Let me know if this is helpful and if any further information would be useful.
Hmm. I haven't looked at Sensei source before. C++ code using the ADIOS C interfaces. That complicates things a bit, and at least from my initial look at Sensei, I don't have a good guess as to what might be going on. The ADIOS change that seems to be implicated here, grabbing the Shape of a variable at EndStep rather than at Put so that it can be changed after the Put, shouldn't have an impact on anything I see, but the logic is complex enough that I can't be completely sure of what is going on just by inspection. Unfortunately that probably means trying to reproduce this somewhere I can either examine a core dump or add additional diagnostics, which may not happen right away. Will do what I can though.
I tried to find the commit that introduced this issue and have gone back as far as the 14th of March so far. BP5Serializer::CollectFinalShapeValues() had not yet been introduced at that point, but it still segfaults, at a different position:
[jwb0021:11225:0:11225] Caught signal 11 (Segmentation fault: address not mapped to object at address 0x1892a660)
[jwb0021:11222:0:11222] Caught signal 11 (Segmentation fault: address not mapped to object at address 0x92f74a0)
[jwb0021:11223:0:11223] Caught signal 11 (Segmentation fault: address not mapped to object at address 0xe08c370)
[jwb0021:11224:0:11224] Caught signal 11 (Segmentation fault: address not mapped to object at address 0x148c1910)
[jwb0033:23160:0:23160] Caught signal 11 (Segmentation fault: address not mapped to object at address 0x14ef6fd0)
[jwb0033:23158:0:23158] Caught signal 11 (Segmentation fault: address not mapped to object at address 0x11461ab0)
[jwb0033:23161:0:23161] Caught signal 11 (Segmentation fault: address not mapped to object at address 0x1ca7b920)
[jwb0033:23159:0:23159] Caught signal 11 (Segmentation fault: address not mapped to object at address 0x118dcee0)
==== backtrace (tid: 23161) ====
0 0x0000000000012cf0 __funlockfile() :0
1 0x00000000000d01f8 __memmove_avx_unaligned_erms() :0
2 0x000000000000e2b4 copy_data_to_tmp() ???:0
3 0x0000000000010bd5 handle_subfield() ffs.c:0
4 0x000000000001067d handle_subfield() ffs.c:0
5 0x00000000000112eb FFSencode_internal() ffs.c:0
6 0x0000000000599b37 adios2::format::BP5Serializer::CloseTimestep() ???:0
7 0x00000000006629cf adios2::core::engine::SstWriter::EndStep() ???:0
8 0x00000000000281ba adios2_end_step() ???:0
9 0x00000000001257fc sensei::ADIOS2AnalysisAdaptor::WriteTimestep() ???:0
10 0x000000000012bab6 sensei::ADIOS2AnalysisAdaptor::Execute() ???:0
11 0x000000000000c6c3 sensei::ConfigurableAnalysis::Execute() ???:0
12 0x000000000044138f bridge::execute() ???:0
13 0x0000000000410042 main() ???:0
14 0x000000000003ad85 __libc_start_main() ???:0
15 0x0000000000410ebe _start() ???:0
I will go back in time a bit further to older commits tomorrow ...
OK, so that puts a different spin on things. I don't recall exactly when we made BP5 the default serializer for SST, but I suspect that if this code worked on a prior version of ADIOS then perhaps it was using the older "bp" marshalling method. You can see if that works by setting the engine parameter "MarshalMethod" to a value of "bp" (even with the newest ADIOS). If it does, that may narrow down the problem.
You are right! With MarshalMethod = BP the segfault is gone.
So you have a workaround for the moment. By and large, BP4 operates on metadata provided to it (shape, start, count arrays) at the moment of Put(), but BP5 gains efficiency through bulk processing in EndStep. In looking at Sensei code, it appears that the metadata arrays are often stack-allocated at the time of Put() and EndStep doesn't appear in the same subroutine, so it's a pretty good guess that somehow the deallocation of those arrays is tied to what's going on. We still have to sort out whether or not this is something that happens only when going through the C bindings, and how best to fix it. When you call things like adios2_set_selection and adios2_set_shape, you pass in the address of metadata arrays in application space, but I'm not sure we're clear on the requirements for how long that metadata should persist, whether ADIOS commits to copying it when provided, etc. I think I can work from the Sensei code in ADIOS2Schema.cpp to replicate the issue in some test code to sort out exactly what's going on and where to go from here. It'll likely be a few days though.
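To make the lifetime hypothesis above concrete, here is a minimal toy sketch. It is not ADIOS2 or SENSEI code: DeferredWriter, DefineAndPut and WriteTimestep are hypothetical names, and the real serializer is far more involved. It only models the call pattern described above, where the shape array lives on the stack of the function that calls Put() and EndStep() runs later from a different function. In this sketch Put() copies the dimensions immediately, which is one way such a pattern can be made safe; whether ADIOS2 does or should copy is exactly the open question.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy stand-in for a writer that defers metadata processing to EndStep().
// Hypothetical illustration only; not the ADIOS2 implementation.
class DeferredWriter
{
public:
    // A hazard-prone variant would store the borrowed pointer `shape` and
    // dereference it in EndStep(). This variant copies the dimensions right
    // away, so the caller's array may go out of scope before EndStep().
    void Put(const std::size_t *shape, std::size_t ndims)
    {
        assert(shape != nullptr || ndims == 0);
        m_Pending.emplace_back(shape, shape + ndims);
    }

    // Bulk-process everything queued since the last step and return the
    // total number of elements described by the queued shapes.
    std::size_t EndStep()
    {
        std::size_t total = 0;
        for (const auto &dims : m_Pending)
        {
            std::size_t n = 1;
            for (std::size_t d : dims)
            {
                n *= d;
            }
            total += n;
        }
        m_Pending.clear();
        return total;
    }

private:
    std::vector<std::vector<std::size_t>> m_Pending; // owned copies
};

// Mimics the suspected pattern: metadata is stack-allocated here ...
void DefineAndPut(DeferredWriter &writer)
{
    std::size_t shape[3] = {4, 5, 6}; // destroyed when this function returns
    writer.Put(shape, 3);
}

// ... and EndStep() is called from a different function, after `shape`
// is gone. This is safe here only because Put() took a copy.
std::size_t WriteTimestep()
{
    DeferredWriter writer;
    DefineAndPut(writer);
    return writer.EndStep();
}
```

If Put() kept only the pointer instead, EndStep() would read from a dead stack frame, which would be consistent with a crash inside memmove. On the application side, the equivalent defensive choice would be to keep the shape, start and count arrays in storage that outlives the matching EndStep call.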
If it were easy to get a core dump file of the original failure in CollectFinalShapeValues and print VB->m_Name, that might help narrow down exactly which usage was problematic...
I have run the example with debug flags and core dumps enabled. Here it segfaults with
==== backtrace (tid: 4005) ====
0 0x0000000000012cf0 __funlockfile() :0
1 0x00000000000d0057 __memmove_avx_unaligned_erms() :0
2 0x0000000000ce057b adios2::format::BP5Serializer::CollectFinalShapeValues() /dev/shm/goebbert1/juwelsbooster/ADIOS2/20230620/foss-2022a-debug/ADIOS2-53acb22f0ed88b43a6bd6ca841aa6e1672a1d995/source/adios2/toolkit/format/bp5/BP5Serializer.cpp:1153
3 0x0000000000ce0fb5 adios2::format::BP5Serializer::CloseTimestep() /dev/shm/goebbert1/juwelsbooster/ADIOS2/20230620/foss-2022a-debug/ADIOS2-53acb22f0ed88b43a6bd6ca841aa6e1672a1d995/source/adios2/toolkit/format/bp5/BP5Serializer.cpp:1270
4 0x0000000000dd74ed adios2::core::engine::SstWriter::EndStep() /dev/shm/goebbert1/juwelsbooster/ADIOS2/20230620/foss-2022a-debug/ADIOS2-53acb22f0ed88b43a6bd6ca841aa6e1672a1d995/source/adios2/engine/sst/SstWriter.cpp:308
5 0x00000000000614b4 adios2_end_step() /dev/shm/goebbert1/juwelsbooster/ADIOS2/20230620/foss-2022a-debug/ADIOS2-53acb22f0ed88b43a6bd6ca841aa6e1672a1d995/bindings/C/adios2/c/adios2_c_engine.cpp:563
6 0x00000000002792f8 sensei::ADIOS2AnalysisAdaptor::WriteTimestep() /dev/shm/goebbert1/juwelsbooster/sensei/20230619/foss-2022a-adios2-20230620-catalyst-5.10.1-debug/SENSEI-8f71e07faa43f792ec473fa20c9cb4b183ad3d47/sensei/ADIOS2AnalysisAdaptor.cxx:522
7 0x000000000027591a sensei::ADIOS2AnalysisAdaptor::Execute() /dev/shm/goebbert1/juwelsbooster/sensei/20230619/foss-2022a-adios2-20230620-catalyst-5.10.1-debug/SENSEI-8f71e07faa43f792ec473fa20c9cb4b183ad3d47/sensei/ADIOS2AnalysisAdaptor.cxx:238
8 0x000000000001c987 sensei::ConfigurableAnalysis::Execute() /dev/shm/goebbert1/juwelsbooster/sensei/20230619/foss-2022a-adios2-20230620-catalyst-5.10.1-debug/SENSEI-8f71e07faa43f792ec473fa20c9cb4b183ad3d47/sensei/ConfigurableAnalysis.cxx:1555
9 0x000000000048f949 bridge::execute() /dev/shm/goebbert1/juwelsbooster/sensei/20230619/foss-2022a-adios2-20230620-catalyst-5.10.1-debug/SENSEI-8f71e07faa43f792ec473fa20c9cb4b183ad3d47/miniapps/oscillators/bridge.cpp:70
10 0x0000000000442f31 main() /dev/shm/goebbert1/juwelsbooster/sensei/20230619/foss-2022a-adios2-20230620-catalyst-5.10.1-debug/SENSEI-8f71e07faa43f792ec473fa20c9cb4b183ad3d47/miniapps/oscillators/main.cpp:302
11 0x000000000003ad85 __libc_start_main() ???:0
12 0x0000000000436d2e _start() ???:0
=================================
Here is the full backtrace from the core file (output of gdb's bt full):
(gdb) bt full
#0 0x0000152b07978057 in __memmove_avx_unaligned_erms () from /usr/lib64/libc.so.6
No symbol table info available.
#1 0x0000152b0723157b in adios2::format::BP5Serializer::CollectFinalShapeValues (this=0xac9010) at /dev/shm/goebbert1/juwelsbooster/ADIOS2/20230620/foss-2022a-debug/easybuild_obj/source/adios2/BP5Serializer.h:1153
VB = 0x9c6410
MBase = 0xefc9a0
AlreadyWritten = 1
MetaEntry = 0xefcab8
Rec = 0xefc5a0
i = 8
#2 0x0000152b07231fb5 in adios2::format::BP5Serializer::CloseTimestep (this=0xac9010, timestep=2, forceCopyDeferred=false)
at /dev/shm/goebbert1/juwelsbooster/ADIOS2/20230620/foss-2022a-debug/easybuild_obj/source/adios2/BP5Serializer.h:1270
Formats = std::vector of length 0, capacity 0
MetaEncodeBuffer = 0xef7290
AttributeEncodeBuffer = 0x0
MetaDataSize = 0
AttributeSize = 0
MBase = 0xefc9a0
MetaDataBlock = 0x0
Metadata = 0x152b07336fdc <std::__uniq_ptr_impl<adios2::format::BP5Serializer, std::default_delete<adios2::format::BP5Serializer> >::_M_ptr() const+24>
AttrData = 0x7fff7fa71ee0
tmp = 0xe91940
Ret = {NewMetaMetaBlocks = std::vector of length -476753, capacity 248678129754440370 = {{MetaMetaInfo = 0xef9ec0 "", MetaMetaInfoLen = 28197312,
MetaMetaID = 0x1300000049 <error: Cannot access memory at address 0x1300000049>, MetaMetaIDLen = 18446744073709551615}, {MetaMetaInfo = 0xb0000003a <error: Cannot access memory at address 0xb0000003a>,
MetaMetaInfoLen = 18446744073709551615, MetaMetaID = 0x50000003b <error: Cannot access memory at address 0x50000003b>, MetaMetaIDLen = 18446744073709551615}, {
MetaMetaInfo = 0x50 <error: Cannot access memory at address 0x50>, MetaMetaInfoLen = 192, MetaMetaID = 0xf09f10 "sockets", MetaMetaIDLen = 14110736}, {MetaMetaInfo = 0xf09f60 "", MetaMetaInfoLen = 23274375906086,
MetaMetaID = 0x152afce8c469 <CMWriteQueuedData> "UH\211\345SH\201\354", <incomplete sequence \330>, MetaMetaIDLen = 23274083102355}, {
MetaMetaInfo = 0x152aeb749934 <libcmsockets_LTX_non_blocking_listen> "UH\211\345SH\201\354\250\001", MetaMetaInfoLen = 23274083095216, MetaMetaID = 0x0, MetaMetaIDLen = 0}, {
MetaMetaInfo = 0x152aeb749456 <libcmsockets_LTX_self_check> "UH\211\345H\201\354@\001", MetaMetaInfoLen = 23274083096400,
MetaMetaID = 0x152aeb748aab <libcmsockets_LTX_shutdown_conn> "UH\211\345H\203\354\020H\211}\370H\211u\360H\213E\370H\213\220", <incomplete sequence \320>, MetaMetaIDLen = 23274083099786}, {MetaMetaInfo = 0x0,
MetaMetaInfoLen = 23274083100744, MetaMetaID = 0x152aeb74ab58 <libcmsockets_LTX_NBwritev_func> "UH\211\345SH\201", <incomplete sequence \354\210>, MetaMetaIDLen = 0}, {
MetaMetaInfo = 0x152aeb74a2bb <libcmsockets_LTX_set_write_notify> "UH\211\345H\203\354 H\211}\370H\211u\360H\211U\350\211M\344\203", <incomplete sequence \344>, MetaMetaInfoLen = 15763536,
MetaMetaID = 0x152aeb74b091 <libcmsockets_LTX_get_transport_characteristics> "UH\211\345H\203\354\060H\211}\350H\211u\340H\211U\330H\213E\330H\211E\370H\213E\370H\213@(H\211\307\350\303\320\377\377H\213E\370H\213@(\311\303UH\211\345H\203\354 H\211}\350H\211u\340H\213E\340H\213", MetaMetaIDLen = 0}, {MetaMetaInfo = 0x6f732e <error: Cannot access memory at address 0x6f732e>, MetaMetaInfoLen = 33,
MetaMetaID = 0x6 <error: Cannot access memory at address 0x6>, MetaMetaIDLen = 0}, {MetaMetaInfo = 0x61006e696769 <error: Cannot access memory at address 0x61006e696769>, MetaMetaInfoLen = 593,
MetaMetaID = 0x152aeb7700e0 <mca_coll_basic_module_t_class> "\305\320v\353*\025", MetaMetaIDLen = 12}, {MetaMetaInfo = 0x152aeb766cb0 <mca_coll_basic_module_enable> "ATUSI\211\374H\213\035ڒ",
MetaMetaInfoLen = 23274597105344, MetaMetaID = 0x152b0a17c950 <ompi_coll_base_allgatherv_intra_basic_default> "AWAVAUM\211\307ATUSH\211\315H\203\354(H\213\\$hL\213l$`H\213\203", <incomplete sequence \370>,
MetaMetaIDLen = 23274083208016}, {MetaMetaInfo = 0x152b0a181240 <ompi_coll_base_alltoall_intra_basic_linear> "AWAVAUA\211\362ATUSD\211\306H\203\354hM\211\317L\213\204$\250", MetaMetaInfoLen = 23274597132464,
MetaMetaID = 0x152aeb765a90 <mca_coll_basic_alltoallw_intra> "AWAVAUATUSL\211\315H\203\354XH\213\204$\220", MetaMetaIDLen = 23274083213600}, {
MetaMetaInfo = 0x152aeb766310 <mca_coll_basic_bcast_log_intra> "AWAVAUA\211\366ATUSL\211\315H\203\354HI\213\200", <incomplete sequence \370>, MetaMetaInfoLen = 23274083238672,
MetaMetaID = 0x152b0a182120 <ompi_coll_base_gather_intra_basic_linear> "AWAVAUATUSH\203\354\070L\213l$xD\213d$pH\211|$\030\211t$$H\211T$(D\211D$ I\213\205", <incomplete sequence \370>, MetaMetaIDLen = 23274083215568},
{MetaMetaInfo = 0x152aeb76a350 <mca_coll_basic_reduce_log_intra> "AWAVAUM\211\306ATUSLc\352H\203\354xI\211\314H\213\204$\260", MetaMetaInfoLen = 23274083233296,
MetaMetaID = 0x152aeb76c020 <mca_coll_basic_reduce_scatter_block_intra> "雀\377\377ff.\017\037\204", MetaMetaIDLen = 23274083238656}, {
MetaMetaInfo = 0x152b0a17a2c0 <ompi_coll_base_scatter_intra_basic_linear> "AWAVAUI\211\375ATUSH\211\317H\203\354(H\213\\$h\211t$\020H\211L$\b\213L$`D\211D$\024L\211L$\030H\213\203", <incomplete sequence \370>,
MetaMetaInfoLen = 23274083239232, MetaMetaID = 0x0, MetaMetaIDLen = 0}, {MetaMetaInfo = 0x0, MetaMetaInfoLen = 0, MetaMetaID = 0x0, MetaMetaIDLen = 0}, {MetaMetaInfo = 0x0, MetaMetaInfoLen = 0, MetaMetaID = 0x0,
MetaMetaIDLen = 0}, {MetaMetaInfo = 0x0, MetaMetaInfoLen = 0, MetaMetaID = 0x0, MetaMetaIDLen = 0}, {MetaMetaInfo = 0x0, MetaMetaInfoLen = 0, MetaMetaID = 0x0, MetaMetaIDLen = 0}, {MetaMetaInfo = 0x0,
MetaMetaInfoLen = 0, MetaMetaID = 0x0, MetaMetaIDLen = 0}, {MetaMetaInfo = 0x0, MetaMetaInfoLen = 0, MetaMetaID = 0x0, MetaMetaIDLen = 0}, {MetaMetaInfo = 0x0, MetaMetaInfoLen = 0, MetaMetaID = 0x0,
MetaMetaIDLen = 0}, {MetaMetaInfo = 0x0, MetaMetaInfoLen = 0, MetaMetaID = 0x0, MetaMetaIDLen = 0}, {MetaMetaInfo = 0x152aeb767070 <mca_coll_basic_neighbor_allgather> "AWAVAUATUSH\203\354xL\213\254$\260",
MetaMetaInfoLen = 23274083219680, MetaMetaID = 0x152aeb768140 <mca_coll_basic_neighbor_alltoall> "AWAVAUATUSH\201", <incomplete sequence \354\230>, MetaMetaIDLen = 23274083224704}, {
MetaMetaInfo = 0x152aeb7697b0 <mca_coll_basic_neighbor_alltoallw> "AWAVAUATUSH\203\354xH\213\234$\300", MetaMetaInfoLen = 0, MetaMetaID = 0x0, MetaMetaIDLen = 0}, {MetaMetaInfo = 0x0, MetaMetaInfoLen = 0,
MetaMetaID = 0x0, MetaMetaIDLen = 0}, {MetaMetaInfo = 0x0, MetaMetaInfoLen = 0, MetaMetaID = 0x0, MetaMetaIDLen = 23274083216544}, {MetaMetaInfo = 0x0, MetaMetaInfoLen = 23274597133792,
MetaMetaID = 0xe8cf70 "\240\326\036\n+\025", MetaMetaIDLen = 81}, {MetaMetaInfo = 0x152afdd48360 <opal_hash_table_t_class> "r \322\375*\025", MetaMetaInfoLen = 1, MetaMetaID = 0xf01540 "\001", MetaMetaIDLen = 31}, {
MetaMetaInfo = 0x6 <error: Cannot access memory at address 0x6>, MetaMetaInfoLen = 15, MetaMetaID = 0x200000001 "", MetaMetaIDLen = 4294967298}, {MetaMetaInfo = 0x152afdd440a0 <opal_hash_type_methods_uint32> "",
MetaMetaInfoLen = 369, MetaMetaID = 0x152b0a1e2240 <ompi_communicator_t_class> "HA\033\n+\025", MetaMetaIDLen = 23274427777025}, {MetaMetaInfo = 0x152afdd48360 <opal_hash_table_t_class> "r \322\375*\025",
MetaMetaInfoLen = 1, MetaMetaID = 0xe8d150 "", MetaMetaIDLen = 31}, {MetaMetaInfo = 0x0, MetaMetaInfoLen = 15, MetaMetaID = 0x200000001 "", MetaMetaIDLen = 4294967298}, {MetaMetaInfo = 0x0, MetaMetaInfoLen = 15288720,
MetaMetaID = 0x152afdd48be0 <opal_mutex_t_class> "\325K\322\375*\025", MetaMetaIDLen = 1}, {MetaMetaInfo = 0x0, MetaMetaInfoLen = 0, MetaMetaID = 0x0, MetaMetaIDLen = 0}, {MetaMetaInfo = 0x0, MetaMetaInfoLen = 0,
MetaMetaID = 0x4d4d4f432049504d <error: Cannot access memory at address 0x4d4d4f432049504d>, MetaMetaIDLen = 5931051873548717653}, {
MetaMetaInfo = 0x4620505544203820 <error: Cannot access memory at address 0x4620505544203820>, MetaMetaInfoLen = 236765138770, MetaMetaID = 0x0, MetaMetaIDLen = 0}, {MetaMetaInfo = 0x0, MetaMetaInfoLen = 0,
MetaMetaID = 0x600000008 <error: Cannot access memory at address 0x600000008>, MetaMetaIDLen = 4096}, {MetaMetaInfo = 0xffff8002ffff8002 <error: Cannot access memory at address 0xffff8002ffff8002>,
MetaMetaInfoLen = 12203776, MetaMetaID = 0xba3700 "@&\036\n+\025", MetaMetaIDLen = 0}, {MetaMetaInfo = 0xe94a00 "`\203\324\375*\025", MetaMetaInfoLen = 3, MetaMetaID = 0x0, MetaMetaIDLen = 8}, {MetaMetaInfo = 0x0,
MetaMetaInfoLen = 23274597592608, MetaMetaID = 0x1 <error: Cannot access memory at address 0x1>, MetaMetaIDLen = 0}, {MetaMetaInfo = 0xe8dc10 "p+y\353*\025", MetaMetaInfoLen = 4294967266, MetaMetaID = 0x0,
MetaMetaIDLen = 161}, {MetaMetaInfo = 0x152b0a1ed6a0 <mca_coll_base_comm_t_class> "\222\322\033\n+\025", MetaMetaInfoLen = 1, MetaMetaID = 0x0, MetaMetaIDLen = 0}, {MetaMetaInfo = 0x0, MetaMetaInfoLen = 0,
MetaMetaID = 0x0, MetaMetaIDLen = 0}, {MetaMetaInfo = 0x0, MetaMetaInfoLen = 0, MetaMetaID = 0x0, MetaMetaIDLen = 0}, {MetaMetaInfo = 0x0, MetaMetaInfoLen = 0, MetaMetaID = 0x0, MetaMetaIDLen = 0}, {
MetaMetaInfo = 0x0, MetaMetaInfoLen = 0, MetaMetaID = 0x0, MetaMetaIDLen = 209}, {MetaMetaInfo = 0x0, MetaMetaInfoLen = 4294967296,
MetaMetaID = 0x736d636200000000 <error: Cannot access memory at address 0x736d636200000000>, MetaMetaIDLen = 0}, {MetaMetaInfo = 0x0, MetaMetaInfoLen = 4294967300,
MetaMetaID = 0x65686373 <error: Cannot access memory at address 0x65686373>, MetaMetaIDLen = 7669474516593028864}, {MetaMetaInfo = 0x0, MetaMetaInfoLen = 4294967296,
MetaMetaID = 0x3032656700000000 <error: Cannot access memory at address 0x3032656700000000>, MetaMetaIDLen = 0}, {MetaMetaInfo = 0x0, MetaMetaInfoLen = 4294967300,
t address 0x1732f7265>, MetaMetaIDLen = 5713172649054925409}, {MetaMetaInfo = 0x100000000 <error: Cannot access memory at address 0x100000000>, MetaMetaInfoLen = 4294967296, MetaMetaID = 0x642d613200000001 <error: Cannot access memory at address 0x642d613200000001>, MetaMetaIDLen = 15796656}, {
MetaMetaInfo = 0xf10230 "\320", <incomplete sequence \361>, MetaMetaInfoLen = 0, MetaMetaID = 0x0, MetaMetaIDLen = 0}, {MetaMetaInfo = 0x0, MetaMetaInfoLen = 113, MetaMetaID = 0x700000002 <error: Cannot access memory at address 0x700000002>, MetaMetaIDLen = 34359738368}, {MetaMetaInfo = 0x2 <error: Cannot access memory at address 0x2>,
MetaMetaInfoLen = 0, MetaMetaID = 0x0, MetaMetaIDLen = 0}, {MetaMetaInfo = 0x1 <error: Cannot access memory at address 0x1>, MetaMetaInfoLen = 0, MetaMetaID = 0xffffffffffffffff <error: Cannot access memory at address 0xffffffffffffffff>, MetaMetaIDLen = 0}, {MetaMetaInfo = 0xa00000000 <error: Cannot access memory at address 0xa00000000>,
MetaMetaInfoLen = 15764144, MetaMetaID = 0x0, MetaMetaIDLen = 1009}, {MetaMetaInfo = 0x0, MetaMetaInfoLen = 0, MetaMetaID = 0x0, MetaMetaIDLen = 0}, {MetaMetaInfo = 0x0, MetaMetaInfoLen = 0, MetaMetaID = 0x0, MetaMetaIDLen = 0}, {MetaMetaInfo = 0x0, MetaMetaInfoLen = 0, MetaMetaID = 0x0, MetaMetaIDLen = 0}, {MetaMetaInfo = 0x0, MetaMetaInfoLen = 0,
MetaMetaID = 0x0, MetaMetaIDLen = 0}, {MetaMetaInfo = 0x0, MetaMetaInfoLen = 0, MetaMetaID = 0x0, MetaMetaIDLen = 0}, {MetaMetaInfo = 0x0, MetaMetaInfoLen = 0, MetaMetaID = 0x0, MetaMetaIDLen = 0}, {MetaMetaInfo = 0x0, MetaMetaInfoLen = 0, MetaMetaID = 0x0, MetaMetaIDLen = 0}, {MetaMetaInfo = 0x0, MetaMetaInfoLen = 0, MetaMetaID = 0x0, MetaMetaIDLen = 0},
{MetaMetaInfo = 0x0, MetaMetaInfoLen = 0, MetaMetaID = 0x0, MetaMetaIDLen = 0}, {MetaMetaInfo = 0x0, MetaMetaInfoLen = 0, MetaMetaID = 0x0, MetaMetaIDLen = 0}, {MetaMetaInfo = 0x0, MetaMetaInfoLen = 0, MetaMetaID = 0x0, MetaMetaIDLen = 0}, {MetaMetaInfo = 0x0, MetaMetaInfoLen = 0, MetaMetaID = 0x0, MetaMetaIDLen = 0}, {MetaMetaInfo = 0x0,
MetaMetaInfoLen = 0, MetaMetaID = 0x0, MetaMetaIDLen = 0}, {MetaMetaInfo = 0x0, MetaMetaInfoLen = 0, MetaMetaID = 0x0, MetaMetaIDLen = 0}, {MetaMetaInfo = 0x0, MetaMetaInfoLen = 0, MetaMetaID = 0x0, MetaMetaIDLen = 0}, {MetaMetaInfo = 0x0, MetaMetaInfoLen = 0, MetaMetaID = 0x0, MetaMetaIDLen = 0}, {MetaMetaInfo = 0x0, MetaMetaInfoLen = 0,
MetaMetaID = 0x0, MetaMetaIDLen = 0}, {MetaMetaInfo = 0x0, MetaMetaInfoLen = 0, MetaMetaID = 0x0, MetaMetaIDLen = 0}, {MetaMetaInfo = 0x0, MetaMetaInfoLen = 0, MetaMetaID = 0x0, MetaMetaIDLen = 0}, {MetaMetaInfo = 0x0, MetaMetaInfoLen = 0, MetaMetaID = 0x0, MetaMetaIDLen = 0}, {MetaMetaInfo = 0x0, MetaMetaInfoLen = 0, MetaMetaID = 0x0, MetaMetaIDLen = 0},
{MetaMetaInfo = 0x0, MetaMetaInfoLen = 0, MetaMetaID = 0x0, MetaMetaIDLen = 0}, {MetaMetaInfo = 0x0, MetaMetaInfoLen = 0, MetaMetaID = 0x0, MetaMetaIDLen = 0}, {MetaMetaInfo = 0x0, MetaMetaInfoLen = 0, MetaMetaID = 0x0, MetaMetaIDLen = 0}, {MetaMetaInfo = 0x0, MetaMetaInfoLen = 0, MetaMetaID = 0x0, MetaMetaIDLen = 0}, {MetaMetaInfo = 0x0,
MetaMetaInfoLen = 0, MetaMetaID = 0x0, MetaMetaIDLen = 0}, {MetaMetaInfo = 0x0, MetaMetaInfoLen = 0, MetaMetaID = 0x0, MetaMetaIDLen = 0}, {MetaMetaInfo = 0x0, MetaMetaInfoLen = 0, MetaMetaID = 0x0, MetaMetaIDLen = 0}, {MetaMetaInfo = 0x0, MetaMetaInfoLen = 0, MetaMetaID = 0x0, MetaMetaIDLen = 0}, {MetaMetaInfo = 0x0, MetaMetaInfoLen = 0,
MetaMetaID = 0x0, MetaMetaIDLen = 0}, {MetaMetaInfo = 0x0, MetaMetaInfoLen = 0, MetaMetaID = 0x0, MetaMetaIDLen = 0}, {MetaMetaInfo = 0x0, MetaMetaInfoLen = 273, MetaMetaID = 0x4ee9e0 <ompi_mpi_comm_world> "@\"\036\n+\025", MetaMetaIDLen = 23274597588224}, {MetaMetaInfo = 0x4ef3e0 <ompi_mpi_comm_null> "@\"\036\n+\025", MetaMetaInfoLen = 13073376,
MetaMetaID = 0xbc7b70 "@\"\036\n+\025", MetaMetaIDLen = 11813696}, {MetaMetaInfo = 0xac7d80 "@\"\036\n+\025", MetaMetaInfoLen = 15276352, MetaMetaID = 0xe8ce00 "@\"\036\n+\025", MetaMetaIDLen = 15732704}, {MetaMetaInfo = 0x0, MetaMetaInfoLen = 0, MetaMetaID = 0x0, MetaMetaIDLen = 0}, {MetaMetaInfo = 0x0, MetaMetaInfoLen = 0, MetaMetaID = 0x0,
MetaMetaIDLen = 0}, {MetaMetaInfo = 0x0, MetaMetaInfoLen = 0, MetaMetaID = 0x0, MetaMetaIDLen = 0}, {MetaMetaInfo = 0x0, MetaMetaInfoLen = 0, MetaMetaID = 0x0, MetaMetaIDLen = 0}, {MetaMetaInfo = 0x0, MetaMetaInfoLen = 0, MetaMetaID = 0x0, MetaMetaIDLen = 0}, {MetaMetaInfo = 0x0, MetaMetaInfoLen = 0, MetaMetaID = 0x0, MetaMetaIDLen = 1009}, {
MetaMetaInfo = 0x1 <error: Cannot access memory at address 0x1>, MetaMetaInfoLen = 0, MetaMetaID = 0x0, MetaMetaIDLen = 15279136}, {MetaMetaInfo = 0x1 <error: Cannot access memory at address 0x1>, MetaMetaInfoLen = 1, MetaMetaID = 0x0, MetaMetaIDLen = 15425248}, {MetaMetaInfo = 0x1 <error: Cannot access memory at address 0x1>, MetaMetaInfoLen = 2,
MetaMetaID = 0x0, MetaMetaIDLen = 15290224}, {MetaMetaInfo = 0x1 <error: Cannot access memory at address 0x1>, MetaMetaInfoLen = 3, MetaMetaID = 0x0, MetaMetaIDLen = 15252288}, {MetaMetaInfo = 0x1 <error: Cannot access memory at address 0x1>, MetaMetaInfoLen = 4, MetaMetaID = 0x0, MetaMetaIDLen = 15274672}, {MetaMetaInfo = 0x0, MetaMetaInfoLen = 0,
MetaMetaID = 0x0, MetaMetaIDLen = 0}, {MetaMetaInfo = 0x1 <error: Cannot access memory at address 0x1>, MetaMetaInfoLen = 6, MetaMetaID = 0x0, MetaMetaIDLen = 15274720}, {MetaMetaInfo = 0x0, MetaMetaInfoLen = 0, MetaMetaID = 0x0, MetaMetaIDLen = 0}, {MetaMetaInfo = 0x0, MetaMetaInfoLen = 0, MetaMetaID = 0x0, MetaMetaIDLen = 0}, {MetaMetaInfo = 0x0,
MetaMetaInfoLen = 0, MetaMetaID = 0x0, MetaMetaIDLen = 0}, {MetaMetaInfo = 0x0, MetaMetaInfoLen = 0, MetaMetaID = 0x0, MetaMetaIDLen = 0}, {MetaMetaInfo = 0x0, MetaMetaInfoLen = 0, MetaMetaID = 0x0, MetaMetaIDLen = 0}, {MetaMetaInfo = 0x0, MetaMetaInfoLen = 0, MetaMetaID = 0x0, MetaMetaIDLen = 0}, {MetaMetaInfo = 0x0, MetaMetaInfoLen = 0,
MetaMetaID = 0x0, MetaMetaIDLen = 0}, {MetaMetaInfo = 0x0, MetaMetaInfoLen = 0, MetaMetaID = 0x0, MetaMetaIDLen = 0}, {MetaMetaInfo = 0x0, MetaMetaInfoLen = 0, MetaMetaID = 0x0, MetaMetaIDLen = 0}, {MetaMetaInfo = 0x0, MetaMetaInfoLen = 0, MetaMetaID = 0x0, MetaMetaIDLen = 0}, {MetaMetaInfo = 0x0, MetaMetaInfoLen = 0, MetaMetaID = 0x0, MetaMetaIDLen = 0},
{MetaMetaInfo = 0x0, MetaMetaInfoLen = 0, MetaMetaID = 0x0, MetaMetaIDLen = 0}, {MetaMetaInfo = 0x0, MetaMetaInfoLen = 0, MetaMetaID = 0x0, MetaMetaIDLen = 0}, {MetaMetaInfo = 0x0, MetaMetaInfoLen = 0, MetaMetaID = 0x0, MetaMetaIDLen = 0}, {MetaMetaInfo = 0x0, MetaMetaInfoLen = 0, MetaMetaID = 0x0, MetaMetaIDLen = 0}, {MetaMetaInfo = 0x0,
MetaMetaInfoLen = 0, MetaMetaID = 0x0, MetaMetaIDLen = 0}, {MetaMetaInfo = 0x0, MetaMetaInfoLen = 0, MetaMetaID = 0x0, MetaMetaIDLen = 0}, {MetaMetaInfo = 0x0, MetaMetaInfoLen = 0, MetaMetaID = 0x0, MetaMetaIDLen = 0}, {MetaMetaInfo = 0x0, MetaMetaInfoLen = 0, MetaMetaID = 0x0, MetaMetaIDLen = 0}, {MetaMetaInfo = 0x0, MetaMetaInfoLen = 0,
MetaMetaID = 0x0, MetaMetaIDLen = 0}, {MetaMetaInfo = 0x0, MetaMetaInfoLen = 0, MetaMetaID = 0x0, MetaMetaIDLen = 0}, {MetaMetaInfo = 0x0, MetaMetaInfoLen = 0, MetaMetaID = 0x0, MetaMetaIDLen = 0}, {MetaMetaInfo = 0x0, MetaMetaInfoLen = 0, MetaMetaID = 0x0, MetaMetaIDLen = 0}, {MetaMetaInfo = 0x0, MetaMetaInfoLen = 0, MetaMetaID = 0x0, MetaMetaIDLen = 0},
{MetaMetaInfo = 0x0, MetaMetaInfoLen = 129, MetaMetaID = 0x6d68732f7665642f <error: Cannot access memory at address 0x6d68732f7665642f>, MetaMetaIDLen = 8243102867719677743}, {MetaMetaInfo = 0x6c6577756a2f3174 <error: Cannot access memory at address 0x6c6577756a2f3174>, MetaMetaInfoLen = 8243122732111192691,
MetaMetaID = 0x2f32534f4944412f <error: Cannot access memory at address 0x2f32534f4944412f>, MetaMetaIDLen = 3472897843301330994}, {MetaMetaInfo = 0x30322d73736f662f <error: Cannot access memory at address 0x30322d73736f662f>, MetaMetaInfoLen = 8458434531087692338,
MetaMetaID = 0x7562797361652f67 <error: Cannot access memory at address 0x7562797361652f67>, MetaMetaIDLen = 3416651497795251305}, {MetaMetaInfo = 0x7261706472696874 <error: Cannot access memory at address 0x7261706472696874>, MetaMetaInfoLen = 8386072312598722932,
MetaMetaID = 0x6874615056452f68 <error: Cannot access memory at address 0x6874615056452f68>, MetaMetaIDLen = 57408183954479}, {MetaMetaInfo = 0x0, MetaMetaInfoLen = 49, MetaMetaID = 0x5f395f345f475042 <error: Cannot access memory at address 0x5f395f345f475042>, MetaMetaIDLen = 7665811971185598820}, {
MetaMetaInfo = 0x61642f305f746365 <error: Cannot access memory at address 0x61642f305f746365>, MetaMetaInfoLen = 8746397786915692916, MetaMetaID = 0x617461642f305f <error: Cannot access memory at address 0x617461642f305f>, MetaMetaIDLen = 289}, {MetaMetaInfo = 0xe924a0 "", MetaMetaInfoLen = 8114192,
MetaMetaID = 0xe8da9800000000 <error: Cannot access memory at address 0xe8da9800000000>, MetaMetaIDLen = 65668057091014656}, {MetaMetaInfo = 0x4ee6c000000000 <error: Cannot access memory at address 0x4ee6c000000000>, MetaMetaInfoLen = 22208760491540480, MetaMetaID = 0x100000000 <error: Cannot access memory at address 0x100000000>,
MetaMetaIDLen = 67553994410557440}, {MetaMetaInfo = 0x10100000000 <error: Cannot access memory at address 0x10100000000>, MetaMetaInfoLen = 256, MetaMetaID = 0x100 <error: Cannot access memory at address 0x100>, MetaMetaIDLen = 3906639872}, {MetaMetaInfo = 0x4ee6c000 <error: Cannot access memory at address 0x4ee6c000>, MetaMetaInfoLen = 7700487995392,
MetaMetaID = 0x152aeb81760000 <error: Cannot access memory at address 0x152aeb81760000>, MetaMetaIDLen = 562949953487104}, {MetaMetaInfo = 0x10000 <error: Cannot access memory at address 0x10000>, MetaMetaInfoLen = 15751339696728309760, MetaMetaID = 0x3e00000000000e8 <error: Cannot access memory at address 0x3e00000000000e8>,
MetaMetaIDLen = 16627289824607013408}, {MetaMetaInfo = 0x100000000004e <error: Cannot access memory at address 0x100000000004e>, MetaMetaInfoLen = 144396663052566528, MetaMetaID = 0x0, MetaMetaIDLen = 16777216}, {MetaMetaInfo = 0xe8da98000000 <error: Cannot access memory at address 0xe8da98000000>, MetaMetaInfoLen = 86752970670080,
MetaMetaID = 0x4000000 "EESA_", MetaMetaIDLen = 16777216}, {MetaMetaInfo = 0x1000000 "", MetaMetaInfoLen = 256025550651392, MetaMetaID = 0x4ee6c0000000 <error: Cannot access memory at address 0x4ee6c0000000>, MetaMetaIDLen = 144115188075855872}, {MetaMetaInfo = 0x2aeb816600000000 <error: Cannot access memory at address 0x2aeb816600000000>,
MetaMetaInfoLen = 16777237, MetaMetaID = 0x0, MetaMetaIDLen = 1089}, {MetaMetaInfo = 0x152aeb792b70 <ompi_coll_tuned_allgather_intra_dec_fixed> "USH\203\354\bL\213\\$ H\203\377\001t\177H\213j\030I\213\203", <incomplete sequence \370>, MetaMetaInfoLen = 15264128,
MetaMetaID = 0x152aeb792dd0 <ompi_coll_tuned_allgatherv_intra_dec_fixed> "L\215T$\bH\203\344\340I\211\323A\377r\370UH\211\345AWAVAUATARSA\211\364I\213Z\bM\213\062I\211\315M\213R\020H\213\203", <incomplete sequence \370>, MetaMetaIDLen = 15264128}, {MetaMetaInfo = 0x152af800be70 <mca_coll_cuda_allreduce> "AWAVAUHc\302ATUSI\211\375H\203\354\070H\203y\030",
MetaMetaInfoLen = 15262448, MetaMetaID = 0x152aeb791b60 <ompi_coll_tuned_alltoall_intra_dec_fixed> "SL\213\\$\020I\211\322I\213\203", <incomplete sequence \370>, MetaMetaIDLen = 15264128}, {MetaMetaInfo = 0x152aeb791f90 <ompi_coll_tuned_alltoallv_intra_dec_fixed> "H\203\354\bL\213\\$ I\213\203", <incomplete sequence \370>, MetaMetaInfoLen = 15264128,
MetaMetaID = 0x152aeb765a90 <mca_coll_basic_alltoallw_intra> "AWAVAUATUSL\211\315H\203\354XH\213\204$\220", MetaMetaIDLen = 15256416}, {MetaMetaInfo = 0x152aeb792010 <ompi_coll_tuned_barrier_intra_dec_fixed> "H\213\207", <incomplete sequence \370>, MetaMetaInfoLen = 15264128,
MetaMetaID = 0x152aeb792080 <ompi_coll_tuned_bcast_intra_dec_fixed> "H\203\354\bI\213\200", <incomplete sequence \370>, MetaMetaIDLen = 15264128}, {MetaMetaInfo = 0x152af800c650 <mca_coll_cuda_exscan> "AWAVAUHc\302ATUSI\211\375H\203\354\070H\203y\030", MetaMetaInfoLen = 15262448,
MetaMetaID = 0x152aeb793170 <ompi_coll_tuned_gather_intra_dec_fixed> "USI\211\312H\203\354\bL\213\\$(\213\\$ I\213\203", <incomplete sequence \370>, MetaMetaIDLen = 15264128}, {MetaMetaInfo = 0x152aeb7668d0 <mca_coll_basic_gatherv_intra> "AWAVAUATUSH\203\354\070L\213\224$\200", MetaMetaInfoLen = 15256416,
MetaMetaID = 0x152af800bb10 <mca_coll_cuda_reduce> "AWAVAUI\211\367ATUSH\211\375H\203\354\070H\203y\030", MetaMetaIDLen = 15262448}, {MetaMetaInfo = 0x152aeb792600 <ompi_coll_tuned_reduce_scatter_intra_dec_fixed> "L\215T$\bI\213\201", <incomplete sequence \370>, MetaMetaInfoLen = 15264128,
MetaMetaID = 0x152af800c0e0 <mca_coll_cuda_reduce_scatter_block> "AWAVAUA\211\326ATUSI\211\375H\203\354\070H\203y\030", MetaMetaIDLen = 15262448}, {MetaMetaInfo = 0x152af800c3e0 <mca_coll_cuda_scan> "AWAVAUHc\302ATUSI\211\375H\203\354\070H\203y\030", MetaMetaInfoLen = 15262448,
MetaMetaID = 0x152aeb7932b0 <ompi_coll_tuned_scatter_intra_dec_fixed> "SL\213\\$\030I\211ҋ\\$\020I\213\203", <incomplete sequence \370>, MetaMetaIDLen = 15264128}, {MetaMetaInfo = 0x152aeb76c540 <mca_coll_basic_scatterv_intra> "AWAVAUI\211\325ATUSH\211\312H\203\354\070H\213D$pL\213\234$\200", MetaMetaInfoLen = 15256416,
MetaMetaID = 0x152aeb814f20 <ompi_coll_libnbc_iallgather> "SH\203\354\020H\213\\$(j", MetaMetaIDLen = 15261776}, {MetaMetaInfo = 0x152aeb815a20 <ompi_coll_libnbc_iallgatherv> "SH\203\354\030H\213\\$8j", MetaMetaInfoLen = 15261776, MetaMetaID = 0x152aeb817950 <ompi_coll_libnbc_iallreduce> "SH\203\354\030H\213\\$(j", MetaMetaIDLen = 15261776}, {
MetaMetaInfo = 0x152aeb818a20 <ompi_coll_libnbc_ialltoall> "SH\203\354\020H\213\\$(j", MetaMetaInfoLen = 15261776, MetaMetaID = 0x152aeb8198b0 <ompi_coll_libnbc_ialltoallv> "SH\203\354\020H\213\\$8j", MetaMetaIDLen = 15261776}, {MetaMetaInfo = 0x152aeb81a630 <ompi_coll_libnbc_ialltoallw> "SH\203\354\020H\213\\$8j", MetaMetaInfoLen = 15261776,
MetaMetaID = 0x152aeb81b010 <ompi_coll_libnbc_ibarrier> "S1\311H\211\363H\203\354\020\350\021\374\377\377\205\300u\fH\213;\350\005A\377\377\205\300u\021H\203\304\020[\303ff.\017\037\204", MetaMetaIDLen = 15261776}, {MetaMetaInfo = 0x152aeb81bdd0 <ompi_coll_libnbc_ibcast> "SL\211\313H\203\354\020j", MetaMetaInfoLen = 15261776,
MetaMetaID = 0x152aeb81c760 <ompi_coll_libnbc_iexscan> "SH\203\354\030H\213\\$(j", MetaMetaIDLen = 15261776}, {MetaMetaInfo = 0x152aeb81d020 <ompi_coll_libnbc_igather> "SH\203\354\030H\213\\$8j", MetaMetaInfoLen = 15261776, MetaMetaID = 0x152aeb81d970 <ompi_coll_libnbc_igatherv> "SH\203\354\020H\213\\$8j", MetaMetaIDLen = 15261776}, {
MetaMetaInfo = 0x152aeb821010 <ompi_coll_libnbc_ireduce> "SH\203\354\020H\213\\$(j", MetaMetaInfoLen = 15261776, MetaMetaID = 0x152aeb822610 <ompi_coll_libnbc_ireduce_scatter> "SH\203\354\030H\213\\$(j", MetaMetaIDLen = 15261776}, {MetaMetaInfo = 0x152aeb8239f0 <ompi_coll_libnbc_ireduce_scatter_block> "SH\203\354\030H\213\\$(j",
MetaMetaInfoLen = 15261776, MetaMetaID = 0x152aeb8242a0 <ompi_coll_libnbc_iscan> "SH\203\354\030H\213\\$(j", MetaMetaIDLen = 15261776}, {MetaMetaInfo = 0x152aeb824b90 <ompi_coll_libnbc_iscatter> "SH\203\354\030H\213\\$8j", MetaMetaInfoLen = 15261776, MetaMetaID = 0x152aeb825460 <ompi_coll_libnbc_iscatterv> "SH\203\354\020H\213\\$8j",
MetaMetaIDLen = 15261776}, {MetaMetaInfo = 0x152aeb814fe0 <ompi_coll_libnbc_allgather_init> "H\213D$\030L\213T$ \307D$ \001", MetaMetaInfoLen = 15261776, MetaMetaID = 0x152aeb815ae0 <ompi_coll_libnbc_allgatherv_init> "H\213D$ L\213T$(\307D$(\001", MetaMetaIDLen = 15261776}, {
MetaMetaInfo = 0x152aeb8179f0 <ompi_coll_libnbc_allreduce_init> "H\213D$\020L\213T$\030\307D$\030\001", MetaMetaInfoLen = 15261776, MetaMetaID = 0x152aeb818ae0 <ompi_coll_libnbc_alltoall_init> "H\213D$\030L\213T$ \307D$ \001", MetaMetaIDLen = 15261776}, {MetaMetaInfo = 0x152aeb819970 <ompi_coll_libnbc_alltoallv_init> "H\213D$(L\213T$0\307D$0\001",
MetaMetaInfoLen = 15261776, MetaMetaID = 0x152aeb81a6f0 <ompi_coll_libnbc_alltoallw_init> "H\213D$(L\213T$0\307D$0\001", MetaMetaIDLen = 15261776}, {MetaMetaInfo = 0x152aeb81b0b0 <ompi_coll_libnbc_barrier_init> "H\211\326H\211ʹ\001", MetaMetaInfoLen = 15261776,
MetaMetaID = 0x152aeb81be70 <ompi_coll_libnbc_bcast_init> "H\213D$\020L\213L$\b\307D$\020\001", MetaMetaIDLen = 15261776}, {MetaMetaInfo = 0x152aeb81c7b0 <ompi_coll_libnbc_exscan_init> "H\213D$\020L\213T$\030\307D$\030\001", MetaMetaInfoLen = 15261776, MetaMetaID = 0x152aeb81d0e0 <ompi_coll_libnbc_gather_init> "H\213D$ L\213T$(\307D$(\001",
MetaMetaIDLen = 15261776}, {MetaMetaInfo = 0x152aeb81da30 <ompi_coll_libnbc_gatherv_init> "H\213D$(L\213T$0\307D$0\001", MetaMetaInfoLen = 15261776, MetaMetaID = 0x152aeb8210d0 <ompi_coll_libnbc_reduce_init> "H\213D$\030L\213T$ \307D$ \001", MetaMetaIDLen = 15261776}, {
MetaMetaInfo = 0x152aeb8226d0 <ompi_coll_libnbc_reduce_scatter_init> "H\213D$\020L\213T$\030\307D$\030\001", MetaMetaInfoLen = 15261776, MetaMetaID = 0x152aeb823ab0 <ompi_coll_libnbc_reduce_scatter_block_init> "H\213D$\020L\213T$\030\307D$\030\001", MetaMetaIDLen = 15261776}, {
MetaMetaInfo = 0x152aeb824300 <ompi_coll_libnbc_scan_init> "H\213D$\020L\213T$\030\307D$\030\001", MetaMetaInfoLen = 15261776, MetaMetaID = 0x152aeb824c50 <ompi_coll_libnbc_scatter_init> "H\213D$ L\213T$(\307D$(\001", MetaMetaIDLen = 15261776}, {MetaMetaInfo = 0x152aeb825540 <ompi_coll_libnbc_scatterv_init> "H\213D$(L\213T$0\307D$0\001",
MetaMetaInfoLen = 15261776, MetaMetaID = 0x152aeb767070 <mca_coll_basic_neighbor_allgather> "AWAVAUATUSH\203\354xL\213\254$\260", MetaMetaIDLen = 15256416}, {MetaMetaInfo = 0x152aeb7678e0 <mca_coll_basic_neighbor_allgatherv> "AWAVAUATUSH\201", <incomplete sequence \354\210>, MetaMetaInfoLen = 15256416,
MetaMetaID = 0x152aeb768140 <mca_coll_basic_neighbor_alltoall> "AWAVAUATUSH\201", <incomplete sequence \354\230>, MetaMetaIDLen = 15256416}, {MetaMetaInfo = 0x152aeb768c80 <mca_coll_basic_neighbor_alltoallv> "AWAVAUATUSH\201", <incomplete sequence \354\230>, MetaMetaInfoLen = 15256416,
MetaMetaID = 0x152aeb7697b0 <mca_coll_basic_neighbor_alltoallw> "AWAVAUATUSH\203\354xH\213\234$\300", MetaMetaIDLen = 15256416}, {MetaMetaInfo = 0x152aeb81dee0 <ompi_coll_libnbc_ineighbor_allgather> "SH\203\354\020H\213\\$(j", MetaMetaInfoLen = 15261776, MetaMetaID = 0x152aeb81e3e0 <ompi_coll_libnbc_ineighbor_allgatherv> "SH\203\354\030H\213\\$8j",
MetaMetaIDLen = 15261776}, {MetaMetaInfo = 0x152aeb81e910 <ompi_coll_libnbc_ineighbor_alltoall> "SH\203\354\020H\213\\$(j", MetaMetaInfoLen = 15261776, MetaMetaID = 0x152aeb81ee70 <ompi_coll_libnbc_ineighbor_alltoallv> "SH\203\354\020H\213\\$8j", MetaMetaIDLen = 15261776}, {
MetaMetaInfo = 0x152aeb81f370 <ompi_coll_libnbc_ineighbor_alltoallw> "SH\203\354\020H\213\\$8j", MetaMetaInfoLen = 15261776, MetaMetaID = 0x152aeb81df40 <ompi_coll_libnbc_neighbor_allgather_init> "H\213D$\030L\213T$ \307D$ \001", MetaMetaIDLen = 15261776}, {
MetaMetaInfo = 0x152aeb81e440 <ompi_coll_libnbc_neighbor_allgatherv_init> "H\213D$ L\213T$(\307D$(\001", MetaMetaInfoLen = 15261776, MetaMetaID = 0x152aeb81e970 <ompi_coll_libnbc_neighbor_alltoall_init> "H\213D$\030L\213T$ \307D$ \001", MetaMetaIDLen = 15261776}, {
MetaMetaInfo = 0x152aeb81eed0 <ompi_coll_libnbc_neighbor_alltoallv_init> "H\213D$(L\213T$0\307D$0\001", MetaMetaInfoLen = 15261776, MetaMetaID = 0x152aeb81f3d0 <ompi_coll_libnbc_neighbor_alltoallw_init> "H\213D$(L\213T$0\307D$0\001", MetaMetaIDLen = 15261776}, {
MetaMetaInfo = 0x152b0a182de0 <mca_coll_base_reduce_local> "AWAVAULc\372ATUSH\211\375H\203\354\070I\211\364L\211\303H\211L$(\211T$$I\201\377\377\377\377\177\017\207", <incomplete sequence \360>, MetaMetaInfoLen = 15256416, MetaMetaID = 0xeb5cb0 "\200\204\324\375*\025", MetaMetaIDLen = 673}, {
MetaMetaInfo = 0x152aeb82c200 <ompi_coll_libnbc_module_t_class> ".e\202\353*\025", MetaMetaInfoLen = 23274427777069, MetaMetaID = 0x152aeb80f410 <libnbc_module_enable> "1\300\303ff.\017\037\204", MetaMetaIDLen = 0}, {MetaMetaInfo = 0x0, MetaMetaInfoLen = 0, MetaMetaID = 0x0, MetaMetaIDLen = 0}, {MetaMetaInfo = 0x0, MetaMetaInfoLen = 0, MetaMetaID = 0x0,
MetaMetaIDLen = 0}, {MetaMetaInfo = 0x0, MetaMetaInfoLen = 0, MetaMetaID = 0x0, MetaMetaIDLen = 0}, {MetaMetaInfo = 0x0, MetaMetaInfoLen = 0, MetaMetaID = 0x0, MetaMetaIDLen = 0}, {MetaMetaInfo = 0x152aeb814f20 <ompi_coll_libnbc_iallgather> "SH\203\354\020H\213\\$(j", MetaMetaInfoLen = 23274083932704,
MetaMetaID = 0x152aeb817950 <ompi_coll_libnbc_iallreduce> "SH\203\354\030H\213\\$(j", MetaMetaIDLen = 23274083944992}, {MetaMetaInfo = 0x152aeb8198b0 <ompi_coll_libnbc_ialltoallv> "SH\203\354\020H\213\\$8j", MetaMetaInfoLen = 23274083952176,
MetaMetaID = 0x152aeb81b010 <ompi_coll_libnbc_ibarrier> "S1\311H\211\363H\203\354\020\350\021\374\377\377\205\300u\fH\213;\350\005A\377\377\205\300u\021H\203\304\020[\303ff.\017\037\204", MetaMetaIDLen = 23274083958224}, {MetaMetaInfo = 0x152aeb81c760 <ompi_coll_libnbc_iexscan> "SH\203\354\030H\213\\$(j", MetaMetaInfoLen = 23274083962912,
MetaMetaID = 0x152aeb81d970 <ompi_coll_libnbc_igatherv> "SH\203\354\020H\213\\$8j", MetaMetaIDLen = 23274083979280}, {MetaMetaInfo = 0x152aeb822610 <ompi_coll_libnbc_ireduce_scatter> "SH\203\354\030H\213\\$(j", MetaMetaInfoLen = 23274083990000, MetaMetaID = 0x152aeb8242a0 <ompi_coll_libnbc_iscan> "SH\203\354\030H\213\\$(j",
MetaMetaIDLen = 23274083994512}, {MetaMetaInfo = 0x152aeb825460 <ompi_coll_libnbc_iscatterv> "SH\203\354\020H\213\\$8j", MetaMetaInfoLen = 23274083930080, MetaMetaID = 0x152aeb815ae0 <ompi_coll_libnbc_allgatherv_init> "H\213D$ L\213T$(\307D$(\001", MetaMetaIDLen = 23274083940848}, {
MetaMetaInfo = 0x152aeb818ae0 <ompi_coll_libnbc_alltoall_init> "H\213D$\030L\213T$ \307D$ \001", MetaMetaInfoLen = 23274083948912, MetaMetaID = 0x152aeb81a6f0 <ompi_coll_libnbc_alltoallw_init> "H\213D$(L\213T$0\307D$0\001", MetaMetaIDLen = 23274083954864}, {MetaMetaInfo = 0x152aeb81be70 <ompi_coll_libnbc_bcast_init> "H\213D$\020L\213L$\b\307D$\020\001",
MetaMetaInfoLen = 23274083960752, MetaMetaID = 0x152aeb81d0e0 <ompi_coll_libnbc_gather_init> "H\213D$ L\213T$(\307D$(\001", MetaMetaIDLen = 23274083965488}, {MetaMetaInfo = 0x152aeb8210d0 <ompi_coll_libnbc_reduce_init> "H\213D$\030L\213T$ \307D$ \001", MetaMetaInfoLen = 23274083985104,
MetaMetaID = 0x152aeb823ab0 <ompi_coll_libnbc_reduce_scatter_block_init> "H\213D$\020L\213T$\030\307D$\030\001", MetaMetaIDLen = 23274083992320}, {MetaMetaInfo = 0x152aeb824c50 <ompi_coll_libnbc_scatter_init> "H\213D$ L\213T$(\307D$(\001", MetaMetaInfoLen = 23274083996992, MetaMetaID = 0x0, MetaMetaIDLen = 0}, {MetaMetaInfo = 0x0, MetaMetaInfoLen = 0,
MetaMetaID = 0x0, MetaMetaIDLen = 23274083966688}, {MetaMetaInfo = 0x152aeb81e3e0 <ompi_coll_libnbc_ineighbor_allgatherv> "SH\203\354\030H\213\\$8j", MetaMetaInfoLen = 23274083969296, MetaMetaID = 0x152aeb81ee70 <ompi_coll_libnbc_ineighbor_alltoallv> "SH\203\354\020H\213\\$8j", MetaMetaIDLen = 23274083971952}, {
MetaMetaInfo = 0x152aeb81df40 <ompi_coll_libnbc_neighbor_allgather_init> "H\213D$\030L\213T$ \307D$ \001", MetaMetaInfoLen = 23274083968064, MetaMetaID = 0x152aeb81e970 <ompi_coll_libnbc_neighbor_alltoall_init> "H\213D$\030L\213T$ \307D$ \001", MetaMetaIDLen = 23274083970768}, {
MetaMetaInfo = 0x152aeb81f3d0 <ompi_coll_libnbc_neighbor_alltoallw_init> "H\213D$(L\213T$0\307D$0\001", MetaMetaInfoLen = 0, MetaMetaID = 0x0, MetaMetaIDLen = 0}, {MetaMetaInfo = 0x0, MetaMetaInfoLen = 23274391374816, MetaMetaID = 0x1 <error: Cannot access memory at address 0x1>, MetaMetaIDLen = 0}, {MetaMetaInfo = 0x0, MetaMetaInfoLen = 0,
MetaMetaID = 0x0, MetaMetaIDLen = 0}, {MetaMetaInfo = 0x0, MetaMetaInfoLen = 1, MetaMetaID = 0x0, MetaMetaIDLen = 1681}, {MetaMetaInfo = 0x152af800f080 <mca_coll_cuda_module_t_class> <incomplete sequence \320>, MetaMetaInfoLen = 23274427777030, MetaMetaID = 0x152af800b520 <mca_coll_cuda_module_enable> "H\213\226P\001", MetaMetaIDLen = 0}, {
MetaMetaInfo = 0x0, MetaMetaInfoLen = 23274293608048, MetaMetaID = 0x0, MetaMetaIDLen = 0}...}, MetaEncodeBuffer = <error reading variable: Cannot access memory at address 0x636f6c6220746e7d>, AttributeEncodeBuffer = <error reading variable: Cannot access memory at address 0x7550206f74206c74>, DataBuffer = 0x40000000656d0074}
#3 0x0000152b073284ed in adios2::core::engine::SstWriter::EndStep (this=0x7fd840) at /p/software/juwelsbooster/stages/2023/software/GCCcore/11.3.0/include/c++/11.3.0/bits/adiosMemory.inl:308
lf_FreeBlocks = {<No data fields>}
MetaMetaBlocks = 0xcf674d9be2ef1e00
i = 5419
TSInfo = 0xb12cb0
newblock = 0x152b18f8de40
iovec = std::vector of length -479110005398699999, capacity -479111793538838735 = {<error reading variable iovec (Cannot access memory at address 0x6a626f5f61746164)>
#4 0x0000152b18f764b4 in adios2_end_step (engine=0x7fd840) at /dev/shm/goebbert1/juwelsbooster/ADIOS2/20230620/foss-2022a-debug/easybuild_obj/bindings/C/adios2_c_internal.inl:563
engineCpp = 0x7fd840
#5 0x0000152b18dc82f8 in sensei::ADIOS2AnalysisAdaptor::WriteTimestep (this=0xe91810, timeStep=20, time=0.20000001788139343, metadata=std::vector of length 1, capacity 1 = {...}, objects=std::vector of length 1, capacity 1 = {...}) at /p/software/juwelsbooster/stages/2023/software/GCCcore/11.3.0/include/c++/11.3.0/regex_scanner.h:522
mark = {
Buffer = "\020\262\357\000\000\000\000\000\065\000\000\000\000\000\000\000\065\000\000\000\000\000\000\000H\030\351\000\000\000\000\000\210\215\341\030+\025\000\000\240 \247\177\377\177\000\000Pח\001\000\000\000\000\026\000\000\000\000\000\000\000\036\000\000\000\000\000\000\000ect_0/\000\000\000\000\000\000\377\177\000\000\001\000\000\000\001\000\000\000\260,\261\000\000\000\000\000v(\t\031+\025\000\000p \247\177\377\177\000\000Wf\334\030+\025\000", Eventname = 0x152b18e1aba8 "ADIOS2AnalysisAdaptor::WriteTimestep"}
ierr = 0
aerr = adios2_error_none
status = adios2_step_status_ok
#6 0x0000152b18dc491a in sensei::ADIOS2AnalysisAdaptor::Execute (this=0xe91810, dataAdaptor=0xe37150, daOut=0x0) at /p/software/juwelsbooster/stages/2023/software/GCCcore/11.3.0/include/c++/11.3.0/regex_scanner.h:238
mark = {
Buffer = "\220z\343\000\000\000\000\000 W\350\000\000\000\000\000@W\350\000\000\000\000\000\024g\323\030+\025\000\000\060!\247\177\377\177\000\000\060!\247\177\377\177\000\000\240!\247\177\377\177\000\000\311g\323\030+\025\000\000\377\377\377\377\377\377\377\377\000A\351\000\000\000\000\000\020\230\357\000\000\000\000\000!\000\000\000\000\000\000\000!\000\000\000\000\000\000\000@\002\353\030+\025\000\000\304\t1\235\016%\331A\000\000\000\000\000\000\000", Eventname = 0x152b18e1a728 "ADIOS2AnalysisAdaptor::Execute"}
step = 20
objects = std::vector of length 1, capacity 1 = {{<svtkSmartPointerBase> = {Object = 0xdd35b0}, <No data fields>}}
metadata = std::vector of length 1, capacity 1 = {std::shared_ptr<sensei::MeshMetadata> (use count 1, weak count 0) = {get() = 0xe89ba0}}
timeStep = 20
time = 0.20000001788139343
#7 0x0000152b19092987 in sensei::ConfigurableAnalysis::Execute (this=0xe37090, data=0xe37150, dataOut=0x7fff7fa72630) at /dev/shm/goebbert1/juwelsbooster/sensei/20230619/foss-2022a-adios2-20230620-catalyst-5.10.1-debug/easybuild_obj/sensei/XMLUtils.h:1555
analysisName = 0xe94100 "ADIOS2AnalysisAdaptor::0::Execute"
logEnabled = true
event = {
Buffer = "p\"\247\177\377\177\000\000\311g\323\030+\025\000\000\377\377\377\377\377\377\377\377r\344J\000\000\000\000\000\020\"\247\177\377\177\000\000\017\000\000\000\000\000\000\000bridge::Execute\000\300\t1\235\016%\331A", '\000' <repeats 16 times>, "\377\377\377\377\377\377\377\377\000\000\000\000\377\177\000\000\300&\344\372*\025\000\000\000#\247\177\377\177\000\000@\002\353\030+\025\000", Eventname = 0x152b190bfd75 "ConfigurableAnalysis::Execute"}
ai = 0
iter = {<svtkSmartPointerBase> = {Object = 0xe91810}, <No data fields>}
end = {<svtkSmartPointerBase> = {Object = 0x0}, <No data fields>}
#8 0x000000000048f949 in bridge::execute (step=20, time=0.200000018, dataOut=0x7fff7fa72630) at /dev/shm/goebbert1/juwelsbooster/sensei/20230619/foss-2022a-adios2-20230620-catalyst-5.10.1-debug/SENSEI-8f71e07faa43f792ec473fa20c9cb4b183ad3d47/miniapps/oscillators/svtkSmartPointer.h:70
event = {
Buffer = "\320<\247\177\377\177\000\000\320<\247\177\377\177\000\000o\rF\000\000\000\000\000@\000\000\000\000\000\000\000 #\247\177\377\177\000\000=DE\000\000\000\000\000 \201\345\000\000\000\000\000\060#\247\177\003\000\000\000\320<\247\177\377\177\000\000\320<\247\177\377\177\000\000\376<\247\177\377\177\000\000@\000\000\000\000\000\000\000@#\247\177\377\177\000\000\261\203C", '\000' <repeats 13 times>, "\320<\247\177\377\177\000", Eventname = 0x4ae472 "bridge::Execute"}
#9 0x0000000000442f31 in main (argc=4, argv=0x7fff7fa73e88) at /p/software/juwelsbooster/stages/2023/software/GCCcore/11.3.0/include/c++/11.3.0/bits/array:302
event = {
Buffer = "oscillator.coredump3\000\177\000\000h%\247\177\377\177\000\000\001", '\000' <repeats 11 times>, "+\025\000\000\020\260\001\b+\025\000\000\254`\004\b+\025\000\000\316\333\325\341\000\000\000\000\360\301\003\b+\025\000\000\320\360\003\b+\025\000\000oW\207\003\000\000\000\000\360%\247\177\377\177\000\000\340%\247\177\377\177\000\000\000\232\345\372*\025\000\000?\001\000\000\000\000\000", Eventname = 0x4abbb9 "oscillators::invoke_in_situ"}
daOut = 0x0
mpiMan = {mRank = 6, mSize = 8}
start = {__d = {__r = 1687435891623414153}}
comm = {comm_ = 0x4ee9e0 <ompi_mpi_comm_world>, rank_ = 6, size_ = 8, owner_ = false}
shape = {<std::array<int, 3>> = {_M_elems = {64, 64, 64}}, <No data fields>}
nblocks = 8
t_end = 10
dt = 0.00999999978
velocity_scale = 50
threads = 1
ghostCells = 1
numberOfParticles = 0
seed = 604882601
config_file = "./sensei-transport.xml"
out_prefix = ""
bounds = {<std::array<float, 6>> = {_M_elems = {0, -1, 0, -1, 0, -1}}, <No data fields>}
ops = {args = empty std::__cxx11::list, options = std::__cxx11::list = {[0] = {s = 98 'b', l = "blocks", d = "8", t = "INT", help = "number of blocks to use. must greater or equal to number of MPI ranks."}, [1] = {s = 115 's', l = "shape", d = "64 64 64", t = "UNKNOWN TYPE", help = "global number of cells in the domain"}, [2] = {s = 101 'e', l = "bounds",
d = "0 -1 0 -1 0 -1", t = "UNKNOWN TYPE", help = "global bounds of the domain"}, [3] = {s = 116 't', l = "dt", d = "0.01", t = "FLOAT", help = "time step"}, [4] = {s = 102 'f', l = "config", d = "", t = "STRING", help = "Sensei analysis configuration xml (required)"}, [5] = {s = 69 'E', l = "t-end", d = "10", t = "FLOAT", help = "end time"}, [6] = {
s = 106 'j', l = "jobs", d = "1", t = "INT", help = "number of threads to use"}, [7] = {s = 111 'o', l = "output", d = "", t = "STRING", help = "prefix to save output"}, [8] = {s = 103 'g', l = "ghost-cells", d = "1", t = "INT", help = "number of ghost cells"}, [9] = {s = 112 'p', l = "particles", d = "0", t = "INT",
help = "number of random particles to generate"}, [10] = {s = 118 'v', l = "v-scale", d = "50", t = "FLOAT", help = "scale factor to convert function gradient to velocity"}, [11] = {s = 0 '\000', l = "seed", d = "604882601", t = "INT", help = "specify a random seed"}, [12] = {s = 0 '\000', l = "sync", d = "", t = "",
help = "synchronize after each time step"}, [13] = {s = 0 '\000', l = "verbose", d = "", t = "", help = "print debugging messages"}, [14] = {s = 104 'h', l = "help", d = "", t = "", help = "show help"}}, failed = false}
sync = false
verbose = false
infn = "sample.osc"
particlesPerBlock = 0
rng = {_M_x = 298454978}
oscillators = {mSize = 5, mData = std::shared_ptr<Oscillator> (use count 2, weak count 0) = {get() = 0xe33f20}}
master = {links_ = std::vector of length 1, capacity 1 = {0xa5fc50}, blocks_ = {create_ = 0x4400b9 <Block::create()>, destroy_ = 0x44012b <Block::destroy(void*)>, storage_ = 0x0, save_ = 0x0, load_ = 0x0, elements_ = std::vector of length 1, capacity 1 = {0xe378e0}, external_ = std::vector of length 1, capacity 1 = {-1}, in_memory_ = {x_ = 1,
m_ = {<No data fields>}}}, gids_ = std::vector of length 1, capacity 1 = {6}, lids_ = std::map with 1 element = {[6] = 0}, queue_policy_ = 0xdaad30, limit_ = -1, threads_ = 1, storage_ = 0x0, comm_ = {comm_ = 0xbc7b70, rank_ = 6, size_ = 8, owner_ = true}, incoming_ = std::map with 1 element = {[20] = {map = std::map with 0 elements, received = 7}},
outgoing_ = std::map with 1 element = {[6] = {external = -1, external_local = std::map with 0 elements, queues = std::map with 0 elements}}, inflight_sends_ = std::unique_ptr<sdiy::Master::InFlightSendsList> = {get() = 0xe337d0}, inflight_recvs_ = std::unique_ptr<sdiy::Master::InFlightRecvsMap> = {get() = 0xe33450},
collectives_ = std::unique_ptr<sdiy::Master::CollectivesMap> = {get() = 0xe334d0}, expected_ = 7, exchange_round_ = 20, immediate_ = true, commands_ = std::vector of length 0, capacity 1, add_mutex_ = {<No data fields>}, log = std::shared_ptr<sdiy::spd::logger> (use count 1, weak count 0) = {get() = 0xe338a0}, prof = {<No data fields>}}
assigner = {<sdiy::StaticAssigner> = {<sdiy::Assigner> = {_vptr.Assigner = 0x4eb720 <vtable for sdiy::ContiguousAssigner+16>, size_ = 8, nblocks_ = 8}, <No data fields>}, <No data fields>}
domain = {min = {<std::array<int, 4>> = {_M_elems = {0, 0, 0, 0}}, <No data fields>}, max = {<std::array<int, 4>> = {_M_elems = {63, 63, 63, 0}}, <No data fields>}}
origin = {<std::array<float, 3>> = {_M_elems = {0, 0, 0}}, <No data fields>}
spacing = {<std::array<float, 3>> = {_M_elems = {1, 1, 1}}, <No data fields>}
gids = std::vector of length 1, capacity 1 = {6}
from_x = std::vector of length 1, capacity 1 = {-1}
from_y = std::vector of length 1, capacity 1 = {31}
from_z = std::vector of length 1, capacity 1 = {31}
to_x = std::vector of length 1, capacity 1 = {32}
to_y = std::vector of length 1, capacity 1 = {64}
to_z = std::vector of length 1, capacity 1 = {64}
share_face = std::vector<bool> of length 0, capacity 0
wrap = std::vector<bool> of length 3, capacity 64 = {true, true, true}
ghosts = std::vector of length 3, capacity 3 = {1, 1, 1}
t_count = 20
t = 0.200000018
(gdb)
You can find the whole job-output including the core files here
Well, I thought I could easily recreate what I thought was happening in a simple test and debug the problem. I tried that and so far I've failed. I expect that I'm going to have to build SENSEI and try your examples in order to reproduce, but it might be a few weeks before I'm able to do that. Just FYI...
Thank you for looking into this.
Hi @eisenhauer, were you able to reproduce the problem with SENSEI?
Sorry, got as far as downloading SENSEI last week and then got distracted by a critical demo (and the need to wipe and reinstall my laptop because of an ongoing problem). This is on my list for this week, possibly later today.
Cool, I keep my fingers crossed :)
OK, I've spent enough time on this that I've got it running, but I'm not able to reproduce the problem. Some things to note: I built with SENSEI GitHub master and ADIOS2 GitHub master (which is close enough to 2.9.0 that it shouldn't matter). The first thing I found is that SENSEI had compilation failures with ADIOS 2.9.0 because of the changes in the ADIOS API (elimination of DebugMode in adios2_init()). I edited sensei/ADIOS2AnalysisAdaptor.cxx and sensei/ADIOS2Schema.cxx to eliminate the debug mode parameter and things compiled fine. I skipped the slurm script and instead ran the two clients using MPI on my laptop.
I get no segfaults, but I do see some weird behaviour, some of which I can trace to the sensei-transport.xml file. For example, RendezvousReaderCount=0 means that the oscillator can and will produce data that is dropped on the floor until the sensei process shows up. QueueFullPolicy=discard then means that, even once the reader is connected, if the producer is producing data faster than it can be sent or consumed, that data will be discarded. (None of this means that the code where you seemed to be seeing the segfault wouldn't be executed; it would. It's just that the data and metadata block it produced would be discarded.)
I guess the upshot is that I'm at a dead end. I've tried to reproduce the issue both with and without SENSEI without having any luck. I'm wondering a bit about what version of SENSEI you might be using since I had to do source-level tweaks just to get it to compile with post-2.9.0 ADIOS. I've seen some anomalies, but nothing that should result in the symptoms that you are seeing. Not quite sure where to go from here...
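To make the effect of the two sensei-transport.xml settings mentioned above (RendezvousReaderCount and QueueFullPolicy) easier to picture, here is a small toy model in plain Python. It is only a sketch of the behaviour as described in this thread, not ADIOS2 code: the function `simulate`, its arguments, the tick-based clock and the choice to drop the incoming step when the queue is full are all assumptions made for illustration, and a real SST engine may choose differently which step to discard.

```python
from collections import deque


def simulate(num_steps, reader_joins_at_tick, rendezvous_reader_count,
             queue_limit, queue_full_policy, reader_period=2):
    """Toy model of a staged writer; returns (delivered, dropped) step lists.

    All names here are hypothetical. The writer tries to publish one step per
    tick; the reader consumes one queued step every `reader_period` ticks.
    """
    if queue_limit < 1:
        raise ValueError("this toy model needs queue_limit >= 1")
    if queue_full_policy not in ("discard", "block"):
        raise ValueError("queue_full_policy must be 'discard' or 'block'")

    # Assumption: with a rendezvous count > 0 the writer waits for the reader
    # before producing anything, so the reader is present from tick 0.
    if rendezvous_reader_count > 0:
        reader_joins_at_tick = 0

    queue = deque()
    delivered, dropped = [], []
    step = 0   # next step the writer wants to publish
    tick = 0
    while step < num_steps or queue:
        reader_present = tick >= reader_joins_at_tick

        # Reader side: take one step off the queue on every period boundary.
        if reader_present and queue and tick % reader_period == 0:
            delivered.append(queue.popleft())

        # Writer side: try to publish the next step.
        if step < num_steps:
            if not reader_present:
                dropped.append(step)   # nobody is listening yet
                step += 1
            elif len(queue) < queue_limit:
                queue.append(step)
                step += 1
            elif queue_full_policy == "discard":
                dropped.append(step)   # queue full: give up on this step
                step += 1
            # "block": the writer stalls and retries on the next tick
        tick += 1
    return delivered, dropped


if __name__ == "__main__":
    print(simulate(6, 3, 0, 1, "discard"))
    print(simulate(6, 3, 1, 1, "block"))
```

In this model the first call loses the steps written before the reader attaches and also any step that arrives while the queue is full, whereas the second call (rendezvous count 1, blocking policy) delivers every step at the cost of stalling the writer.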
Hi Greg, this is very surprising. I will go through this again based on your information and get back to you in the next few days.
Thank you for looking into this!
Sorry, it is taking longer than expected to find the time to continue, but I am on it ...
For now I have been running the production runs with MarshalMethod = BP.