dash::copy not working between containers in different teams
dash::copy (in both global-to-global and global-to-local mode) segfaults when copying between containers that are associated with different teams.
An example that reproduces this can be found in dash-apps --> multigrid/multigrid3d_elastic.cpp. It currently still requires the feat-halo branch.
... we talked about this at the project meeting last week. If you need more details, I'll be happy to provide them.
Thanks, Andreas
Andreas,
Thanks for opening a ticket, that helps with tracking the issue. It's still not clear what is going wrong here... Before starting to debug this, do you happen to have a stack trace at hand?
==== backtrace ====
2 0x00000000000575cc mxm_handle_error() /scrap/jenkins/workspace/hpc-power-pack/label/r-vmb-rhel6-u7-x86-64-MOFED-CHECKER/hpcx_root/src/hpcx-v1.8.0-gcc-OFED-3.18-redhat6.7-x86_64/mxm-v3.6/src/mxm/util/debug/debug.c:641
3 0x000000000005773c mxm_error_signal_handler() /scrap/jenkins/workspace/hpc-power-pack/label/r-vmb-rhel6-u7-x86-64-MOFED-CHECKER/hpcx_root/src/hpcx-v1.8.0-gcc-OFED-3.18-redhat6.7-x86_64/mxm-v3.6/src/mxm/util/debug/debug.c:616
4 0x0000003afca32510 killpg() ??:0
5 0x0000003afca89782 memcpy() ??:0
6 0x000000000041262f _ZN4dash4copyIdNS_8GlobIterIdNS_12BlockPatternILi3ELNS_10MemArrangeE1ElEENS_13GlobStaticMemIdNS_9allocator18SymmetricAllocatorIdEEEENS_7GlobPtrIdS9_EENS_7GlobRefIdEEEEEEPT_T0_SH_SG_() /sw/taurus/libraries/dash/dash-feat-halo_14-09-2017/include/dash/algorithm/Copy.h:878
7 0x000000000040af83 _Z15transfertofewerR5LevelS0_() /home/knuepfe/prog/dash-apps/multigrid/multigrid3d_elastic.cpp:611
8 0x000000000040c358 _Z7v_cycleN9__gnu_cxx17__normal_iteratorIPKP5LevelSt6vectorIS2_SaIS2_EEEES8_jd() /home/knuepfe/prog/dash-apps/multigrid/multigrid3d_elastic.cpp:852
9 0x000000000040c79b _Z7v_cycleN9__gnu_cxx17__normal_iteratorIPKP5LevelSt6vectorIS2_SaIS2_EEEES8_jd() /home/knuepfe/prog/dash-apps/multigrid/multigrid3d_elastic.cpp:903
10 0x000000000040c79b _Z7v_cycleN9__gnu_cxx17__normal_iteratorIPKP5LevelSt6vectorIS2_SaIS2_EEEES8_jd() /home/knuepfe/prog/dash-apps/multigrid/multigrid3d_elastic.cpp:903
11 0x000000000040c79b _Z7v_cycleN9__gnu_cxx17__normal_iteratorIPKP5LevelSt6vectorIS2_SaIS2_EEEES8_jd() /home/knuepfe/prog/dash-apps/multigrid/multigrid3d_elastic.cpp:903
12 0x000000000040da6b main() /home/knuepfe/prog/dash-apps/multigrid/multigrid3d_elastic.cpp:1158
13 0x0000003afca1ed1d __libc_start_main() ??:0
14 0x0000000000407771 _start() ??:0
===================
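For anyone reading along: the mangled symbols in the trace are easier to follow once demangled. Below is a minimal sketch of a helper for that, assuming a GCC- or Clang-based toolchain that ships `<cxxabi.h>`. The helper name `demangle` is made up for illustration and is not part of DASH.

```cpp
#include <cxxabi.h>  // abi::__cxa_demangle (GCC/Clang specific)
#include <cstdlib>   // std::free
#include <string>

// Hypothetical helper: returns the demangled form of an Itanium-ABI symbol,
// or the input unchanged if it cannot be demangled.
std::string demangle(const char* mangled) {
  int status = 0;
  // With a null output buffer, __cxa_demangle allocates the result with malloc.
  char* out = abi::__cxa_demangle(mangled, nullptr, nullptr, &status);
  if (status != 0 || out == nullptr) {
    return mangled;
  }
  std::string result(out);
  std::free(out);
  return result;
}
```

Applied to frame 7, `demangle("_Z15transfertofewerR5LevelS0_")` yields `transfertofewer(Level&, Level&)`. Piping the trace through the `c++filt` command-line tool does the same for all frames at once.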
This appears to be a bug somewhere in the pattern code. Here is what I have so far:
dash::copy first assumes that the copy is all local because the range returned by dash::local_index_range(in_first, in_last) has the length of the total_copy_elem. However, the call to in_first.local() returns nullptr because _pattern->local(idx) claims that the values are located on another unit.
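To make the mismatch concrete, here is a minimal sketch of the failure mode and of a guard that would avoid the crash. It does not use the DASH API: `MockGlobIter`, `CopyPath`, and `guarded_copy` are hypothetical stand-ins, and the guard is only an assumption about what a defensive check could look like, not the actual fix.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical stand-in for a global iterator; NOT the DASH GlobIter type.
struct MockGlobIter {
  // What a local() lookup would yield; nullptr if the pattern
  // reports that the element lives on another unit.
  const double* local_ptr;
  const double* local() const { return local_ptr; }
};

enum class CopyPath { LocalMemcpy, RemoteFallback };

// local_range_len mimics the length of the range reported by the
// local index range query; total_copy_elem is the requested element count.
CopyPath guarded_copy(const MockGlobIter& in_first,
                      std::size_t local_range_len,
                      std::size_t total_copy_elem,
                      double* out) {
  assert(out != nullptr);
  const bool range_says_local = (local_range_len == total_copy_elem);
  const double* src = in_first.local();
  if (range_says_local && src != nullptr) {
    // Both queries agree that the data is local: the fast path is safe.
    std::memcpy(out, src, total_copy_elem * sizeof(double));
    return CopyPath::LocalMemcpy;
  }
  // The two queries disagree (or the data is remote). Calling memcpy on
  // src here is what would crash, so take the slow path instead.
  return CopyPath::RemoteFallback;
}
```

In the situation described above, the range check passes but `local()` returns nullptr, so an unguarded memcpy reads from a null source, which matches frame 5 of the backtrace. A guard like this would only hide the symptom; the disagreement between the two queries would still need to be fixed in the pattern code.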
I'm afraid that unless I spend a significant amount of time paging through the pattern code, I won't be of much help. I think this is a job for @fuchsto
@devreal Aye!