flecsi icon indicating copy to clipboard operation
flecsi copied to clipboard

HPX Backend

Open hkaiser opened this issue 4 years ago • 92 comments

This includes:

  • ~Rename interface to interface_ (avoid conflict with some platforms that have interface defined as a macro)~ added #undef interface instead
  • Fixing various errors and warnings
  • adding build system support for HPX
  • adding missing gettimeofday for windows
  • replace usleep/sleep with std::this_thread_sleep_for()
  • workaround for MSVC early template instantiation problem (annotation.hh)

hkaiser avatar May 25 '21 22:05 hkaiser

@ipdemes: I just opened this PR for you to follow the progress. It will take more work, however. I'll remove the WIP tag as soon as it's ready to be reviewed.

hkaiser avatar May 25 '21 22:05 hkaiser

I don't like renaming interface to an ugly_ name, since (not counting the one unit test that happens to use it) it is in fact exposed to specialization authors (albeit not to application developers). I know we've been down a very similar road with max/min, but should we just #undef it on whatever broken platform (with the caveat that users there would have to make sure not to #include anything after FleCSI headers that might depend on it)? If not, I think we ought to (grudgingly) pick a different word (maybe just access like the related class template in the core topology).

opensdh avatar May 27 '21 17:05 opensdh

Hartmut,

Thank you!

Irina


From: Hartmut Kaiser @.***> Sent: Tuesday, May 25, 2021 4:26:07 PM To: flecsi/flecsi Cc: Demeshko, Irina P; Mention Subject: [EXTERNAL] Re: [flecsi/flecsi] [WIP] HPX Backend (#1)

@ipdemeshttps://github.com/ipdemes: I just opened this PR for you to follow the progress. It will take more work, however. I'll remove the WIP tag as soon as it's ready to be reviewed.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHubhttps://github.com/flecsi/flecsi/pull/1#issuecomment-848309749, or unsubscribehttps://github.com/notifications/unsubscribe-auth/AB4ZQSCRKVDXDQT4S3PNCGDTPQPX7ANCNFSM45QKSZBQ.

ipdemes avatar May 27 '21 18:05 ipdemes

I don't like renaming interface to an ugly_ name, since it is in fact exposed to specialization authors (albeit not to application developers). I know we've been down a very similar road with max/min, but should we just #undef it on whatever broken platform (with the caveat that users there would have to make sure not to #include anything after FleCSI headers that might depend on it)? If not, I think we ought to (grudgingly) pick a different word (maybe just access like the related class template in the core topology).

I'll go with whatever you decide. FWIW, I don't like the ugly_ myself.

hkaiser avatar May 27 '21 18:05 hkaiser

I'll remove the WIP tag as soon as it's ready to be reviewed.

I couldn't help myself, as you can see, but I tried to limit my comments to code that looked "done".

opensdh avatar May 27 '21 18:05 opensdh

@ipdemes I have removed the WIP tag from this as I think the PR has reached a minimal viable state. The tests from the run and exec submodules compile and run now.

hkaiser avatar Jun 11 '21 19:06 hkaiser

@hkaiser : thank you!

ipdemes avatar Jun 11 '21 19:06 ipdemes

@ipdemes I have updated the PR with recent changes from https://github.com/flecsi/flecsi/tree/2. All tests pass for me now. Please let me know if there is anything else you would like me to change.

hkaiser avatar Sep 08 '21 16:09 hkaiser

@ipdemes I have updated the PR with recent changes from https://github.com/flecsi/flecsi/tree/2. All tests pass for me now. Please let me know if there is anything else you would like me to change.

The one thing I changed was to add -DENABLE_MPI_THREAD_MULTIPLE=On to my flecsi build.

hkaiser avatar Sep 08 '21 17:09 hkaiser

@hkaiser thank you! @ollielo can you, please, pull and try to build it ?

ipdemes avatar Sep 08 '21 18:09 ipdemes

@hkaiser

Do you mean when building HPX or building FleCSI?

ollielo avatar Sep 15 '21 05:09 ollielo

@hkaiser

Do you mean when building HPX or building FleCSI?

I think it should work without any additional settings. For both, HPX and FleCSI the corresponding options should be enabled by default. Nevertheless I did specify the MPI multi-threaded options:

  • HPX: -DCMAKE_BUILD_TYPE=<Debug/Release> -DCMAKE_INSTALL_PREFIX=<...> -DHPX_WITH_PARCELPORT_MPI=ON -DHPX_WITH_PARCELPORT_MPI_MULTITHREADED=ON -DHPX_WITH_FETCH_ASIO=On -DHPX_WITH_EXAMPLES=Off -DHPX_WITH_TESTS=Off -DHPX_WITH_CXX17=On
  • FleCSI: -DCMAKE_BUILD_TYPE=<Debug/Release> -DCMAKE_INSTALL_PREFIX=<...> -DFLECSI_RUNTIME_MODEL=hpx -DENABLE_MPI_CXX_BINDINGS=On -DENABLE_MPI_THREAD_MULITPLE=On -DENABLE_FLOG=On -DCALIPER_DETAIL=high -Dcaliper_DIR=<...> -DHPX_DIR=<...>

hkaiser avatar Sep 15 '21 08:09 hkaiser

@hkaiser

FleCSI it self does not use the ENABLE_MPI_THREAD_MULTIPLE macro. Does it impact some of the HPX header files?

ollielo avatar Sep 15 '21 13:09 ollielo

@hkaiser

FleCSI it self does not use the ENABLE_MPI_THREAD_MULTIPLE macro. Does it impact some of the HPX header files?

No, the HPX backend uses multi-threaded MPI in FleCSI by default. Seems that option can be removed, then.

hkaiser avatar Sep 15 '21 13:09 hkaiser

Seems that option can be removed, then.

I believe @tuxfan is concurrently removing CMake support for that macro.

opensdh avatar Sep 15 '21 13:09 opensdh

A lot of documentation work has just been merged into 2. There are now release notes wherein to advertise the HPX backend, and there may be some good places to add Doxygen (although the backends need much less of that than does the front-end code).

opensdh avatar Dec 08 '21 17:12 opensdh

We do need release notes and an update to options.rst even if no other documentation is required because it works just like the other backends.

I updated options.rst (added mentioning of HPX). Where do release notes go?

hkaiser avatar Jan 08 '22 18:01 hkaiser

I updated options.rst (added mentioning of HPX). Where do release notes go?

../news.rst (relative to options.rst). I realize that my creation of the "Changes in v2.2" section hasn't been merged yet; I can do that quickly if you like, or we can come back to this at the end.

opensdh avatar Jan 10 '22 16:01 opensdh

Shouldn't we have a Spack variant for HPX?

opensdh avatar Apr 20 '22 05:04 opensdh

Shouldn't we have a Spack variant for HPX?

We do have HPX support in Spack.

hkaiser avatar Apr 20 '22 12:04 hkaiser

We do have HPX support in Spack.

Right, sorry. I've been reading lots of documentation lately and not much code, and we don't document the backend=hpx variant. I confused that with the actual code, but we do need to make minor updates to the documentation:

  • [x] the very top of doc/sphinx/src/build.rst
  • [x] "Parallelization options" in doc/sphinx/src/build/options.rst
  • [x] doc/sphinx/src/news.rst (finally!), in the structural comments at the top and as a new v2.2.0 feature
  • [x] ideally, in doc/sphinx/src/summary.rst somewhere near "The MPI backend uses MPI to transfer data" and "The MPI backend supports only"

opensdh avatar Apr 20 '22 14:04 opensdh

@opensdh this PR is now ready for final review. Thanks!

hkaiser avatar Sep 08 '22 15:09 hkaiser

@hkaiser

Current version breaks the MPI backend. Did you make the needed change to it?

ollielo avatar Sep 08 '22 19:09 ollielo

@hkaiser

Current version breaks the MPI backend. Did you make the needed change to it?

Will check, thanks!

hkaiser avatar Sep 08 '22 19:09 hkaiser

@hkaiser Current version breaks the MPI backend. Did you make the needed change to it?

Will check, thanks!

That should be fine now. Sorry for this!

hkaiser avatar Sep 08 '22 21:09 hkaiser

@opensdh I have addressed the comments and squashed the commits into one. The commit message now reflects (most of) the changes made overall.

hkaiser avatar Sep 09 '22 00:09 hkaiser

@opensdh I have addressed the comments and squashed the commits into one. The commit message now reflects (most of) the changes made overall.

So you're ready to "release the lock" and let us test it internally (and fix documentation, etc.)?

opensdh avatar Sep 09 '22 03:09 opensdh

@hkaiser

It still breaks the mpi backend. First there is one line change need to compile

[ollie@pn2200824 cmake-build-debug]$ git diff
diff --git a/flecsi/io/mpi/policy.hh b/flecsi/io/mpi/policy.hh
index 62694c82a..f0fa067c5 100644
--- a/flecsi/io/mpi/policy.hh
+++ b/flecsi/io/mpi/policy.hh
@@ -156,7 +156,8 @@ struct io_interface {
         int item_size = fp->type_size;
         std::string field_name = region_name + " field " + std::to_string(fid);
 
-        const auto data = isd.partition->get_raw_storage(fid, item_size);
+        // FIXME: a shorter/better way to call get_raw_storage?
+        const auto data = (*isd.partition)->get_raw_storage(fid, item_size);
         hsize_t size = data.size() / item_size;
         void * buffer = data.data();
         checkpoint_field<W>(

It also seg fault when running the index unit test.

7: AddressSanitizer:DEADLYSIGNAL
7: =================================================================
7: ==203174==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000000 (pc 0x000000800d84 bp 0x7ffd97c607d0 sp 0x7ffd97c606a0 T0)
7: ==203173==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000000 (pc 0x000000800d84 bp 0x7ffe550d7630 sp 0x7ffe550d7500 T0)
7: ==203173==The signal is caused by a READ memory access.
7: ==203174==The signal is caused by a READ memory access.
7: ==203174==Hint: address points to the zero page.
7: ==203173==Hint: address points to the zero page.
7:     #0 0x800d84 in auto flecsi::data::local::partition_impl::get_storage<Noisy, (flecsi::exec::task_processor_type_t)0>(unsigned long) const /home/ollie/CLionProjects/flecsi/flecsi/data/local/policy.hh:127:12
7:     #1 0x8060c5 in void flecsi::exec::task_prologue<(flecsi::exec::task_processor_type_t)0>::visit<Noisy, 7u, flecsi::topo::index, (flecsi::topo::single_space)0>(flecsi::data::accessor<(flecsi::data::layout)0, Noisy, 7u>&, flecsi::data::field_reference<Noisy, (flecsi::data::layout)0, flecsi::topo::index, (flecsi::topo::single_space)0> const&) /home/ollie/CLionProjects/flecsi/flecsi/exec/mpi/task_prologue.hh:95:17
7:     #2 0x805e98 in auto auto flecsi::exec::prolog<(flecsi::exec::task_processor_type_t)0>::visitor<flecsi::data::field_reference<Noisy, (flecsi::data::layout)2, flecsi::topo::index, (flecsi::topo::single_space)0> const>(flecsi::data::field_reference<Noisy, (flecsi::data::layout)2, flecsi::topo::index, (flecsi::topo::single_space)0> const&)::'lambda'(flecsi::data::field_reference<Noisy, (flecsi::data::layout)2, flecsi::topo::index, (flecsi::topo::single_space)0> const&, auto&&)::operator()<flecsi::data::accessor<(flecsi::data::layout)0, Noisy, 7u>, void flecsi::data::accessor<(flecsi::data::layout)2, Noisy, 7u>::send<'lambda'(flecsi::data::field_reference<Noisy, (flecsi::data::layout)2, flecsi::topo::index, (flecsi::topo::single_space)0> const&, auto&&)>(flecsi::data::field_reference<Noisy, (flecsi::data::layout)2, flecsi::topo::index, (flecsi::topo::single_space)0> const&&)::'lambda'(flecsi::data::field_reference<Noisy, (flecsi::data::layout)2, flecsi::topo::index, (flecsi::topo::single_space)0> const const&)>(flecsi::data::field_reference<Noisy, (flecsi::data::layout)2, flecsi::topo::index, (flecsi::topo::single_space)0> const&, auto&&) const /home/ollie/CLionProjects/flecsi/flecsi/exec/prolog.hh:64:34
7:     #3 0x805aa6 in void flecsi::data::accessor<(flecsi::data::layout)2, Noisy, 7u>::send<auto flecsi::exec::prolog<(flecsi::exec::task_processor_type_t)0>::visitor<flecsi::data::field_reference<Noisy, (flecsi::data::layout)2, flecsi::topo::index, (flecsi::topo::single_space)0> const>(flecsi::data::field_reference<Noisy, (flecsi::data::layout)2, flecsi::topo::index, (flecsi::topo::single_space)0> const&)::'lambda'(flecsi::data::field_reference<Noisy, (flecsi::data::layout)2, flecsi::topo::index, (flecsi::topo::single_space)0> const&, auto&&)>(flecsi::data::field_reference<Noisy, (flecsi::data::layout)2, flecsi::topo::index, (flecsi::topo::single_space)0> const&&) /home/ollie/CLionProjects/flecsi/flecsi/data/accessor.hh:241:5
...

Is the r in partition_impl properly inited?

ollielo avatar Sep 09 '22 17:09 ollielo

@ollielo Thanks for testing. I have not run the tests with the MPI backend yet - I will take a look.

hkaiser avatar Sep 09 '22 17:09 hkaiser

It actually crashes at the destructor/finializer in the cleanup(). Hope this helps.

ollielo avatar Sep 12 '22 18:09 ollielo