
Brutus worker dies

Open GFTwrt opened this issue 5 years ago • 28 comments

Hello,

I get the following message: amuse.support.exceptions.CodeException: Exception when calling function 'evolve_model', of code 'BrutusInterface', exception was 'Error in code: no error message - code probably died, sorry.'

Condition:

  • Debian 10

  • git commit 13ca566e32ba2fdd78bae0b33f188b6cf250d52b Author: Inti Pelupessy [email protected] Date: Mon Jan 13 18:25:44 2020 +0100

  • small model with 10 bodies

  • usage of gravity.set_bs_tolerance_string("1e-20") or even gravity.set_bs_tolerance(1e-20). With bs_tolerance 1e-19 I can vary the word length from 112 to 130, and even eta over a wide range, without a crash. Once I use bs_tolerance 1e-20, the worker dies.

Thank you for the support.

GFTwrt avatar Feb 03 '20 20:02 GFTwrt

@tjardaboekholt could you have a look at this?

rieder avatar Feb 04 '20 15:02 rieder

Hi, thanks for the error message concerning Brutus.

I managed to do some runs with e=1e-20, and say, Lw=128 and dt_param=0.10, and the code did not crash. If I reduce Lw to 40 bits, then it crashes and gives the same error message you quoted. This is because the number of bits was too low to reach convergence of e=1e-20. If Brutus fails to reach a converged solution within the maximum number of iterations, it will give up and this causes the code to stop. However, for suitable combinations of (e, Lw, dt_param), the code should in principle work fine, i.e. make sure you have enough bits to resolve e.

Cheers!
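
As a rough illustration of "enough bits to resolve e" above: a tolerance e needs at least ceil(-log2(e)) mantissa bits just to be represented relative to 1, before any safety margin. The helper below is a minimal sketch; `min_bits_for_tolerance` is a hypothetical name, not part of Brutus or AMUSE, and the bound is only a necessary condition, not the convergence test Brutus applies internally.

```python
import math


def min_bits_for_tolerance(tolerance: float) -> int:
    """Smallest mantissa length (in bits) that can represent `tolerance`
    relative to 1. Hypothetical helper, not part of Brutus or AMUSE."""
    if not 0.0 < tolerance < 1.0:
        raise ValueError("tolerance must lie strictly between 0 and 1")
    return math.ceil(-math.log2(tolerance))


if __name__ == "__main__":
    needed = min_bits_for_tolerance(1e-20)
    for word_length in (40, 128):
        verdict = "may suffice" if word_length >= needed else "too short"
        print(f"Lw={word_length}: need >= {needed} bits, {verdict}")
```

For e=1e-20 this gives 67 bits, which is consistent with the failure at Lw=40 and the successful run at Lw=128 reported above.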

tjardaboekholt avatar Feb 04 '20 15:02 tjardaboekholt

Hello,

I identified the values you've written as:

    gravity.set_bs_tolerance_string("1e-20")  # as your "e"
    gravity.set_word_length(130)              # as your Lw
    gravity.set_eta(0.01)                     # as your dt_param

Are these the wrong parameters? It is not working.

GFTwrt avatar Feb 04 '20 17:02 GFTwrt

Hello,

Here is the build.log. Maybe something is missing; see the warnings.

    Building code: brutus, target: all, in directory: src/amuse/community/brutus

    make[1]: Entering directory '/home/pi/amuse/src/amuse/community/brutus'
    mpicxx -g -O2 -fPIC -std=c++0x -I../mpfrc++ -I/home/pi/amuse/lib/stopcond -Impfrc++ -I./src -c -o interface.o interface.cc
    In file included from ./src/Star.h:6,
                     from ./src/Brutus.h:1,
                     from interface.cc:12:
    mpfrc++/mpreal.h: In function ‘const mpfr::mpreal mpfr::root(const mpfr::mpreal&, long unsigned int, mpfr_rnd_t)’:
    mpfrc++/mpreal.h:2201:50: warning: ‘int mpfr_root(mpfr_ptr, mpfr_srcptr, long unsigned int, mpfr_rnd_t)’ is deprecated [-Wdeprecated-declarations]
         mpfr_root(y.mpfr_ptr(), x.mpfr_srcptr(), k, r);
         ^
    In file included from mpfrc++/mpreal.h:121,
                     from ./src/Star.h:6,
                     from ./src/Brutus.h:1,
                     from interface.cc:12:
    /usr/include/mpfr.h:693:21: note: declared here
     __MPFR_DECLSPEC int mpfr_root (mpfr_ptr, mpfr_srcptr, unsigned long,
                         ^~~~~~~~~
    In file included from ./src/Star.h:6,
                     from ./src/Brutus.h:1,
                     from interface.cc:12:
    mpfrc++/mpreal.h:2201:50: warning: ‘int mpfr_root(mpfr_ptr, mpfr_srcptr, long unsigned int, mpfr_rnd_t)’ is deprecated [-Wdeprecated-declarations]
         mpfr_root(y.mpfr_ptr(), x.mpfr_srcptr(), k, r);
         ^
    In file included from mpfrc++/mpreal.h:121,
                     from ./src/Star.h:6,
                     from ./src/Brutus.h:1,
                     from interface.cc:12:
    /usr/include/mpfr.h:693:21: note: declared here
     __MPFR_DECLSPEC int mpfr_root (mpfr_ptr, mpfr_srcptr, unsigned long,
                         ^~~~~~~~~
    In file included from ./src/Star.h:6,
                     from ./src/Brutus.h:1,
                     from interface.cc:12:
    mpfrc++/mpreal.h: In function ‘const mpfr::mpreal mpfr::grandom(__gmp_randstate_struct (&)[1], mpfr_rnd_t)’:
    mpfrc++/mpreal.h:2646:53: warning: ‘int mpfr_grandom(mpfr_ptr, mpfr_ptr, __gmp_randstate_struct*, mpfr_rnd_t)’ is deprecated [-Wdeprecated-declarations]
         mpfr_grandom(x.mpfr_ptr(), NULL, state, rnd_mode);
         ^
    In file included from mpfrc++/mpreal.h:121,
                     from ./src/Star.h:6,
                     from ./src/Brutus.h:1,
                     from interface.cc:12:
    /usr/include/mpfr.h:502:21: note: declared here
     __MPFR_DECLSPEC int mpfr_grandom (mpfr_ptr, mpfr_ptr, gmp_randstate_t,
                         ^~~~~~~~~~~~
    In file included from ./src/Star.h:6,
                     from ./src/Brutus.h:1,
                     from interface.cc:12:
    mpfrc++/mpreal.h:2646:53: warning: ‘int mpfr_grandom(mpfr_ptr, mpfr_ptr, __gmp_randstate_struct*, mpfr_rnd_t)’ is deprecated [-Wdeprecated-declarations]
         mpfr_grandom(x.mpfr_ptr(), NULL, state, rnd_mode);
         ^
    In file included from mpfrc++/mpreal.h:121,
                     from ./src/Star.h:6,
                     from ./src/Brutus.h:1,
                     from interface.cc:12:
    /usr/include/mpfr.h:502:21: note: declared here
     __MPFR_DECLSPEC int mpfr_grandom (mpfr_ptr, mpfr_ptr, gmp_randstate_t,
                         ^~~~~~~~~~~~
    mpicxx -g -O2 -fPIC -std=c++0x -I../mpfrc++ -I/home/pi/amuse/lib/stopcond -I./src worker_code.cc src/libbrutus.a interface.o -o brutus_worker -L./src -lbrutus -L/home/pi/amuse/lib/stopcond -lstopcond -lmpfr -lgmp -lgmp
    make[1]: Leaving directory '/home/pi/amuse/src/amuse/community/brutus'

GFTwrt avatar Feb 04 '20 21:02 GFTwrt

Hello, http://gcc.gnu.org/onlinedocs/gcc-3.4.6/gcc/Deprecated-Features.html indicates that the code uses functionality that is no longer supported. Can you update the code to current syntax?

GFTwrt avatar Feb 04 '20 22:02 GFTwrt

Hello,

I identified the values you've written as:

    gravity.set_bs_tolerance_string("1e-20")  # as your "e"
    gravity.set_word_length(130)              # as your Lw
    gravity.set_eta(0.01)                     # as your dt_param

Are these the wrong parameters? It is not working.

Yes, that is correct. Another way to set the parameters is:

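    from amuse.community.brutus.interface import Brutus

    code = Brutus()
    code.parameters.bs_tolerance = "1e-20"  # Bulirsch-Stoer tolerance ("e")
    code.parameters.word_length = 128       # mantissa length in bits (Lw)
    code.parameters.dt_param = 0.10         # time-step parameter
    print(code.parameters) # to check values are correctly set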

tjardaboekholt avatar Feb 05 '20 11:02 tjardaboekholt

Just to add to that: the latter way @tjardaboekholt mentioned is the preferred method.

rieder avatar Feb 05 '20 11:02 rieder

Hello,

Thank you for showing me the preferred usage, but it has no effect. The problem is the "deprecated" in the build.log. Please have a look at the result with your preferred method:

    begin_time: 0.0 s default: 0.0 s
    brutus_output_directory: /home/tst/amuse/data/brutus/output/ default: ./
    bs_tolerance: 1e-20 default: 1e-08
    dt_param: 0.1 default: 0.24
    stopping_condition_maximum_density: 2.55293255306e+306 m**-3 * kg default: -0.0142011587158 m**-3 * kg
    stopping_condition_maximum_internal_energy: inf m2 * s-2 default: -2558461176.91 m2 * s-2
    stopping_condition_minimum_density: -0.0142011587158 m**-3 * kg default: -0.0142011587158 m**-3 * kg
    stopping_condition_minimum_internal_energy: -2558461176.91 m2 * s-2 default: -2558461176.91 m2 * s-2
    stopping_conditions_number_of_steps: 1 default: 1.0
    stopping_conditions_out_of_box_size: 0.0 m default: 0.0 m
    stopping_conditions_out_of_box_use_center_of_mass: 0 default: False
    stopping_conditions_timeout: 4.0 s default: 4.0 s
    timestep: 102715.479587 s default: 719008.357111 s
    word_length: 128 default: 72

    0.0 s
    /home/pi/amuse/src/amuse/units/generic_unit_converter.py:189: RuntimeWarning: overflow encountered in double_scalars
      return new_quantity(number * factor, new_unit)
    Traceback (most recent call last):

      File "", line 1, in <module>
        runfile('/home/pi/tests/solar1.py', wdir='/home/pi/tests')

      File "/usr/lib/python3/dist-packages/spyder_kernels/customize/spydercustomize.py", line 678, in runfile
        execfile(filename, namespace)

      File "/usr/lib/python3/dist-packages/spyder_kernels/customize/spydercustomize.py", line 106, in execfile
        exec(compile(f.read(), filename, 'exec'), namespace)

      File "/home/pi/tests/solar1.py", line 124, in <module>
        gravity_minimal(t_end)

      File "/home/pi/tests/solar1.py", line 103, in gravity_minimal
        gravity.evolve_model(gravity.model_time + (10| units.day))

      File "/home/pi/amuse/src/amuse/support/methods.py", line 167, in __call__
        result = self.method(*list_arguments, **keyword_arguments)

      File "/home/pi/amuse/src/amuse/support/methods.py", line 167, in __call__
        result = self.method(*list_arguments, **keyword_arguments)

      File "/home/pi/amuse/src/amuse/support/methods.py", line 167, in __call__
        result = self.method(*list_arguments, **keyword_arguments)

      File "/home/pi/amuse/src/amuse/support/methods.py", line 266, in __call__
        return self.method(*list_arguments, **keyword_arguments)

      File "/home/pi/amuse/src/amuse/rfi/core.py", line 123, in __call__
        raise exceptions.CodeException("Exception when calling function '{0}', of code '{1}', exception was '{2}'".format(self.specification.name, type(self.interface).__name__, ex))

    CodeException: Exception when calling function 'evolve_model', of code 'BrutusInterface', exception was 'lost connection to code'

GFTwrt avatar Feb 05 '20 21:02 GFTwrt

For contrast, here is a run that works:

    begin_time: 0.0 s default: 0.0 s
    brutus_output_directory: /home/pi/amuse/data/brutus/output/ default: ./
    bs_tolerance: 1e-19 default: 1e-08
    dt_param: 0.01 default: 0.24
    stopping_condition_maximum_density: 2.55293255306e+306 m**-3 * kg default: -0.0142011587158 m**-3 * kg
    stopping_condition_maximum_internal_energy: inf m2 * s-2 default: -2558461176.91 m2 * s-2
    stopping_condition_minimum_density: -0.0142011587158 m**-3 * kg default: -0.0142011587158 m**-3 * kg
    stopping_condition_minimum_internal_energy: -2558461176.91 m2 * s-2 default: -2558461176.91 m2 * s-2
    stopping_conditions_number_of_steps: 1 default: 1.0
    stopping_conditions_out_of_box_size: 0.0 m default: 0.0 m
    stopping_conditions_out_of_box_use_center_of_mass: 0 default: False
    stopping_conditions_timeout: 4.0 s default: 4.0 s
    timestep: 10271.5479587 s default: 719008.357111 s
    word_length: 128 default: 72

    0.0 s
    /home/pi/amuse/src/amuse/units/generic_unit_converter.py:189: RuntimeWarning: overflow encountered in double_scalars
      return new_quantity(number * factor, new_unit)
    864000.0 s

GFTwrt avatar Feb 05 '20 21:02 GFTwrt

Hello, can you confirm that the issue is related to:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91226
https://github.com/BrianGladman/mpfr/blob/master/tests/tget_set_d64.c

    /* The volatile below avoids _Decimal64 constant propagation, which is
       buggy for non-canonical encoding in various GCC versions on the x86 and
       x86_64 targets: failure with gcc (Debian 20190719-1) 10.0.0 20190718
       (experimental) [trunk revision 273586]; the MPFR test was not failing
       with previous GCC versions, but GCC versions 5 to 9 are also affected
       on the simple testcase at:
       https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91226 */

GFTwrt avatar Feb 08 '20 22:02 GFTwrt

I have updated mpfr c++ to 3.6.6; this should get rid of the deprecation warnings. Can you try it out? (It is only updated in the GitHub master; there is no release on PyPI yet.)

If you still get the error, can you post a minimal example which triggers it?

ipelupessy avatar Feb 09 '20 13:02 ipelupessy

the above comment was for @GFTwrt ;-)

ipelupessy avatar Feb 09 '20 13:02 ipelupessy

btw, thanks for bringing this up (I had not noticed mpfr c++ was updated; the website still lists the 2015 version as the latest)

ipelupessy avatar Feb 09 '20 13:02 ipelupessy

Hello @ipelupessy,

Thank you for your action. It was not the solution :-( . Please have a look at my last comment (the GCC bug).

Attached are the simple model file "solar1" (constellation of planets from the AMUSE book) and the build log. 1e-19 runs; 1e-20 fails.

    Building code: brutus, target: all, in directory: src/amuse/community/brutus

    make[1]: Entering directory '/home/tom/testam/src/amuse/community/brutus'
    /home/tom/testam/build.py --type=c interface.py BrutusInterface -o worker_code.cc
    /home/tom/testam/build.py --type=H -i amuse.support.codes.stopping_conditions.StoppingConditionInterface interface.py BrutusInterface -o worker_code.h
    make -C src all CXXFLAGS="-g -O2 -fPIC -I./mpfrc++ -std=c++0x -I../mpfrc++"
    make[2]: Entering directory '/home/tom/testam/src/amuse/community/brutus/src'
    g++ -O1 -g -O2 -fPIC -I./mpfrc++ -std=c++0x -I../mpfrc++ -c Star.cpp
    g++ -O1 -g -O2 -fPIC -I./mpfrc++ -std=c++0x -I../mpfrc++ -c Cluster.cpp
    g++ -O1 -g -O2 -fPIC -I./mpfrc++ -std=c++0x -I../mpfrc++ -c Bulirsch_Stoer.cpp
    g++ -O1 -g -O2 -fPIC -I./mpfrc++ -std=c++0x -I../mpfrc++ -c Brutus.cpp
    g++ -O1 -g -O2 -fPIC -I./mpfrc++ -std=c++0x -I../mpfrc++ -c main.cpp
    rm -f libbrutus.a
    ar crs libbrutus.a main.o Brutus.o Bulirsch_Stoer.o Cluster.o Star.o
    ranlib libbrutus.a
    make[2]: Leaving directory '/home/tom/testam/src/amuse/community/brutus/src'
    mpicxx -g -O2 -fPIC -I./mpfrc++ -std=c++0x -I../mpfrc++ -I./mpfrc++ -I/home/tom/testam/lib/stopcond -Impfrc++ -I./src -c -o interface.o interface.cc
    mpicxx -g -O2 -fPIC -I./mpfrc++ -std=c++0x -I../mpfrc++ -I./mpfrc++ -I/home/tom/testam/lib/stopcond -I./src worker_code.cc src/libbrutus.a interface.o -o brutus_worker -L./src -lbrutus -L/home/tom/testam/lib/stopcond -lstopcond -L/usr/lib/x86_64-linux-gnu/ -lmpfr -L/usr/lib/x86_64-linux-gnu/ -lgmp
    make[1]: Leaving directory '/home/tom/testam/src/amuse/community/brutus'

solar1.txt

GFTwrt avatar Feb 10 '20 19:02 GFTwrt

there is an error in the state model for Brutus. The script will work if the parameters are set before the particles are added; I think if you do it the other way round ~~the changes in the word_length are not propagated to the integrator~~ the derived eta is not updated anymore...hence the failure to converge! So the script can be made to work by:

    ...
    gravity = Brutus(convert_nbody,number_of_workers=1)

    gravity.parameters.bs_tolerance = 1e-20
    gravity.parameters.word_length = 128
    gravity.parameters.dt_param = 0.010
    print(gravity.parameters) # to check values are correctly set
    gravity.particles.add_particles(bodies)
    ...

but I will try to fix the state model, because the ordering should not matter...

ipelupessy avatar Feb 11 '20 10:02 ipelupessy

hmm my explanation above was not entirely correct..

ipelupessy avatar Feb 11 '20 10:02 ipelupessy

@tjardaboekholt I think the problem is in the set_eta(tolerance): it is called in the setup, which is called in commit_particles. According to the state model of gravitational_dynamics, commit_particles is also triggered when changing the parameters after adding particles. We could fix this by moving the setup, or by adding a set_eta to the setter of the tolerance?
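
The second option, recomputing the derived eta inside the tolerance setter, can be illustrated with a toy class. This is only a simplified picture of how a derived value can go stale: `ToyIntegrator`, its methods, and the square-root relation between tolerance and eta are all made up for illustration and do not reproduce the Brutus code or the AMUSE state model.

```python
class ToyIntegrator:
    """Toy stand-in for a code with a derived parameter (NOT Brutus itself)."""

    def __init__(self, refresh_in_setter: bool):
        self.refresh_in_setter = refresh_in_setter
        self.tolerance = 1e-8
        self.eta = None  # derived from tolerance

    def _derive_eta(self) -> None:
        # Placeholder relation, chosen only for illustration.
        self.eta = self.tolerance ** 0.5

    def commit_particles(self) -> None:
        # Setup step: the only place the stale variant derives eta.
        self._derive_eta()

    def set_tolerance(self, tolerance: float) -> None:
        self.tolerance = tolerance
        if self.refresh_in_setter:
            self._derive_eta()  # sketch of the fix: keep eta in sync


def eta_after_late_change(refresh_in_setter: bool) -> float:
    """Add particles first, change the tolerance afterwards, return eta."""
    code = ToyIntegrator(refresh_in_setter)
    code.commit_particles()
    code.set_tolerance(1e-20)
    return code.eta


if __name__ == "__main__":
    print("stale eta:    ", eta_after_late_change(refresh_in_setter=False))
    print("refreshed eta:", eta_after_late_change(refresh_in_setter=True))
```

In the stale variant eta still reflects the default tolerance, so the step size does not match the requested accuracy; setting the parameters before adding particles sidesteps this in the same way as the workaround above.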

ipelupessy avatar Feb 11 '20 11:02 ipelupessy

I confirm that setting the parameters before giving the particles to Brutus makes the script run, so please proceed with this temporary fix. Also, the current version of Brutus adapts eta to the value of epsilon that is given. In principle this should be OK, as you can then focus on just two parameters (epsilon, word length). Meanwhile I plan to update the Brutus version in AMUSE soon, together with a fix for this issue. Many thanks for pointing this out to us.

tjardaboekholt avatar Feb 11 '20 11:02 tjardaboekholt

Thank you for your support. The model is evolving now. Let's have a look at the result.

GFTwrt avatar Feb 11 '20 19:02 GFTwrt

@tjardaboekholt: May I add a request, if you are making updates to Brutus? As I mentioned in my first post to AMUSE on GitHub, I want to simulate our solar system including the solar wind. I expect to need a resolution in energy conservation better than 1 mW (milliwatt). At the moment the interface between the code and Python cannot carry this accuracy. Can you add a string-based interface that provides a number (the difference in energy between two freely chosen timesteps divided by the time difference) as well as the particle data? It would be very nice to have such an interface.
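
The accuracy limit mentioned here comes from double-precision floats, which hold roughly 16 significant decimal digits. The short sketch below uses only the Python standard library and a made-up value to show the digits a double drops and a string-based value keeps; it does not use any AMUSE or Brutus call.

```python
from decimal import Decimal

# Made-up value with 21 significant digits, as a worker might report it.
energy_text = "1.00000000000000000001"

as_float = float(energy_text)      # what a double-based interface carries
as_decimal = Decimal(energy_text)  # what a string-based interface could carry

float_difference = as_float - 1.0
decimal_difference = as_decimal - Decimal(1)

print(float_difference)    # the 1e-20 offset is lost in the double
print(decimal_difference)  # the offset survives as 1E-20
```

A string getter on the worker side would let the caller parse the value into an arbitrary-precision type such as `Decimal` instead of a float.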

GFTwrt avatar Feb 11 '20 20:02 GFTwrt

dear GFT,

In principle, you can do that already by converting your mW (which is basically an energy-conservation quantity) to a tolerance. The tolerance is then the inverse of the fraction of the total binding energy of the Solar System in terms of 1 mW. Sounds like you are performing an interesting experiment.

Simon
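
One possible reading of this conversion: express the energy that 1 mW represents over a chosen interval as a fraction of the system's binding energy, and use that fraction as the relative tolerance. The sketch below assumes this reading; `relative_energy_tolerance` is a hypothetical helper, and the binding energy in the example is a placeholder to be replaced by the value computed from the actual model.

```python
def relative_energy_tolerance(power_watt: float,
                              interval_seconds: float,
                              binding_energy_joule: float) -> float:
    """Energy drifted at `power_watt` over `interval_seconds`, as a fraction
    of the system's binding energy. Hypothetical helper for illustration."""
    if binding_energy_joule == 0.0:
        raise ValueError("binding energy must be non-zero")
    return power_watt * interval_seconds / abs(binding_energy_joule)


if __name__ == "__main__":
    # Placeholder inputs: 1 mW sustained over one day, and an
    # order-of-magnitude stand-in for the binding energy (not a measurement).
    fraction = relative_energy_tolerance(1e-3, 86400.0, 1e35)
    print(f"relative energy tolerance ~ {fraction:.3e}")
```

The resulting fraction is dimensionless, and the interval over which the power is accumulated has to be chosen by the user.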


spzwart avatar Feb 11 '20 21:02 spzwart

@spzwart: Thank You Simon. I was not sure how to interpret eta (tolerance/bs_tolerance) out of your paper. I was not sure about potential or energy. So the unit of tolerance (eta) is (1/(energy/power)) and therefore time?

GFTwrt avatar Feb 11 '20 22:02 GFTwrt

@tjardaboekholt: if you add the get__string versions (and maybe setters) of the particle attributes that would be a good start (the issue #155 suggests adding some code to automatically get e.g. gmpy attributes )

ipelupessy avatar Feb 12 '20 08:02 ipelupessy

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 28 days if no further activity occurs. Thank you for your contributions.

stale[bot] avatar Mar 04 '22 17:03 stale[bot]

The original issue seems solved, but I am not sure if the later brutus fixes proposed have been implemented; @tjardaboekholt there is mention of a new brutus version: has that been merged? Also note the full string interface functions should be checked?

ipelupessy avatar Mar 12 '22 11:03 ipelupessy

Hi Inti, thanks for the reminder. The student Arend Moerman has implemented PN terms into Brutus. I will also check his Amuse interface/string treatment. I will work on merging this into Amuse as soon as I have some time!

tjardaboekholt avatar Mar 16 '22 22:03 tjardaboekholt

Hello, maybe you should have a look at https://github.com/GFTwrt/amuse/tree/master/src/amuse/community/gpuhermite8 too. Thomas. PS: The interface is the same as https://github.com/GFTwrt/amuse/tree/master/src/amuse/community/brutus

GFTwrt avatar Mar 17 '22 22:03 GFTwrt

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 28 days if no further activity occurs. Thank you for your contributions.

stale[bot] avatar May 17 '22 00:05 stale[bot]