picongpu icon indicating copy to clipboard operation
picongpu copied to clipboard

Config validation still rudimentary

Open n01r opened this issue 6 years ago • 17 comments

One of the command line options of picongpu --validate can in principle validate all the command line options that are specified in the .cfg files.

This validation is still pretty rudimentary as it cannot so far check:

  • if <species>_calorimeter.period and <species>_calorimeter.file appear the same number of times
  • if the full box size of the sim is valid with the supercell size that has been compiled in
  • if all tbg variables that have been defined are actually used (which @psychocoderHPC says is hard since picongpu doesn't know anything about tbg and the variables also do not need to be prefixed with TBG_)

So, are there ways to make this better so that as many mistakes as possible can be caught before the actual run of the job

n01r avatar Sep 26 '18 08:09 n01r

Not saying the following approach is any good and ofc it is not a replacement for proper validation, but might ease the pain in some cases. So not suggesting to necessarily do that, just sharing my thoughts.

Imagine one has a large simulation to do, but for some reason it is 'risky' in a way it may realistically fail. For example, it uses a new plugin, which is not well-tested yet, or it is just a new type of simulation and smth may be not adjusted well. In case it crashes at the very first iterations, not only no valuable information is obtained, but also user's and a lot of computational time is wasted, also it might take a lot of time even being in the queue. Now to eliminate this early-iteration crashes due to wrong parameters, one could try to first run the same simulation e.g., for 10 or 100 time steps, which takes much less computational time, and probably time in the queue as well. And then in case that went well to enqueue the proper large simulation. This works kinda like insurance: in case everything was actually fine, one just did one small unnecessary run and wasted a little bit of his and computational time; but in case of an early crash, the losses are cut by a lot. Of course, this could be automated with a 'validated run' script, which does this routine of queue up small run, check, if necessary queue up large run.

sbastrakov avatar Sep 26 '18 08:09 sbastrakov

@sbastrakov What you are suggesting is basically what I am doing most of the times. Either I try to go 2D and with full size to really get information on how the sim is evolving - or I go for a single/few-GPU 3D case where my boundary effects will basically destroy any physical results from the sim but then this is just a test if all of the plugins actually work.

I also tried the --validate option and got fooled once, since it didn't catch the critical parts. So maybe we can make this validation more profound. =)

n01r avatar Sep 26 '18 09:09 n01r

Just as a side node, imho I would not invest much more in the command line options. As noted in https://github.com/ComputationalRadiationPhysics/picongpu/issues/2421#issuecomment-349956245, we should implement runtime input files for all but the most rudimentary options.

(performance-wise, only rank 0 should read those input file(s) and broadcast it.)

Of course, feel free to submit re-factorings and additional checks if it helps in the current situation and with little work invested!

if _calorimeter.period and _calorimeter.file appear the same number of times

dunny, is it valid syntax? Sounds like one would need to call Plugin::load() for that which is also allowed to allocated a lot of stuff. so probably not easy.

if the full box size of the sim is valid with the supercell size that has been compiled in

That check can be moved most certainly! Also with gridDist check.

if all tbg variables that have been defined are actually used

Not easy. Not all tbg vars end up in the program options, some control restarts, some control job script "comments", etc.

ax3l avatar Sep 26 '18 09:09 ax3l

@n01r Yeah, some users of picador also did that, that's why we created a script that I described to at least automate that a little bit, so that a user does not need to manually check and submit the full simulation. I guess validation could be improved (hard for me to judge), but there could be no silver bullet, some things could always be properly checked only once the simulation is really started.

sbastrakov avatar Sep 26 '18 10:09 sbastrakov

@ax3l Right, any approach that is different from the current tbg-cfg workflow would also be nice.

At some point even nicely structured CFG files become pretty confusing if you have enough diagnostics.

n01r avatar Sep 26 '18 11:09 n01r

Runtime input files in python ftw!

sbastrakov avatar Sep 26 '18 12:09 sbastrakov

I have some idea how we can build on the plans I mentioned above and @codingS3b 's & @jkelling 's work to add support for PICMI (new repo) with PIConGPU.

(See an early LWFA example.)

Please talk to me if anyone is interested to invest work into this.

ax3l avatar Oct 04 '18 09:10 ax3l

Wow, that PICMI looks cool. Not sure I could invest work into this, but definitely interested.

sbastrakov avatar Oct 05 '18 08:10 sbastrakov

I think time-span wise it makes sense to look into this around end of the year, beginning next year with implementation work. By then we have the majority of @codingS3b 's & @jkelling 's work mainlined to built upon. We can also make this a student's project. Isolated into #2715 for documentation and planning.

ax3l avatar Oct 05 '18 08:10 ax3l

It seems this still does not work, right? (my branch cloned from dev in April 2019)

A small sim with the --validate option successfully ran yesterday, and it did not report any problems (at least I didn't see - the report should have been in the stdout or stderr, right?). But the calorimeter.file and calorimeter.filter was missing (the real sim crashed).

Or did I do anything wrong?

wmles avatar Sep 05 '19 13:09 wmles

@wmles the idea behind --validate is to only check your command-line (set in .cfg) parameters are syntactically correct. Unfortunately, there is no good way of checking logical correctness without starting a simulation. On a similar note, some other PIC codes support a mode when a simulation is actually started, just memory not allocated for the fields and particles and the latter not generated. Then things like missing files or variables in .param files being in an invalid range could be checked for. However, even this is not that easy to implement maybe and bug-prone on its own.

sbastrakov avatar Sep 06 '19 10:09 sbastrakov

@sbastrakov I think @wmles meant the run-time/command-line parameter of the calorimeter plugin (set via .cfg). As far as I know, a missing filter flag (if required) should throw an error syntactical error. Or am I mistaken?

PrometheusPi avatar Sep 06 '19 19:09 PrometheusPi

Afair, such sub-options likely have a default and it is just not properly documented in that plugin.

Update: hm, looks note likes this. Please open a plugin-specific issue if that is a bug: https://github.com/ComputationalRadiationPhysics/picongpu/blob/c520e8348ee29802aac6168a09e8e3d5fe97ae35/include/picongpu/plugins/particleCalorimeter/ParticleCalorimeter.hpp#L588-L589

ax3l avatar Sep 07 '19 07:09 ax3l

@ax3l I believe this code is not called at the argument parsing, but only afterward: from pluginLoad() when a simulation is starting. That validateOptions() method is only in the hierarchy for picongpu::plugins::multi::IHelp, not for ISimulationPlugin. Edit: the next sentence was wrong ~~(Side note: in all places, the method is accidentally hidden in the derived classes, but never used polymorphically as of now)~~. So we probably can put it to the hierarchies and override properly and call from the parsing in case that is what we want.

I think this is really the only proper way to check. Simple checks on the level of Boost.Program_options do not work well for plugins, since the parameters are somewhat conditionally required, only when a plugin is enabled. Thus they cannot be made required in terms of Boost.Program_options and now have to have default values.

sbastrakov avatar Sep 09 '19 09:09 sbastrakov

Just to clarify:

yes, I got an error like this

std::runtime_error( name + ": parameter filter and period are not used the same number of times" );

even twice:

  • first it crashed with a missing "file"-name
  • I added calorimeter.file and ran a small case with the --verify flag --> no warnings shown
  • and then crashed once more with a missing "filter"

And I may have overlooked it, but I believe, the first time there was only the "missing filename"-error and no indication that filter is required too.

In my opinion a default for filter and filename would do no harm. Users who want to use another filter will know it, and most newbie users don't need to bother...

wmles avatar Sep 09 '19 12:09 wmles

I'll bring this issue up again, since it pertains strongly to usability of the software:

We need a validation of plug in options - at the very least the defaults - when executing with --validate.

Specific example I observed: GIven that openPMD - the now de-facto library for outputs in PIConGPU - was compiled only with HDF5 and ADIOS1 support (ADIOS 2 compilation was behaving weirdly) and ADIOS1 not being supported it should not be possible to run PIConGPU with output/checkpointing at all.

In the specifica case, providing only the outputs I'd like to see (--openPMD.sources etc.) and the frequency with which they are to be dumped, as well as specifying the frequency of checkpointing passed --validate without complaints, but after running for ~40 min failed when writing checkpoints complaining that ADIOS1 is not supported as backend (checkpointing was set to start just before the 'init phase' of the simulation ends).

The way I see it this is not a problem of a user-supplied parameter which needs to be propagated to the proper plug-in, but a problem of a plug-in's own default settings.

Would it actually be possible to trigger a simple validation of the plug-ins default parameters without parsing the command line values when running with --validate?

Anton-Le avatar Aug 18 '21 13:08 Anton-Le

I think we can do proper parsing if there is a hook for plugins, similar to a hook to add command-line parameters. It can be either specifically for validation (and then it can be optional), or it can be specifically as the only place to grab and validate parameters from the command-line-args class. The latter would be more work, but more thorough checks.

sbastrakov avatar Aug 18 '21 14:08 sbastrakov