
Issue with the solver GLPK

Open BaptisteFrancois opened this issue 6 years ago • 17 comments

Hi, it looks like I am having a significant issue with GLPK. Long story short: I have a model that works sufficiently well, and I am now trying to run several climate scenarios to assess the climate risk for the system under consideration.

When running these scenarios (mainly based on a resampling of T and P variables + step changes on these variables), the solver crashes in the middle of the simulation, stating that NO PRIMAL SOLUTION EXISTS.

Sometimes I get the error message below:

Assertion failed: teta >= 0.0 Error detected in file ..\src\simplex\spxchuzr.c at line 294

Searching online for the above assertion, it looks like this might come from numerical instability. Have you encountered such an issue in the past? Thanks

BaptisteFrancois avatar May 05 '19 22:05 BaptisteFrancois

I have only encountered this when a NaN has made its way through as a row bound or objective coefficient. You could debug by rebuilding and passing --enable-debug to setup.py; make sure it does a complete rebuild, though. This should ensure there are some checks that all values going to GLPK are finite.

jetuk avatar May 06 '19 18:05 jetuk

@BaptisteFrancois did you resolve this issue?

jetuk avatar Jun 05 '19 10:06 jetuk

@jetuk Sorry - I have been kept busy with traveling and other duties... I will focus on this in the next few weeks. I'll keep you informed about how things are going. Thanks.

BaptisteFrancois avatar Jun 06 '19 18:06 BaptisteFrancois

Fixing #759 (PR in #762) would help with not crashing Pywr in this case.

jetuk avatar Sep 18 '19 12:09 jetuk

@BaptisteFrancois Did you ever figure out what the issue was? I'm having the same problem now too.

UPDATE: So in my case it was as @jetuk mentioned above: I had a stray missing value in a CSV file, which resulted in a NaN. This should probably be included in any future error catching. Fortunately (for us!) googling this error lands us here pretty quickly.
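For anyone landing here with the same symptom, a quick pre-flight scan of the input CSV can surface a stray missing value before it reaches the solver. This is only a minimal sketch using the standard library; the function name find_bad_cells, the column names, and the sample data are made up for illustration and are not part of Pywr.

```python
import csv
import io
import math


def find_bad_cells(csv_file, skip_columns=()):
    """Return (line_number, column_name, raw_text) for each cell that is
    empty, not a number, or not finite (NaN / inf).

    Hypothetical helper for illustration; not part of Pywr.
    """
    bad = []
    reader = csv.DictReader(csv_file)
    for line_number, row in enumerate(reader, start=2):  # header is line 1
        for column, text in row.items():
            if column in skip_columns:
                continue
            try:
                value = float(text)
            except (TypeError, ValueError):
                # Empty string, missing field, or non-numeric text.
                bad.append((line_number, column, text))
                continue
            if not math.isfinite(value):
                bad.append((line_number, column, text))
    return bad


# Made-up sample data with one empty cell and one explicit NaN.
sample = io.StringIO(
    "date,inflow,demand\n"
    "2019-01-01,12.5,3.0\n"
    "2019-01-02,,3.1\n"
    "2019-01-03,11.9,nan\n"
)
bad_cells = find_bad_cells(sample, skip_columns=("date",))
for line_number, column, text in bad_cells:
    print(f"line {line_number}: column {column!r} has bad value {text!r}")
```

With a real file, pass an open file handle instead of the StringIO object and list any non-numeric columns (dates, labels) in skip_columns.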

drheinheimer avatar Jan 17 '20 22:01 drheinheimer

@jetuk @rheinheimer sorry for taking like a year to get back to you... I am still experiencing this issue. This issue does not result from a missing value in an input file. If I rerun the exact simulation it may go through without issue.

@jetuk I am willing to try your suggestion: rebuilding by passing --enable-debug to setup.py. However, I am not quite sure how to do this.

I am basically running Pywr through a Python script, first using m.load(my_model) and then m.run(). Could you help me with how to pass the enable-debug argument to setup.py? Thanks.

BaptisteFrancois avatar Aug 19 '20 12:08 BaptisteFrancois

You'll need to clone the repository and run a command like this:

python setup.py develop --with-glpk --enable-debug

If you are using Anaconda on Windows you might need to do something like this:

set LIBRARY=%CONDA_PREFIX%\Library
set LIBRARY_INC=%LIBRARY%\include
set LIBRARY_LIB=%LIBRARY%\lib
python setup.py build_ext -I"%LIBRARY_INC%" -L"%LIBRARY_LIB%" --inplace --with-glpk  --enable-debug develop

If you do this you should get some assertion errors if there are non-finite values being given to the GLPK update routines. You may get some extra output if there are very small but not zero values being used.

jetuk avatar Aug 19 '20 18:08 jetuk

@BaptisteFrancois I get this error on a regular basis (including just now, prompting me to reply), though it's completely unpredictable. I just rerun the model and cross my fingers. Usually it works without problem. Because it's trivial just to re-run the model, I just do so, but might eventually try to debug.

drheinheimer avatar Aug 24 '20 20:08 drheinheimer

@rheinheimer can you reproduce this at all?

Are both of you using an algorithm/system (e.g. MOEA) that is giving random(ish) inputs to a model and then re-running it?

jetuk avatar Aug 31 '20 21:08 jetuk

@jetuk @rheinheimer Apologies... I have not yet found time to seriously investigate this. On my side, I am not using MOEA. It is also difficult to reproduce because, as described above, the model crash is almost random. Unlike for @rheinheimer, these random crashes are annoying for me because I am running several runs in parallel through MPI. When one simulation crashes, it cascades to the MPI process.

I think I have diagnosed another issue leading to a random crash. Specifically, I noticed that self.model.nodes['reservoir'].get_level(scenario_index) sometimes returns an infinite value, which makes the model crash. The crash often happens at the first time step. Note that the max values used within the 'level' and 'storage' attributes, required for the interpolation via the 'get_level' method, are significantly larger than the storage max.

I have not given up on installing the developer version of pywr to use the debug mode. I still have to try this but just have not found time yet.
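One possible stopgap for the parallel case is to launch each scenario in its own child process, so that a crash only takes down that child and can be retried. The sketch below is an assumption-laden illustration: run_isolated is a made-up helper, the snippets passed to it are stand-ins for a real script that loads and runs the model, and it only hides the underlying problem rather than fixing it.

```python
import subprocess
import sys


def run_isolated(code, max_attempts=3):
    """Run a Python snippet in a child process, retrying on a non-zero exit.

    Returns (succeeded, attempts_used). Hypothetical helper, not part of Pywr.
    """
    for attempt in range(1, max_attempts + 1):
        # A crash or abort in the child shows up as a non-zero return code
        # instead of terminating this parent process.
        result = subprocess.run([sys.executable, "-c", code])
        if result.returncode == 0:
            return True, attempt
    return False, max_attempts


# Stand-ins for real scenario runs: one exits cleanly, one always fails.
ok = run_isolated("print('scenario finished')")
failed = run_isolated("import sys; sys.exit(1)", max_attempts=2)
print(ok, failed)
```

In practice the child would run a separate script per scenario, and the parent would record which scenarios needed a retry so the failures can still be investigated later.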

BaptisteFrancois avatar Aug 31 '20 23:08 BaptisteFrancois

What parameter are you using for the level calculation? Does the issue go away if you use a ConstantParameter instead?

jetuk avatar Sep 01 '20 08:09 jetuk

I am not using MOEA or anything random. I cannot purposefully reproduce it, other than by running the model a few times in a row, and even then, maybe the issue will occur one or more times, maybe not.

drheinheimer avatar Sep 03 '20 20:09 drheinheimer

My working theory is that the general issue is to do with floating point precision comparison problems when working with a fixed (or what should be fixed) constraint. I think we might be making a doubly bounded constraint with a very tiny range. I'll see about making a PR that we can use to test that theory.
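To illustrate the theory (this is not the code in the eventual PR, just a sketch), two floating point routes to "the same" number can differ in the last bit, leaving a doubly bounded row with a tiny or even slightly inverted range. Treating bounds that are within a small threshold of each other as a fixed constraint avoids that. The function name classify_row_bounds and the threshold value of 1e-8 are assumptions made for the example.

```python
def classify_row_bounds(lower, upper, threshold=1e-8):
    """Decide how a row constraint with the given bounds could be passed on.

    Returns ("fixed", value) when the bounds are within `threshold` of each
    other, otherwise ("double", (lower, upper)).
    Hypothetical helper; the threshold is an illustrative value only.
    """
    if abs(upper - lower) < threshold:
        return "fixed", 0.5 * (lower + upper)
    return "double", (lower, upper)


# Two routes to "the same" number that differ in the last bit.
lower = 0.1 + 0.2  # 0.30000000000000004
upper = 0.3
print(lower > upper)                         # True: the range is slightly inverted
print(classify_row_bounds(lower, upper)[0])  # fixed
print(classify_row_bounds(0.0, 5.0))         # ('double', (0.0, 5.0))
```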

I think the first time-step level issue might be unrelated if there are NaNs involved though.

jetuk avatar Sep 03 '20 20:09 jetuk

@BaptisteFrancois I have created a branch (glpk-fixed-con-threshold) with a potential fix as described above. See https://github.com/pywr/pywr/pull/925. Is there any chance you could try this out and see if it helps?

jetuk avatar Sep 10 '20 16:09 jetuk

@jetuk sure I can do that. I noticed that you included the threshold in pywr/solvers/cython_glpk.pyx. However, I do not have this file in my pywr folder. I only have cython_glpk.cp36-win_amd64.pyd.

Does that mean I am using a previous version of pywr? B.

BaptisteFrancois avatar Sep 10 '20 16:09 BaptisteFrancois

No, it means it's part of the source code that creates that pyd file. To test this change you'll have to compile Pywr from source unfortunately.

jetuk avatar Sep 10 '20 16:09 jetuk

OK, got it. I'll try to get this running by tomorrow evening; if not, you should hear from me sometime next week.

BaptisteFrancois avatar Sep 10 '20 17:09 BaptisteFrancois