Numerical properties of the LP optimization problem generated from message_ix
We would like to improve the numerical properties of the LPPs that message_ix sends to LP solvers. IMHO this requires a concerted treatment of at least the following issues:
- Model specification: split dense rows and columns; this is especially important for the IPM (barrier) lpmethod, which is usually the best for large LPs (provided the Cholesky factor is sparse). Note that even one very dense row or column may dramatically increase the matrix density and thus the number of multiplications in each iteration. Some solvers split dense rows/columns themselves, but modelers can certainly do it much better.
- Data cleansing (cleaning) in our DB should support management of "correct" ranges of data values and the corresponding units. The data import from the DB and its conversion to the LP parameters should include replacing "non-substantial" ("small") coefficients by zeros, and warning about "large" coefficients. Large values of bounds/RHS should be verified and, if appropriate, replaced by infinity (i.e., "no bound"). A utility for basic diagnostics of the LP matrix would be very helpful.
- Diagnostics of the LPP numerical properties. This is a pretty complex issue, so let's start with a simple element: the condition number (usually denoted by Kappa), which might be provided by GAMS and/or the solvers. I have not found any GAMS function/option for computing Kappa (maybe it does provide one, but my very limited knowledge of GAMS was not enough to find it). However, I have found the cplex binary option called quality, which outputs information (including Kappa) on the solution quality. Even if it works, it can only help in evaluating the solution sensitivity (I guess it provides the Kappa of the basis). The Kappa of the Cholesky factor would be much more helpful in evaluating the matrix's numerical properties. If GAMS indeed does not provide such a function, then maybe R does (I guess the chances are good, and hopefully colleagues experienced in R can quickly check this).
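Such a basic diagnostic utility could start very simply. A minimal Python sketch (the function name and thresholds below are illustrative assumptions, not existing message_ix code):

```python
import numpy as np
from scipy import sparse

def diagnose_matrix(A, small=1e-7, large=1e7):
    """Basic numerical diagnostics of an LP coefficient matrix."""
    A = sparse.csr_matrix(A)
    vals = np.abs(A.data[A.data != 0])
    return {
        "shape": A.shape,
        "nnz": A.nnz,
        "density": A.nnz / (A.shape[0] * A.shape[1]),
        "abs_range": (float(vals.min()), float(vals.max())),
        # "non-substantial" coefficients: candidates for replacement by zero
        "n_small": int((vals < small).sum()),
        # suspiciously large coefficients that deserve a warning
        "n_large": int((vals > large).sum()),
        # the densest row, to spot rows worth splitting
        "max_row_nnz": int(np.diff(A.indptr).max()),
    }

print(diagnose_matrix(np.array([[1e-9, 2.0], [0.0, 3e8]])))
```

The same statistics per row/column block would point directly at the parameters (and units) responsible for extreme coefficients.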
Summing up: these issues call for careful consideration. The open question, however, is whether we will find enough motivation to deal with these topics.
thank you @MarekMakowski for the detailed description!
it seems that quality can easily be called via the GAMS CPLEX options
thanks Daniel for the comment. Well… your comment shows that I succeeded in hiding this piece:

> ...the cplex binary option called quality, which outputs info (including Kappa) on the solution quality.

well beyond (maybe too many?) details. Sorry 😢
I’ll experiment with the cplex quality option, although it appears to provide limited info, i.e., on the sensitivity of the “optimal” solution. Therefore, I think, we should still try to get the Kappa of the QR of AA^T (maybe someone remembers the algebra better than I do and can thus suggest a better method for evaluating an LPP's numerical properties). The best I was able to find last night is the hint that R provides the kappa(…) function, which appears to do this. The IMSL library has, of course, several functions for condition-number calculations, but… although IIASA maintained IMSL in the 1980s, I am afraid it would be time-consuming to make IMSL operational again (and I am not sure that this old Fortran library would handle an Indus-model-size matrix). This is why I suggest exploring the expertise of colleagues who know R.
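If R is not at hand, the same check can be done in Python with NumPy. A sketch (dense SVD, so it will not scale to an Indus-model-size matrix, but it is enough for experiments):

```python
import numpy as np

def kappa_aat(A):
    """2-norm condition number of A @ A.T, computed from the singular
    values of A: kappa(A A^T) = (sigma_max / sigma_min)**2, so there is
    no need to form A A^T explicitly (which would square the rounding
    errors before we even start)."""
    s = np.linalg.svd(np.asarray(A, dtype=float), compute_uv=False)
    return (s.max() / s.min()) ** 2

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 8))   # a well-conditioned random test matrix
print(f"kappa(A A^T) = {kappa_aat(A):.3e}")
```

For genuinely large sparse matrices one would instead estimate the extreme singular values iteratively (e.g., via `scipy.sparse.linalg.svds`), but the dense version above is the simplest sanity check.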
However, the kappa will most probably only confirm what we already guess. Therefore, I still think that the most effective approach for us would be to try our best to stay close to the classical advice: keep the absolute values of the LPP coefficients within the [0.01, 100] range. I know… nowadays this is commonly considered “impossible”, but we all know there is a very good reason behind this advice. We also know that generating models with coefficients within a “reasonable” range takes a lot of time, which is our scarce resource. However, I am convinced that this is one of the best “investments” towards the quality of our models.
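Keeping coefficients in that range mostly comes down to choosing units, since re-expressing a variable in different units rescales its entire matrix column. A toy Python illustration (the numbers are made up):

```python
import numpy as np

# Toy constraint matrix: the second column mixes O(1) entries with
# tiny cost coefficients, e.g. expressed in $/kWh.
A = np.array([[1.0, 5e-5],
              [2.0, 8e-5]])

# Re-expressing that variable in MWh multiplies its column by 1000;
# a unit change is just a diagonal column scaling.
A_scaled = A * np.array([1.0, 1e3])

spread = lambda M: np.abs(M[M != 0]).max() / np.abs(M[M != 0]).min()
print(spread(A), spread(A_scaled))  # coefficient spread before vs. after
```

Solvers apply such scalings internally (cf. scaind), but choosing good units at model-generation time keeps the reported solution, duals, and tolerances in well-scaled units too.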
Bests, Marek
On Mar 21, 2019, at 7:17, Daniel Huppmann wrote the comment quoted above.
As promised, I've explored the Kappa (cplex option: quality 1)

Adding the aggressive-scaling cplex option (scaind 1) improves the Kappa, but it is still far too high to trust the results; moreover, the infeasibilities remain:

@khaeru (#127 copied(?) here): I am not sure if the barrier (or any other lpmethod) is the best default. Maybe an annotated cplex.opt file, with all defaults listed and the suggested [for trying] options commented out, would be a better (although not easier-to-use) solution?
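For illustration, such an annotated cplex.opt might look like this (a sketch: quality, scaind, lpmethod, and numericalemphasis are real GAMS/CPLEX options, but which ones to enable by default is exactly the open question):

```
* cplex.opt (sketch) -- lines starting with '*' are comments
* always report solution-quality statistics, including Kappa:
quality 1
* options to try when diagnosing numerical trouble (uncomment as needed):
* scaind 1
*   (aggressive scaling)
* lpmethod 4
*   (barrier; often fastest for large sparse LPs)
* numericalemphasis 1
*   (trade some speed for extra numerical care)
```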
We discussed this issue in today's message meeting and I'd like to summarize a few points here:
- The `lp_diag` tool was developed to address this problem. As the name suggests, it is a diagnostic tool, so it does not fix the numerical properties of any run, but its output can be used to find the coefficients causing numerical instabilities/unreliabilities, which can then be fixed manually.
- @ywpratama has continued work on the `lp_diag` tool and will open a PR to update it soon. This PR should eventually contain tests for the tool, for which we could use the `westeros` Scenario adapted to contain badly balanced coefficients. @MarekMakowski also offered to expand the tool's functionality if there is interest from the community, which should likely be discussed in another focus-topic session of the weekly meetings.
- @MarekMakowski also mentioned two related things: first, if the `Kappa` value mentioned above is equal to or greater than 10^7, the whole numerical solution is unreliable. This made me wonder: how much work would it be to obtain `Kappa` for every Scenario run and raise a warning if its value becomes too large? Such a feature might increase the reliability of our results, as it would prevent users from trusting results that are unreliable.
- Second, some numerical issues are caused by matrix rows that are too dense. @MarekMakowski suggested this could be remedied by splitting the corresponding variables/equations into different parts (which produce less dense rows) and then adding them back together to retrieve the final value.
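Obtaining `Kappa` automatically could be as simple as scanning the solver log after each run. A hedged sketch (the log wording matched by the regex is an assumption and would need adjusting to the actual GAMS/CPLEX output; `check_kappa` is a hypothetical helper, not part of message_ix):

```python
import re
import warnings

KAPPA_LIMIT = 1e7  # rule of thumb from the discussion above

def check_kappa(log_text, limit=KAPPA_LIMIT):
    """Scan solver output for the condition number reported by the CPLEX
    'quality' option and warn if it exceeds `limit`.  Returns the value
    found, or None if the log contains no Kappa line."""
    m = re.search(r"[Kk]appa[^0-9]*([0-9.]+[eE]?[+-]?[0-9]*)", log_text)
    if m is None:
        return None
    kappa = float(m.group(1))
    if kappa >= limit:
        warnings.warn(f"condition number Kappa = {kappa:.2e} >= {limit:.0e}; "
                      "the solution may be numerically unreliable")
    return kappa

print(check_kappa("Basis condition number (kappa) = 3.1e+11"))
```

Hooked into the post-solve step of a Scenario run, this would at least stop silently unreliable results from being trusted.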
@glatterf42 : thanks for reviving the issue.
Just a quick comment on the kappa (matrix condition number):
- kappa can easily be obtained (for the optimal basis) by specifying the corresponding option in the cplex config
- a rule of thumb: if kappa = 10^k, then we lose up to k digits of solution accuracy on top of what is lost during optimization due to the (limited) precision of arithmetic. For the solution copied below, k = 11 for an objective value having 9 digits ☹️
- of course, the ideal k = 0 is hardly attainable. I don't know any "official kappa rule". My (conservative) intuition: I don't trust solutions having k > 6; I have practically always succeeded in modifying the model instances whenever I had optimization results with k > 6. However, I must admit that this takes time, sometimes a little, sometimes a substantial amount. Therefore, I try to avoid the problem by spending time on an appropriate (symbolic) model specification, and then analyzing/filtering the resulting matrix coefficients. The latter usually also helps in getting a sparse Cholesky factor, which in turn speeds up optimization.
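That rule of thumb is easy to demonstrate numerically; here is a small Python experiment with the classic ill-conditioned Hilbert matrices (nothing message_ix-specific):

```python
import numpy as np
from scipy.linalg import hilbert

# Solve A x = b with known solution x = 1 and see how many digits survive.
for n in (4, 8, 12):
    A = hilbert(n)                       # notoriously ill-conditioned family
    b = A @ np.ones(n)
    x = np.linalg.solve(A, b)
    k = np.log10(np.linalg.cond(A))      # kappa = 10^k
    digits = -np.log10(np.abs(x - 1.0).max())
    print(f"n={n:2d}  k={k:4.1f}  correct digits ~ {digits:4.1f} of ~16")
```

As n grows, k climbs toward 16 and the number of correct digits in the computed solution drops accordingly, exactly the "lose up to k digits" effect.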
If the topic is interesting enough for the Team to have another discussion, then I can prepare a short summary of the related issues for one of our weekly meetings.