Stricter definition of numerics
There has been a background discussion regarding floating-point semantics, especially for the weird cases, and I thought it would be best to get some agreement.
By the weird cases I mean:
- [ ] Denormalized numbers and underflow
- [ ] Not a number, and failure to evaluate
- [ ] Infinity and overflow
My thinking is that we use IEEE-754 (or alternatively any similar floating-point number system) with the following extras:
- Denormalized numbers can be rounded to zero, or gradual underflow can be used - either is fine:
  - Most processors have this capability. If a model depends on gradual underflow and specific denormalized numbers I would say that it is ill-formed.
  - Since we don't allow equality comparison between Real expressions (and allow event-epsilon), we should in particular not do equality comparison with denormalized numbers (to tell them from zero), so I don't see that models will be impacted by this (except in weird cases).
  - The reason for rounding them to zero is that denormalized numbers are costly in terms of performance.
- NaN (Not a Number) can propagate in the normal way (so that `sqrt(-1)` just generates `NaN` and e.g. `56.6*NaN` is `NaN`), provided that an accepted model evaluation does not contain:
  - Any variable having the value `NaN`
  - Any relational operator using `NaN` as an operand
  - Any conversion to integer using `NaN` as an operand
  - (Maybe additional similar restrictions.)

  The logic is that since `NaN` spreads to everything we just have to check the result, not all individual expressions. I don't know if this will gain anything, but I don't see a problem with allowing it (see the sketch after this list).
- Failure to evaluate (such as `0/0`). One possibility is to generate quiet `NaN` according to the above, another is to fail; both should generate the same result in the end, unless the intermediate `NaN` does not impact the result. That is a modeling issue that tools are not required to detect.
- Infinity. One possibility is to just do as for `NaN` for +/-infinity, but another alternative is given below.
- Overflow. If the correctly rounded result cannot be represented in the range from smallest to largest finite floating-point number we get overflow; that is treated as infinity above.
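To make the list above concrete, here is a minimal C sketch (illustrative only; C rather than Modelica, and not part of the proposed specification text) of the IEEE-754 double behavior the proposal relies on: gradual underflow below DBL_MIN, NaN propagating through arithmetic so that only the final result needs to be checked, and overflow saturating to infinity.

```c
#include <float.h>
#include <math.h>
#include <stdio.h>

int main(void) {
    /* Gradual underflow: the smallest normal double divided by 2 is a
       denormalized number, unless the FPU is set to flush-to-zero. */
    double denorm = DBL_MIN / 2.0;
    printf("DBL_MIN/2   = %g\n", denorm);

    /* NaN propagation: sqrt(-1) yields NaN, and NaN infects arithmetic. */
    double nan_val = sqrt(-1.0);
    double product = 56.6 * nan_val;
    printf("sqrt(-1)    = %g\n", nan_val);
    printf("56.6 * NaN  = %g\n", product);

    /* Checking only the final result, as suggested above. */
    if (isnan(product)) {
        printf("final result is NaN -> reject the model evaluation\n");
    }

    /* Overflow: a correctly rounded result above DBL_MAX becomes +inf. */
    printf("DBL_MAX * 2 = %g\n", DBL_MAX * 2.0);
    return 0;
}
```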
To me it would be good to have this now, and a possible later extension for infinity would be, at least for evaluable parameters, to allow:
- Parameters bound to +/-infinity, and relational operators with one or both sides being +/-infinity (excluding both sides being the same infinity), with the usual semantics
- Parameter expressions evaluating to +/-infinity
- Not part of systems of equations
- Whether `1/0` should be allowed is a point worth discussing.
- But we likely want to keep the simplification of `p*0*x` to `0` (see below), instead of getting `NaN` and failure.
And possibly even for other parameters, and even discrete variables.
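As a small illustration of the relational semantics such an extension would build on, here is a C sketch; the parameter name `p_max` is made up and just stands for an evaluable parameter bound to +infinity (e.g. meaning "no upper limit").

```c
#include <math.h>
#include <stdio.h>

int main(void) {
    double p_max = INFINITY;  /* hypothetical parameter bound to +infinity */
    double x = 1.0e30;

    /* Relational operators with one side being +/-infinity follow the
       usual semantics: every finite value is below +infinity and above
       -infinity. */
    printf("x < p_max           : %d\n", x < p_max);      /* 1 */
    printf("-INFINITY < x       : %d\n", -INFINITY < x);  /* 1 */

    /* Both sides being the same infinity is the case excluded above;
       in plain IEEE-754 they would compare equal. */
    printf("INFINITY == INFINITY: %d\n", INFINITY == INFINITY);  /* 1 */
    return 0;
}
```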
Background - the critical points for not "just using IEEE-754" are:
- Solving systems of equations with non-finite numbers (as residuals and/or solutions) is not a good idea (and not well-defined).
- We want to allow mathematical simplifications that are valid for finite numbers, but that are not legal if we allow non-finite numbers (such as `a = p*0;` giving `a = 0`, and `a + p = b + p;` giving `a = b;`), and similarly we don't want to allow non-finite solutions.
- The idea with IEEE-754 is to be precise in terms of computations, but the advanced simplifications make those guarantees less meaningful.
Specifically, it could be that the simplification of `p*0*x` is sufficiently rare for evaluable parameters that we can delay it until the evaluable parameters are evaluated, and in case `p` actually is infinity view this expression as an error (which should be rare, right?).
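Here is a small C sketch of why the simplification is only safe for finite `p`; the values are made up, but the point is that the literal IEEE-754 evaluation of `p*0*x` yields NaN as soon as `p` is infinite or NaN:

```c
#include <math.h>
#include <stdio.h>

int main(void) {
    double x = 3.0;

    /* For finite p the simplification p*0*x -> 0 is exact. */
    double p_finite = 42.0;
    printf("finite p:   p*0*x = %g\n", p_finite * 0.0 * x);  /* 0 */

    /* For non-finite p the literal IEEE-754 evaluation gives NaN,
       so the simplification would change the result. */
    double p_inf = INFINITY;
    printf("infinite p: p*0*x = %g\n", p_inf * 0.0 * x);     /* nan */

    double p_nan = NAN;
    printf("NaN p:      p*0*x = %g\n", p_nan * 0.0 * x);     /* nan */
    return 0;
}
```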
However, even if this might make perfect sense and have very little impact for models in practice, it could still be complicated to implement in tools. That's one reason I think we should do this later.
Note that for NaN a special case would be something like:
```
algorithm
  x := sqrt(y);
  if y < 0 then
    x := 0;
  end if;
```
To me this model has an error when y < 0, but a tool does not necessarily have to detect it (if using NaN for the negative sqrt it will be overwritten); and even with our current rules, https://specification.modelica.org/master/operators-and-expressions.html#evaluation-order, a tool could simplify this to:
```
algorithm
  if y < 0 then
    x := 0;
  else
    x := sqrt(y);
  end if;
```
(avoiding the issue).
(Reporting and adapting my argument from #4479, since it is relevant for this discussion.)
Note that expressions in the Modelica code are not always interpreted literally.
A Modelica tool is free to apply transformations to expressions and equations that are equivalent from a mathematical point of view, but not from a numerical point of view. For example, it can
- evaluate constant terms at compile time
- rearrange terms, e.g. `a + b + c -> (a + c) + b`, because some of them are constant or for other good reasons
- rescale variables and equations based on nominal attributes, which is in general a good idea if SI units are used, see DOI:10.1145/3158191.3158192, in order to obtain better scaled equations for numerical solvers
- replace expressions with other expressions that are mathematically equivalent but, hopefully, better numerically behaved
- solve/simplify equations and inequalities by symbolic manipulation
- differentiate equations for index reduction
- apply AD to the algorithmic code of functions to automatically generate derivatives required by the solvers or for index reduction
- etc. etc.
Adding NaN, Inf and the associated semantics to Modelica means that every time a tool applies any such transformation, it will need to take care of that as well.
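As one concrete example (my own, not from the specification), take the cancellation `(a + c) - c -> a`, which a tool may apply today without hesitation; with infinity in play the literal evaluation and the simplified form disagree:

```c
#include <math.h>
#include <stdio.h>

int main(void) {
    double a = 1.0;

    /* For finite c the literal evaluation and the simplified form differ
       at most by rounding error (here the literal result is 0). */
    double c_finite = 1.0e16;
    printf("finite c:   (a + c) - c = %g, simplified = %g\n",
           (a + c_finite) - c_finite, a);

    /* For c = infinity the literal evaluation is inf - inf = NaN,
       while the simplified expression is still 1. */
    double c_inf = INFINITY;
    printf("infinite c: (a + c) - c = %g, simplified = %g\n",
           (a + c_inf) - c_inf, a);
    return 0;
}
```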
As Aesop wrote in his fables, be careful what you wish for, lest it comes true! 😃
Numerical routines should be written to avoid overflow.
If you look at something simple like the 2-norm in the BLAS routine `dnrm2` you can see that this is something that people actually do.
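For reference, here is a condensed C sketch of that scale-and-sum-of-squares trick (my own simplified version, not the reference `dnrm2` implementation): the running scale keeps every squared term at or below 1, so intermediate results cannot overflow even when the exact norm is representable.

```c
#include <math.h>
#include <stdio.h>

/* Scaled 2-norm in the spirit of BLAS dnrm2: track a running scale and
   the sum of squares of the rescaled elements. */
static double nrm2(int n, const double *x) {
    double scale = 0.0, ssq = 1.0;
    for (int i = 0; i < n; ++i) {
        double absxi = fabs(x[i]);
        if (absxi == 0.0) continue;
        if (scale < absxi) {
            double r = scale / absxi;
            ssq = 1.0 + ssq * r * r;  /* rescale the accumulated sum */
            scale = absxi;
        } else {
            double r = absxi / scale;
            ssq += r * r;
        }
    }
    return scale * sqrt(ssq);
}

int main(void) {
    /* The naive sqrt(x0^2 + x1^2) overflows; the scaled form does not. */
    double x[2] = {3.0e200, 4.0e200};
    printf("scaled: %g\n", nrm2(2, x));                   /* 5e200 */
    printf("naive:  %g\n", sqrt(x[0]*x[0] + x[1]*x[1]));  /* inf */
    return 0;
}
```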
For scaling it is both important and normally a non-issue. The reason is that scaling is intended to get values closer to 1 and if scaling introduces an overflow it indicates that not only is the scaling not doing what it is supposed to do, but actually making the problem worse.
If you look at the scaling defined in the Modelica specification it uses |x.nominal| + |x| as the scale factor for x. You don't have to divide by it, but if you do, dividing x by this scale factor is by definition guaranteed to produce a value <= 1 in absolute terms, and thus not overflow (it may underflow if x was close to the smallest number and the nominal value larger than 1, but underflow is less of an issue, especially as this case indicates that the modeler doesn't care about such a small value).
The only issue is if |x.nominal| is close to the maximum floating point number (DBL_MAX), and the variable also has a value of similar magnitude. Adding a restriction that |nominal| should be less than something like DBL_MAX*DBL_EPSILON avoids that case, and I don't see any existing models violating it. Alternatively we could recommend max(|x.nominal|, |x|) instead, which avoids the issue without any restriction, or some other alternative.
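A small C sketch of that guarantee (the helper name `scaled` is just for illustration): dividing x by |nominal| + |x| always gives a value of magnitude <= 1, and the only corner case is when the sum itself overflows.

```c
#include <float.h>
#include <math.h>
#include <stdio.h>

/* Scale factor discussed above: |nominal| + |x|.  Dividing x by it
   gives |result| <= 1, so the division itself cannot overflow. */
static double scaled(double x, double nominal) {
    return x / (fabs(nominal) + fabs(x));
}

int main(void) {
    printf("%g\n", scaled(1.0e300, 1.0e-12));  /* ~1, no overflow */
    printf("%g\n", scaled(-5.0, 1.0));         /* about -0.83 */
    printf("%g\n", scaled(1.0e-300, 1.0));     /* stays tiny, as expected */

    /* Corner case: |nominal| and |x| both near DBL_MAX.  The sum
       overflows to infinity and the scaled value collapses to 0 instead
       of 0.5; restricting |nominal| to below roughly DBL_MAX*DBL_EPSILON,
       or using max(|nominal|, |x|), avoids this. */
    printf("%g\n", scaled(DBL_MAX, DBL_MAX));  /* 0, not 0.5 */
    return 0;
}
```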
> The reason is that scaling is intended to get values closer to 1 and if scaling introduces an overflow it indicates that not only is the scaling not doing what it is supposed to do, but actually making the problem worse.
The problem is when you are handling some quantities together with infinity, e.g. for comparisons. If the quantity at hand is very small (e.g. nominal = 1e-12, as can happen with time scales in microelectronics), scaling requires multiplying by a factor of 1e12. This gets the quantity better scaled, but if infinities are also involved, they also get multiplied by 1e12, which can cause them to overflow if you don't have healthy built-in margins.
> The only issue is if |x.nominal| is close to the maximum floating point number (DBL_MAX)
This is never going to happen in practice. Ever.
> > The reason is that scaling is intended to get values closer to 1 and if scaling introduces an overflow it indicates that not only is the scaling not doing what it is supposed to do, but actually making the problem worse.
>
> The problem is when you are handling some quantities together with infinity, e.g. for comparisons. If the quantity at hand is very small (e.g. nominal = 1e-12, as can happen with time scales in microelectronics), scaling requires multiplying by a factor of 1e12. This gets the quantity better scaled, but if infinities are also involved, they also get multiplied by 1e12, which can cause them to overflow if you don't have healthy built-in margins.
Except that the scaling isn't purely with nominal but a mixed absolute-relative one, i.e., abs(x) + nominal(x). If the value is 1e300 and the nominal is 1e-12 then the scale factor should therefore be about 1e300, not 1e-12; and thus the scaling does not cause overflow.
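A short C sketch of that difference, with made-up numbers matching the example above: pure nominal scaling multiplies by 1/nominal = 1e12 and overflows for a huge value, while the mixed absolute-relative scale factor abs(x) + nominal is dominated by the value itself and stays safe.

```c
#include <math.h>
#include <stdio.h>

int main(void) {
    double nominal = 1.0e-12;  /* e.g. a time scale in microelectronics */
    double x = 1.0e300;        /* a huge value standing in for "infinity" */

    /* Pure nominal scaling: the exact result 1e312 exceeds DBL_MAX -> +inf. */
    printf("x / nominal         = %g\n", x / nominal);

    /* Mixed absolute-relative scaling: abs(x) + nominal ~ x, so the
       scaled value is close to 1 and cannot overflow. */
    printf("x / (|x| + nominal) = %g\n", x / (fabs(x) + nominal));
    return 0;
}
```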