ModelicaStandardLibrary
How to specify models that are meant to fail in a specific way in ModelicaTest?
The ModelicaTest library contains a large number of test cases that are meant to test the behaviour of components of the MSL. They can of course also help to assess the quality of implementation of Modelica tools, based on their ability to run them successfully.
The convention, explicitly stated in the documentation, is that all models that are meant to be run have an `experiment(StopTime)` annotation. The implicit understanding is that all such models should run successfully. In fact, it could make a lot of sense to also include models that are expected to fail in a specific way, e.g. because of assertion violations, to test that such failures are indeed happening as expected and reported to the end user in a meaningful way.
This could of course also be useful for other libraries beyond ModelicaTest, but I understand ModelicaTest should set the standard by example.
Now, the ModelicaCompliance library already provides a similar feature by means of `__ModelicaAssociation(TestCase)` annotations. From the examples I see there, I understand this annotation has two fields, a Boolean `shouldPass` and a String `section`. The first declares whether the model is expected to pass or fail, and the second declares the relevant section of the specification in case of failure.
The first question is: where are these annotations defined? I couldn't find that anywhere. Or is the definition implicit in how we use them for ModelicaCompliance?
The second question, which is much broader, is: how could we extend it for simulation tests, in a way that is more informative than just pass/fail, but still tool-independent?
One simple proposal for simulation models could be the following:
For models with an `experiment(StopTime)` annotation, the default behaviour is that the model simulates successfully until the required StopTime, producing some simulation results. In case a model is instead expected to fail, the expected failure mode can be declared with
```
annotation(__ModelicaAssociation(TestCase(
  shouldInitialize = <Boolean, default = true>,
  shouldSimulate = <Boolean, default = true>,
  failureDescription = <String, default = "">)));
```
`shouldInitialize = true` means that an initial solution should be found. Otherwise, `shouldInitialize = false` means that the model should not reach the point where the initial solution is computed, for some reason, e.g. an assertion violation. It is not really possible to distinguish between compile-time and run-time failures, because that depends on which parameter evaluation policy a specific tool chooses.
`shouldSimulate = true` means that the model should simulate successfully until `StopTime`, or until `terminate()` is called. Otherwise, `shouldSimulate = false` means that the simulation won't be able to reach `StopTime`. If `shouldInitialize = true`, this means there must be a failure at some point during the simulation, e.g. because of an assert violation; otherwise the failure is expected to happen during initialization or compilation.
In case of specified failure of any kind, `failureDescription` should contain a textual description of the reason of failure. I'd say that trying to specify this further in a tool-independent way is hopeless, but I guess tool vendors could manually check once that the output of their tool is consistent with that description, and then just do standard regression testing on their tool's output.
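To make the proposal concrete, here is a minimal sketch of how a failing test model could carry such an annotation (the model is made up for illustration, and the fields are of course the ones proposed above, not part of any current standard):

```
model DrainingTank "Test model that is expected to stop before StopTime"
  Real m(start = 1, fixed = true) "Liquid mass in the tank, kg";
equation
  der(m) = -0.3 "Constant outflow empties the tank in finite time";
  assert(m >= 0, "Tank has run dry: mass m became negative");
  annotation(
    experiment(StopTime = 10),
    __ModelicaAssociation(TestCase(
      shouldInitialize = true,
      shouldSimulate = false,
      failureDescription = "The tank runs dry at about t = 3.3 s, so the assert on non-negative mass must stop the simulation well before StopTime")));
end DrainingTank;
```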
What do you think?
Adding @HSchatzTLK to the discussion, since he made the initial suggestion to make this possible.
See also #1292. Looks like an extension to me.
Is this the annotation you are looking for? https://specification.modelica.org/master/annotations.html#modelica:TestCase
Another good reason to switch to Modelica 3.5. :)
> For models with an `experiment(StopTime)` annotation, the default behaviour is that the model simulates successfully until the required StopTime, producing some simulation results.

Note that simulating until the `experiment.StopTime` is not the only way to successfully complete simulation. At least `terminate` provides one alternative to successfully end the simulation.
> See also #1292. Looks like an extension to me.

Indeed, I forgot I had opened that back then. I would close that one and keep this, since there was not much discussion there.
> Is this the annotation you are looking for? https://specification.modelica.org/master/annotations.html#modelica:TestCase

Yes and no. The ModelicaCompliance library still uses `__ModelicaAssociation` annotations, which also feature the `section` field.

Besides, I believe that only specifying `shouldPass` is really not enough information. For the compliance checks, the reference to the specification section is obvious; for simulation models I think some kind of description of the type of failure would be highly useful, otherwise the model could fail for the wrong reason, and one would never know.
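For reference, a ModelicaCompliance-style test is annotated roughly like this (a sketch based only on the description above, with a made-up model and section number):

```
model ArrayBindingMismatch "Model that is expected to be rejected by the tool"
  Real x[3] = {1, 2} "Dimensions of the binding equation do not match";
  annotation(__ModelicaAssociation(TestCase(
    shouldPass = false,
    section = "10.4")));
end ArrayBindingMismatch;
```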
> Another good reason to switch to Modelica 3.5. :)

I guess we'll need 3.6. :)

Of course I'm fully in favour of standardizing this extension, instead of using `__ModelicaAssociation` vendor annotations, since its scope clearly goes beyond the MA.
> Note that simulating until the `experiment.StopTime` is not the only way to successfully complete simulation. At least `terminate` provides one alternative to successfully end the simulation.

You are right, I edited my proposal accordingly.
> > Is this the annotation you are looking for? https://specification.modelica.org/master/annotations.html#modelica:TestCase
>
> Yes and no. The ModelicaCompliance library still uses `__ModelicaAssociation` annotations, which also feature the `section` field.

Yes, we should update that at some point.

> Besides, I believe that only specifying `shouldPass` is really not enough information. For the compliance checks, the reference to the specification section is obvious; for simulation models I think some kind of description of the type of failure would be highly useful, otherwise the model could fail for the wrong reason, and one would never know.

Specifying the type of failure isn't clear for several reasons: whether it fails during simulation or already during translation may depend on the tool, and standardizing the different "types of failure" seems complicated, as we haven't standardized them and the same underlying issue may be seen in different ways.

The reference to the specification is more seen as a human-readable explanation, not something that needs a strict definition.

A better way is to ensure that there are more tests, both of correct and incorrect models.
> Is this the annotation you are looking for? https://specification.modelica.org/master/annotations.html#modelica:TestCase
>
> Another good reason to switch to Modelica 3.5. :)

Note that this is not only about handling these cases for test models; there's another important perspective.

The `shouldPass=false` is good for models intended to show users what errors they should avoid; the examples in Modelica_StateGraph2.Examples.WrongStateGraphs are more of that kind. In some libraries these are just represented as images to avoid having incorrect models, which is clearly less useful.

The different perspective for these incorrect models is that they aren't intended to test tools, but to help users understand how to model.
> The reference to the specification is more seen as a human-readable explanation, not something that needs a strict definition.

In my opinion, so should be the reason of failure. It is meant to be written by whoever wrote the test, to explain the goal and rationale of the test, and to be read by whoever will need to judge the outcome of the test, besides automated reporting and regression testing.

Of course this information could also be added in the info layer, but I guess a short summary in that annotation would help, if added to the automated reports.
> Of course this information could also be added in the info layer, but I guess a short summary in that annotation would help, if added to the automated reports.

Or, it's better to not have two alternative places for the same sort of information. For a model with `shouldPass = false`, why not make it standard procedure to explain why in the `Documentation.info`?
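For instance (a sketch assuming the standard `TestCase` annotation from the specification linked above; the model itself is made up):

```
model DivisionByZero "Should fail during simulation when x crosses zero"
  Real x(start = 1, fixed = true);
  Real y;
equation
  der(x) = -1;
  y = 1/x "Becomes singular at t = 1";
  annotation(
    experiment(StopTime = 2),
    TestCase(shouldPass = false),
    Documentation(info = "<html>
<p>This model should <b>not</b> simulate until StopTime: <code>x</code> crosses zero
at t = 1, making <code>y = 1/x</code> singular, so the solver is expected to fail
shortly before that point.</p>
</html>"));
end DivisionByZero;
```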
> > Of course this information could also be added in the info layer, but I guess a short summary in that annotation would help, if added to the automated reports.
>
> Or, it's better to not have two alternative places for the same sort of information. For a model with `shouldPass = false`, why not make it standard procedure to explain why in the `Documentation.info`?

I'm in favor of using the info layer, for now. If "the test tool" allows easy access to the info layer, this should be good enough.

Are there Modelica tools allowing to link to specific HTML sections/headings within the info layer? (By this, we could have a convention for a heading, like "Why this model should not simulate".)

Sidenote, risking off-topic: More annotations could be specified, but then I would broaden the scope: What could model developers want to tell people using their model, if it fails? "Oh, that one is tricky to initialize. Try changing ..." or "Oh, that usually fails in this nonlinear system of equations. Try changing this ..." or "This model is known to work in Modelica tool A, but not in Modelica tool B" or ... MSL models should always work. Real-world models usually don't, if you start changing the model structure or boundary conditions. Should there be a mechanism for end users? Or is it for library development only?
> Are there Modelica tools allowing to link to specific HTML sections/headings within the info layer?

System Modeler is one such tool.
> I'm in favor of using the info layer, for now. If "the test tool" allows easy access to the info layer, this should be good enough.
>
> Are there Modelica tools allowing to link to specific HTML sections/headings within the info layer? (By this, we could have a convention for a heading, like "Why this model should not simulate".)

I would say we should start the documentation with why it shouldn't simulate, as that's the main reason for having the model in the first place in this scenario. So I don't see a need to have a special section for it and be able to link to it.

If it were work in progress and it is intended to simulate, but doesn't (yet), I would still put it first so that you don't miss it, and also so that once you fix it you are sure to remove that.

Linking to special sections has been somewhat possible in tools, and with https://github.com/modelica/ModelicaSpecification/pull/2531 it was more standardized.
> Sidenote, risking off-topic: More annotations could be specified, but then I would broaden the scope: What could model developers want to tell people using their model, if it fails? "Oh, that one is tricky to initialize. Try changing ..." or "Oh, that usually fails in this nonlinear system of equations. Try changing this ..." or "This model is known to work in Modelica tool A, but not in Modelica tool B"

@GallLeo, the idea here is not to put models in the MSL that may fail depending on the tool, because this goes against the idea of being "Standard". The idea is to put models that should fail for perfectly legitimate and documented reasons.

> MSL models should always work.

I do not agree.

Our libraries and the literature are full of models that always work, but this is not an accurate representation of the real world. In the real world models fail all the time, and in many cases they do so because they exceed their validity range. In this case, Modelica provides mechanisms (e.g. assertions) that should make it possible to identify what (legitimately) went wrong, without the need of shamans to interpret arcane solver error messages. Libraries that provide such mechanisms are of higher quality than libraries that don't, and the MSL should be one of them. And tools are expected to take full advantage of these mechanisms, so we should test whether they do or not.

Also, in some cases components may be abused, e.g. by connecting them in a wrong way. If tools can catch these situations and report them, that should be tested too.
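To illustrate what I mean (a sketch, not an actual MSL component), a component can guard its validity range with an assert so that a legitimate failure is reported in terms the user understands:

```
model HeatedVessel "Toy component with an explicit validity range"
  parameter Real Tmax(unit = "K") = 600 "Upper validity limit of the medium model";
  parameter Real C(unit = "J/K") = 1e4 "Heat capacity";
  Real T(unit = "K", start = 300, fixed = true) "Temperature";
  Real Q(unit = "W") "Heating power";
equation
  Q = 5e3;
  C*der(T) = Q;
  // Reports the legitimate reason for the failure instead of leaving the user
  // with a generic solver error
  assert(T <= Tmax,
    "Temperature T exceeds the validity range of the medium model (Tmax = " + String(Tmax) + " K)",
    AssertionLevel.error);
  annotation(experiment(StopTime = 1000));
end HeatedVessel;
```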
A better/more standardised testing story for Modelica would indeed be useful!
> > Besides, I believe that only specifying `shouldPass` is really not enough information. For the compliance checks, the reference to the specification section is obvious; for simulation models I think some kind of description of the type of failure would be highly useful, otherwise the model could fail for the wrong reason, and one would never know.
>
> Specifying the type of failure isn't clear for several reasons: whether it fails during simulation or already during translation may depend on the tool, and standardizing the different "types of failure" seems complicated, as we haven't standardized them and the same underlying issue may be seen in different ways.

To enable the modeller to define "sharper" failure definitions, it might make sense to look to successful examples from other domains for ideas.

The very popular Python testing framework `pytest` has `pytest.raises`, which allows one to specify exactly which exception class a certain code section shall trigger (if not, the test fails).

As Modelica lacks (afaik) an exception hierarchy to lean on here, one could instead specify a regular expression of the expected error message, just like the `match` argument in the above link. The content of e.g. assertion texts will not be as tool-dependent, so it should be feasible to catch well-separated error cases this way. This corresponds loosely (sans the regex matching) to the `REQUIRE_THROWS_WITH` macro of the C++ testing framework `doctest`.
Considering that assertions have an error or a warning level, it might make sense to foresee a mechanism to check for either (or both).
So, maybe `TestCase` could work similar to this (a sketch follows the list):

- `TestCase(errors=true)` -- any encountered simulation-terminating error means the test is successful. If the simulation completes, the test failed.
- `TestCase(errors="*tank.T below critical*")` -- expects that this fragment matches at least one error message.
  - Regex could be used to deal with decimal jitter in case numbers are required (`314.3\d+`).
  - If having one argument be either a bool or a string is not desired or possible, one can use `errors` and `errors_with`.
- `TestCase(warns="*pressure outside region of good model accuracy*")` -- expects that a matching warning has been emitted.
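Here is that sketch (the `errors` argument is purely hypothetical, i.e. part of this idea and not of the current standard, and the model is made up):

```
model CoolingTank "Expected to stop with a specific assertion message"
  Real T(start = 350, fixed = true) "Tank temperature in K";
equation
  der(T) = -2 "Constant cooling";
  assert(T > 273.15, "tank.T below critical value: freezing imminent, T = " + String(T) + " K");
  annotation(
    experiment(StopTime = 100),
    // 'errors' is the hypothetical argument from the list above; its value is
    // matched against the assertion text emitted at about t = 38 s
    TestCase(errors = "*tank.T below critical*"));
end CoolingTank;
```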
P.S.:

- In my experience, the `failureDescription` would be a test name in other frameworks, as in it being the primary information reported to the user if the test fails, i.e. does something unexpected.
- In the spirit of precise standardized language, shouldn't `shouldPass` rather be named `shallPass`? Or is it only recommended to pass (or not)?
By the way, XogenyTest (pity that it has not taken off) is also using a `TestCase` annotation (e.g. here), although it uses a different format, with arguments `action` (I've seen the values `simulate`, and `call` in functions) and `result` (`"success"`/`"failure"`). I don't know if that came maybe from an earlier iteration of the annotation in today's standard.
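For illustration, a purely hypothetical reconstruction of what such an annotation might look like (based only on the description above, not copied from XogenyTest, so the exact spelling may differ):

```
model SomeFailingCase "Sketch of an XogenyTest-style expectation"
  Real x(start = 1, fixed = true);
equation
  der(x) = -1;
  assert(x >= 0, "x must stay non-negative");
  annotation(
    experiment(StopTime = 2),
    // argument names and values as described above; hypothetical spelling
    TestCase(action = "simulate", result = "failure"));
end SomeFailingCase;
```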
> > Specifying the type of failure isn't clear for several reasons: whether it fails during simulation or already during translation may depend on the tool, and standardizing the different "types of failure" seems complicated, as we haven't standardized them and the same underlying issue may be seen in different ways.
>
> To enable the modeller to define "sharper" failure definitions, it might make sense to look to successful examples from other domains for ideas.

I'm afraid this shows that @HansOlsson's first comment above wasn't understood correctly. My understanding of @HansOlsson's comment, which I fully agree with, is that enabling "sharper" failure definitions is unlikely to serve us well. To be useful as a tool-independent incorrect test model, the first thing to standardize is how the intention of the model's creator is communicated to the user or the tester of a particular tool:

- Independently of the tool used, the user should be able to understand the tool's behavior based on that intention.
- A tool vendor's test department should be able to set up a tool-specific test for the model based on that intention.
> enabling "sharper" failure definitions is unlikely to serve us well.

I'm confused -- are you arguing that having a sharper ("more specific") failure definition like `TestCase(errors_with="*Temperature T is not in the allowed range*")` is less useful than `TestCase(shouldPass=false)`?

Where do you see the former ending up not serving us well? The message comes straight from the assert that the associated test will be designed to hit, and therefore is afaict tool-independent.

As for the second part of your comment, can you give an example for that "intention of the model's creator" you are mentioning? As far as I can tell, this is today encoded mainly in (hopefully) expressive and useful assert texts that tell the recipient of the error what's going wrong, and this is what we could match on today, and is (I think) tool-independent, so I'm not sure where you see this approach lacking.
If the aim is to extend the standard with a more fine-grained error classification (for expressing modeler intent), I think I agree, and I suspect a hierarchical setup similar to Python's exceptions (modified to Modelica's specifics) could be useful. There was a paper about Modelica exceptions once, so I'm definitely not the person that has spent most thought on the topic.
While I also see that there are situations where an assert is the only way that a simulation should fail, and that this is a kind of incorrect model behavior that seems possible to describe in a standardized way, I think this situation is too specific to motivate why we should leave the simplicity of `shouldPass = false` along with a good human-readable description of why. There are so many other ways an incorrect model could fail, and I think we shouldn't begin to standardize the expectation of just one or two of them before we have a plan for most or all of them (which to me seems out of reach at the moment).
I see, that makes sense.
Maybe this was unclear, but I did not propose to leave the `shouldPass` mechanism - that is the first bullet point above (just with an adjusted name). Extending that with a more specific mechanism is something I would have considered useful (e.g. in combination with a range of test-oriented asserts like those from XogenyTest), hence my comments.
> There are so many other ways an incorrect model could fail

I guess a first step could be to enumerate those failure modes, and how to detect them? I recognize though that that discussion is out of scope here.

I guess I'm spoiled by Python, where exceptions are embedded in everything, and it's quite easy when writing tests to intercept thrown exceptions or emitted logging to precisely specify test success/failure. 😇
> I guess a first step could be to enumerate those failure modes, and how to detect them? I recognize though that that discussion is out of scope here.

Yes, but I think it's important to realize that the current combination of asserts and `shouldPass = false` is probably driving us in a direction where we try to cast all problems in a form where an assert is triggered, and the situations where this isn't possible never make it into a negative example in a tool-independent library. With a standardized way of giving a human-readable description of what is wrong with a model, what you are suggesting is something that could be prepared as an open MA library with a value of its own: a ModelicaFailures library that would complement ModelicaCompliance.
> what you are suggesting is something that could be prepared as an open MA library with a value of its own: a ModelicaFailures library that would complement ModelicaCompliance.

FYI, I already started doing something like that some years ago, see https://github.com/casella/FailureModes

However, those are just models that fail because of (known) numerical reasons, and the goal in that case is to see whether the tool can give meaningful feedback to the end user when faced with such failure modes.

> However, those are just models that fail because of (known) numerical reasons, and the goal in that case is to see whether the tool can give meaningful feedback to the end user when faced with such failure modes.

Yes, that was a great initiative, and how nice to see that you turned to `Documentation.info` for the human-readable description of why the simulation should fail!