Tagging Properties and Filtering
Based on discussion:
I propose allowing arbitrary tags to be attached to properties in task definitions:
```yaml
properties:
  - property_file: ../properties/unreach-call.prp
    expected_verdict: true
    tags:
      - fast
      - ci
  - property_file: ../properties/valid-memsafety.prp
    expected_verdict: false
    subproperty: valid-memtrack
    tags:
      - false_properties
```
(The key tags could be replaced with a similar name, e.g., labels, but I think tag is appropriate.)
The type would be `Optional[set[str]]`.
Tags could then be "selected" via --require-tags [list of strings] (require all given tags to be present) or --exclude-tags [list of strings] (require that none of the given tags is present), or in a similar format inside <tasks>.
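For illustration, a minimal sketch of how the <tasks> variant might look in a benchmark definition; the requiretags/excludetags element names are purely hypothetical and not part of benchexec today:

```xml
<tasks name="ci-tasks">
  <include>tasks/*.yml</include>
  <!-- hypothetical elements mirroring the proposed command-line options -->
  <requiretags>fast ci</requiretags>
  <excludetags>false_properties</excludetags>
</tasks>
```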
Related: Tags could also be defined on the tool and model level; the tags considered for filtering would then be the union of all given tags.
A more powerful version would be an expression structure like
```xml
<filter> <!-- by default an and constraint -->
  <or><tag>a</tag><tag>b</tag></or>
  <not><tag>x</tag></not>
</filter>
```
i.e., essentially a propositional formula over atomic propositions that can be derived from the model and the properties. This proposal "only" adds tag filtering, but it shouldn't be hard to add more later (FWIW, a tool info could provide such information...).
The second part is definitely more of a "far-fetched" idea and rather advanced, but it would allow a lot of control for advanced users.
The examples here may not be the best examples to show the importance of this feature. `false_properties` duplicates the information from the expected verdict (error-prone!) and is not needed because we explicitly support filters for this since #529. `fast` is likely a statement that one cannot really make about a task, because it always also depends on the tool. `ci` might be useful in situations where you have a test suite checked in together with a tool in the tool's repo. But this tag sounds mostly orthogonal to the rest of the potential tags, and could thus typically be easily replaced with a benchmark definition (or .set files) that lists the tasks that CI should execute.
Tags for tasks sound more important for general tool-independent benchmark sets (like SV-Benchmarks), but the presented examples don't really match that. What would such use cases be?
And how complex would it be to implement these use cases without tags?
Note that we already have quite some flexibility with task selection: With -t you can choose the <tasks> tags (with wildcards). Inside <tasks> you can include and exclude tasks (both with wildcards, exclude overrides include). And if you want to write down sets of tasks independently of the benchmark definition, you can do so with our .set files, which can contain lists of wildcards and can be used for inclusion and exclusion of tasks in the benchmark definition (and again you can refer to them via wildcards).
So how about first trying out a concrete use case for example with .set files, and then we can see how clumsy that is (or whether it is) and how much simplification tags would bring?
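For illustration, a rough sketch of what this looks like with the existing mechanisms (all file names and patterns here are made up): a .set file contains one wildcard pattern per line, and the benchmark definition references it and can additionally exclude tasks:

```
mdp/*-reach-*.yml
```

```xml
<!-- excerpt of a benchmark definition using the hypothetical mdp-reachability.set -->
<tasks name="mdp-reachability">
  <includesfile>mdp-reachability.set</includesfile>
  <exclude>mdp/known-broken-*.yml</exclude>
</tasks>
```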
If we go forward, one question I would have about the design of this is "Why attach tags to properties? Why not to the tasks?".
> Related: Tags could also be defined on the tool and model level; the tags considered for filtering would then be the union of all given tags.
This is something I don't understand. What are the tool and model levels?
So in general, I guess something like this would be possible, but it would have relatively low priority for me unless there is an actual use case where the existing mechanisms don't work well.
> The examples here may not be the best examples to show the importance of this feature.
Yes, I didn't want to get domain-specific.
A more useful example would be to add the type of the model and property (e.g., MDP, DTMC, SG, CSG, ... / reachability, mean-payoff, ...), since certain tools only support certain combinations thereof.
The appeal of tags or similar would simply be conciseness.
Suppose I have three tools. I want to evaluate all of them on MDP + reachability, but additionally, say, tool 1 on every other MDP benchmark, tool 2 on every reachability property, and so on. So to define all their tasks I would need to manage at least several files.
Now, suppose I want to add a new property / model of that type. Then I need to remember to also update at least one .set file, instead of attaching this information directly where it "belongs". In a sense, tags are not different from filenames: I can add every tag to the filename and write (complicated) wildcards to filter; however, it seems natural to me to be able to add metadata beyond the filename. In the same way, one could ask why even bother with task definitions at all, since you could just write down the list explicitly. It is just another way of concisely expressing what you want, and it is about striking a balance between conciseness and usability. I personally find it simpler to just staple ci-x and ci-y tags onto the thing I want to include in my CI than to modify multiple .set files.
(For what it's worth, in my concrete case it is even quite useful to derive this metadata programmatically from a script, but that is indeed specialized.)
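To make the conciseness argument concrete, a hedged sketch of how this could look with the proposed tags (the tag names, file paths, and the requiretags element are all hypothetical): the type information is attached once, in the task definition, and each tool's benchmark definition just filters on it:

```yaml
# task-definition excerpt: tag the property with model and property type
properties:
  - property_file: ../properties/reachability.prp
    expected_verdict: true
    tags:
      - mdp
      - reachability
```

```xml
<!-- benchmark definition for the tool that should run only MDP + reachability -->
<tasks name="mdp-reachability">
  <includesfile>all.set</includesfile>
  <requiretags>mdp reachability</requiretags> <!-- hypothetical element -->
</tasks>
```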
> This is something I don't understand. What are the tool and model levels?

> If we go forward, one question I would have about the design of this is "Why attach tags to properties? Why not to the tasks?".
In the end, this is what I was hinting at. With "tool" and "model" level I meant that a tool and a model could also define tags, or even the task itself. Together this would give a set of tags for each run, on which one could then filter.
I guess for now we can leave this open. I can solve my case by generating .set and task-definition files from a domain-specific description. Maybe in the future more people will have requests in this direction.