java-smt
java-smt copied to clipboard
Escaping of non SMT-LIB compliant names?
With SMTInterpol, the following code fails:
bmgr.makeVariable("select");
Exception:
de.uni_freiburg.informatik.ultimate.logic.SMTLIBException: Function select is already defined.
at de.uni_freiburg.informatik.ultimate.logic.NoopScript.declareFun(NoopScript.java:120)
at de.uni_freiburg.informatik.ultimate.smtinterpol.smtlib2.SMTInterpol.declareFun(SMTInterpol.java:1885)
at org.sosy_lab.solver.smtinterpol.SmtInterpolEnvironment.declareFun(SmtInterpolEnvironment.java:304)
at org.sosy_lab.solver.smtinterpol.SmtInterpolFormulaCreator.makeVariable(SmtInterpolFormulaCreator.java:84)
at org.sosy_lab.solver.smtinterpol.SmtInterpolBooleanFormulaManager.makeVariableImpl(SmtInterpolBooleanFormulaManager.java:55)
at org.sosy_lab.solver.smtinterpol.SmtInterpolBooleanFormulaManager.makeVariableImpl(SmtInterpolBooleanFormulaManager.java:1)
at org.sosy_lab.solver.basicimpl.AbstractBooleanFormulaManager.makeVariable(AbstractBooleanFormulaManager.java:69)
at org.sosy_lab.solver.test.SolverBasicTest.testArrayFunctions(SolverBasicTest.java:143)
The reason is that select
is already an existing interpreted function in SMTInterpol.
I think that JavaSMT should transparently escape such reserved names for variables and functions by quoting them. Probably select
is not the only case where this needs to be done.
The same problem appears for "true" and "false", if they are created as integer variables. SMTInterpol cannot create variables with such names. It might also happen for other reserved tokens.
SMTlib allows to escape names with |name|
. Perhaps we should always apply this (transparently) to variables for SMTInterpol. The problem might then be moved to cases where parsing or dumping of formulas is needed.
Let's slow down a bit here, why do we have a variable with such a name in a first place? I saw the corresponding ticket in CPAchecker, but don't we escape names with pipes in ctoformula package? I'm somewhat opposed to doing opaque escaping.
SMTlib allows to escape names with |name|. Perhaps we should always apply this (transparently) to variables for SMTInterpol.
If we do that, the formulas in SMTInterpol will look different to formulas in any other solver, which would make transferring formulas between the solvers rather unpleasant.
Of course, we can still escape names across all solvers, for both variables and UFs.
What's the current state for this?
I think the most important point is that declaring a variable like select
, store
, etc. either fails consistently with all solvers or is supported for all solvers.
If we let it fail, this would need to be documented, and we should probably add utilities for escaping such that not all client needs to reimplement this over and over (e.g., add something like String escapeNameIfNecessary(String)
).
According to SMT-LIB documentation each variable called "x" can be referred interchangeably as either "x" or "|x|".
We should test if that's indeed the case for all solvers and if that's the case, we can uniformly escape everything.
On Apr 4, 2016 12:49, "Philipp Wendler" [email protected] wrote:
What's the current state for this?
I think the most important point is that declaring a variable like select, store, etc. either fails consistently with all solvers or is supported for all solvers. If we let it fail, this would need to be documented, and we should probably add utilities for escaping such that not all client needs to reimplement this over and over (e.g., add something like String escapeNameIfNecessary(String)).
— You are receiving this because you commented. Reply to this email directly or view it on GitHub https://github.com/sosy-lab/java-smt/issues/26#issuecomment-205238534
According to SMT-LIB documentation each variable called "x" can be referred interchangeably
Actually that turned out to be wrong: all solvers treat "x" and "|x|" to be different.
and we should probably add utilities for escaping
But solvers already have these utilities! And they use them! Just not in this case.
I'm tempted to say that's just a bug in SMTInterpol, as all other solvers seem to be able to handle it.
@PhilippWendler Yes, we can escape all variables. I do not like it for many reasons:
- Introspection, serialization, and model assignments will return values different to what user has entered.
- We can try to wrap/unwrap everything, but that would be a lot of code to cater for one bug in one solver, and this code has potential to be buggy as well (I'm not even sure how to un-apply escaping to the serialization output, that would mean SMT-LIB parsing, which we currently don't do.)
- For visitors wrapping/unwrapping would imply a very heavy performance penalty due to all the string concatenation. And I'm not just theorizing here: for LPI on many benchmarks introspection takes like 70% of the runtime.
Hence I suggest a following course of actions: I'll file a ticket with SMTInterpol developers, and for select/store/etc I would suggest escaping on CPAchecker level.
Of course we can also have a workaround which would work now: apply escaping only for SMTInterpol, and only when declaring the function fails (catch exception and retry)
I am not exactly sure what your proposal is, the two parts of the proposal seem to contradict each other.
Anyway, I can only refer to my comment above (from April 4th), this summarizes my opinion. The only thing I would add is that if some user-visible escaping is applied anywhere, this would also need to be documented and utilities for unescaping should be provided.
@PhilippWendler My proposal is to close this ticket, and to apply escaping in CPAchecker. I don't want to add a utility function for escaping variables, since it gives the impression that they need to be escaped, but this is not the case, since all other solvers escape them already.
As to documenting, I'm not sure where it should go, any suggestions? All solvers have many bugs, are we supposed to document all we know?
If one solver needs escaping and this is not handled by JavaSMT, then JavaSMT's API does need escaping. One should not code against JavaSMT's API with only a specific solver in mind.
If using JavaSMT's API needs escaping, then of course escaping utilities should be added to JavaSMT. It does not make sense at all to let every user of the API re-implement escaping.
However, I still do not understand why you propose escaping in CPAchecker when at the same time you say solvers should do the escaping.
And yes, all known problems with solvers should be documented if we cannot add workarounds. Documenting them once is much less effort than letting all users of JavaSMT rediscover them and spend effort debugging them. I suggest a section in the documentation.
@PhilippWendler If there was a solver that does not accept variable names starting with a letter "a", does it mean we should have wrapping just for this solver? How far should catering for particular needs go?
If we add an escape
function, it will be extremely counter-intuitive if it needs to be applied only when a certain solver is used. This also defeats the point of being able to transparently switch solvers. If we say that it always should be applied, then it's not clear why it's not applied automatically. If we always apply it automatically, then we get the problem that introspection, models and serialization do not return expected values.
To me we have multiple evil options, and crashing on certain variable names is currently the least evil solution.
If we add an escape function, it will be extremely counter-intuitive if it needs to be applied only when a certain solver is used. This also defeats the point of being able to transparently switch solvers. If we say that it always should be applied, then it's not clear why it's not applied automatically.
Exactly.
If we always apply it automatically, then we get the problem that introspection, models and serialization do not return expected values.
But you say we do have this problem already because other solvers do escaping internally. So why is this a problem for SMTInterpol specifically?
Why can't we add transparent unescaping for all solvers to solve the problem?
You claim a performance problem, but the same performance problem would occur if users of JavaSMT apply the (un)escaping themselves.
The most important criterion for the JavaSMT API in this case is IMHO that it should behave similarly for all solvers. Second criterion is to make it as easy to use as possible for users. Doing nothing satisfies neither of them.
You claim a performance problem, but the same performance problem would occur if users of JavaSMT apply the (un)escaping themselves.
Except they don't have to, if they use a different solver, or don't create variables called select
/store
, or escape them themselves, so they don't expect them to be equal on introspection.
Why can't we add transparent unescaping for all solvers to solve the problem?
- Code complexity: have to remember to synchronize it across visitors/serialization/models
- Performance.
- How do we even do transparent unescaping for SMT-LIB serialization?
You seriously suggest to add all these problems in order to have a workaround for a problem in SMTInterpol?
I still do not understand the performance argument. You just suggested to the SMTInterpol developers to add escaping, but you don't want to add the very same feature to JavaSMT because of performance? Why should this be slower in JavaSMT than in SMTInterpol?
Why would users that escape themselves be affected if we escape select
etc.?
Why should a workaround for SMTInterpol that makes it behave like the other solvers affect users of these other solvers?
I would not expect any form of unescaping for SMT-LIB serialization, because escaping is simply required for SMT-LIB serialization (otherwise it would be invalid). I am pretty sure that the serialization output of all other solvers also contains escaping for select
etc. if not even for all variables.
I still do not understand the performance argument. You just suggested to the SMTInterpol developers to add escaping, but you don't want to add the very same feature to JavaSMT because of performance? Why should this be slower in JavaSMT than in SMTInterpol?
Performance of what specifically though? I was saying that adding "|" on every single visitor traversal is very costly, what does this have to do with adding pipes in SMTInterpol when creating the symbol? Our problem (visitor provides a pre-computed String) does not apply to them.
Why would users that escape themselves be affected if we escape select etc.?
I don't understand what this question is referring to.
Why should a workaround for SMTInterpol that makes it behave like the other solvers affect users of these other solvers?
I don't understand that question either, sorry.
I would not expect any form of unescaping for SMT-LIB serialization, because escaping is simply required for SMT-LIB serialization (otherwise it would be invalid).
So our output with your suggestion will be escaped twice, once by us, and once by the solver, giving a different string than without escaping. This can be bad.
I still do not understand the performance argument. You just suggested to the SMTInterpol developers to add escaping, but you don't want to add the very same feature to JavaSMT because of performance? Why should this be slower in JavaSMT than in SMTInterpol?
Performance of what specifically though?
Performance of whatever you were claiming is a performance problem.
I was saying that adding "|" on every single visitor traversal is very costly, what does this have to do with adding pipes in SMTInterpol when creating the symbol? Our problem (visitor provides a pre-computed String) does not apply to them.
I cannot see any difference in code for the implementation of the escaping feature between doing it in JavaSMT and SMTInterpol itself. If you call makeVariable("select")
in a visitor, why does it make a performance difference if we add the pipes or SMTInterpol? All other solvers have this feature already and it is not a performance problem!
Why would users that escape themselves be affected if we escape select etc.?
I don't understand what this question is referring to.
You were claiming that we doing escaping is a performance problem for users that do escaping themselves. I ask why this should be the case.
Why should a workaround for SMTInterpol that makes it behave like the other solvers affect users of these other solvers?
I don't understand that question either, sorry.
You were claiming that we doing escaping is a performance problem for users that use other solvers than SMTInterpol. I ask why this should be the case.
I would not expect any form of unescaping for SMT-LIB serialization, because escaping is simply required for SMT-LIB serialization (otherwise it would be invalid).
So our output with your suggestion will be escaped twice, once by us, and once by the solver, giving a different string than without escaping. This can be bad.
Why should there be dual escaping? I do not understand this at all. Our problem is that SMTInterpol does not do escaping, so if we give it an escaped variable I certainly do not expect SMTInterpol to escape it again when writing SMT-LIB output. So no dual escaping.
-
I think I see where the misunderstanding about the visitors comes from. I was talking about the problems caused by _un_escaping code, since you were proposing to add transparent escaping. If we add transparent escaping, we would need to remove it for
visitFunction
to work properly, and thus even empty, do-nothing recursive visitor would become very expensive. -
Regarding other solvers, my understanding was that you propose to escape across all solvers? Or do you propose to only escape for SMTInterpol?
If we only escape for SMTInterpol, how should serialization to SMT-LIB work? Should we undo our escaping first?
Why should there be dual escaping?
IIRC, SMTInterpol does do escaping for SMT-LIB serialization. If we escape as well, everything will be escaped twice.
- I think I see where the misunderstanding about the visitors comes from. I was talking about the problems caused by unescaping code, since you were proposing to add transparent escaping. If we add transparent escaping, we would need to remove it for visitFunction to work properly, and thus even empty, do-nothing recursive visitor would become very expensive.
But why is this transparent unescaping more expensive if done by JavaSMT than if done by SMTInterpol? Why is the transparent unescaping a problem here if it is not a performance problem for all other solvers that already do it? If you really expect the unescaping for select
to be so expensive, you can even keep a constant map |select| -> select
and avoid any string allocations altogether (I think this is not necessary).
I simply propose to add the necessary adjustments to make SMTInterpol behave just like the other solvers. I do not see why we cannot do what all the other solvers successfully do.
- Regarding other solvers, my understanding was that you propose to escape across all solvers? Or do you propose to only escape for SMTInterpol?
Why should we do escaping for other solvers which manage to do it correctly themselves?
If we only escape for SMTInterpol, how should serialization to SMT-LIB work? Should we undo our escaping first?
No, why?
Why should there be dual escaping?
IIRC, SMTInterpol does do escaping for SMT-LIB serialization. If we escape as well, everything will be escaped twice.
If we give |select|
to SMTInterpol and SMTInterpol really produces ||select||
in some output, I would simply ask the developers to not do this (but I would be surprised if it really did this).
OK maybe I have misunderstood initially. IF we escape only for select/true/store/etc and only for SMTInterpol, and SMTInterpol does not escape already escaped variables for serialization, this is doable.
@PhilippWendler The situation with pipe symbols in SMTInterpol is even more funny: it happily accepts them, treat the identifiers as different, and then crashes on printing models or context with an assertion error. Is it something we are supposed to fix as well in the wrapper? If not, than how is it different to crashing on "select"?
I would simply file these crashes as bugs in SMTInterpol.
@PhilippWendler It's not that simple, it's done on purpose. Pipe symbols should have no meaning in SMT-LIB syntax (|x| == x) They have meaning internally when adding such variables via API to other solvers --- but they get escaped when the SMT-LIB serialization is done (e.g. serializing a variable |x| to SMT-LIB in Z3 gives |x|).
SMTInterpol decides to not accept such variables altogether, but does so in an inconsistent way, where the check is done only when performing the serialization, to SMT-LIB or to the model. I have filed this inconsistency as a bug, but the assertion would remain in any case.
So what's your proposal now? We can still escape select/store/etc variables for SMTInterpol with some other symbol, e.g. with triangular brackets. That adds an extra layer of abstraction, and does not achieve a goal of being able to consistently create same variables across all solvers, as pipe symbols would still behave in an inconsistent way.
Overall, trying to enforce consistency between different solvers gets us into a can of worms, and I'm not sure that this should be a goal of JavaSMT.
I would go with escaping with triangular brackets or !
or something similar.
If pipes are supported inconsistently and we do not want to or cannot fix this, then maybe we should forbid them completely?
The definition of JavaSMT is to provide a common API for a range of solvers, and an API does not only specify method signatures but also how the methods behave. So of course enforcing consistency is for me one of the main goals of JavaSMT. Users of JavaSMT should not have to know and follow a lot of solver-specific rules when using JavaSMT.
@PhilippWendler Well OK great, how do we distinguish between our <select>
and user-created <select>
?
One of the goals of JavaSMT is to remain simple. Trying to enforce consistency across all behaviors introduces a lot of complexity, with very questionable benefit.
If we forbid variable names matching |.*|
because there is no way to sanely handle them across all solvers, than we could also forbid variable names that match our own escaping pattern. The alternative is of course to simply escape all variables that already appear escaped: user-created <foo>
becomes <<foo>>
.
All this complexity of variable-name handling won't go away if JavaSMT refuses to handle it. The burden will be just be on each individual user of JavaSMT, and thus be much greater in the end. Everything that JavaSMT can do to improve consistency between solvers will have a very large benefit because otherwise the problem needs to be solved again and again by users of JavaSMT.
On Fri, May 20, 2016 at 5:54 PM, Philipp Wendler [email protected] wrote:
If we forbid variable names matching |.*| because there is no way to sanely handle them across all solvers, than we could also forbid variable names that match our own escaping pattern.
What if someone has an application that already generates formulas with triangular brackets, so they won't be able to use JavaSMT because of this.
The alternative is of course to simply escape all variables that already appear escaped: user-created
becomes < >. OK so then we have the problem of all serialized formulas looking differently.
All this complexity of variable-name handling won't go away if JavaSMT refuses to handle it.
I actually think it would. If someone encounters a crash on a variable called "select" I think it would be relatively trivial for them to fix it themselves.
The burden will be just be on each individual user of JavaSMT, and thus be much greater in the end. Everything that JavaSMT can do to improve consistency between solvers will have a very large benefit because otherwise the problem needs to be solved again and again by users of JavaSMT.
— You are receiving this because you commented. Reply to this email directly or view it on GitHub https://github.com/sosy-lab/java-smt/issues/26#issuecomment-220644870
Sorry I'm closing it, as I currently do not see a way to do this without creating more problems than we had before, and without introducing too much extra complexity. If there's a crash on certain variable name, it's a simple, easy, crash, which is easy to debug. If there's a crash due to some trying-to-be-too-smart logic in JavaSMT, it's much worse.
The alternative is of course to simply escape all variables that already appear escaped: user-created
becomes < >. OK so then we have the problem of all serialized formulas looking differently.
But we have that problem already now! It is not an argument against doing our own escaping. If you want to fix this problem, then you need to add our own escaping that is used for all solvers.
All this complexity of variable-name handling won't go away if JavaSMT refuses to handle it. I actually think it would. If someone encounters a crash on a variable called "select" I think it would be relatively trivial for them to fix it themselves.
Why should this be the case? It is not trivial to fix this problem in CPAchecker. In fact, the easiest place to fix this is in JavaSMT, because in JavaSMT would be the smallest amount of code that needs to be touched. The code that uses JavaSMT is much larger in size, and thus it is more difficult to handle it there and not forget a case.
Should our API allow to create non-SMT-LIB compliant function names?
Current status: Only SMTInterpol disallows special characters and reserved keywords, and throws an exception at some point (inconsistently), i.e. during creation or during printing the name. All other solvers happily accept any String as function name, even if colliding with a reserved keyword or operators. A long as the formula is not dumped (as SMT-LIB) and needs parsing, everything works fine.
Proposal 1:
If we do cannot allow arbitrary names in all used solvers, we have to refuse all names that contain non-ascii-chars, "|"
, "\"
,... . The overhead for this would be minimal, as we can just check each name when creating a new function. The keywords should be easy to find in the SMT-LIB standard.
In this case we have to guarantee that both mkVariable("x")
and mkVariable("|x|")
return the same result and that the names "select"
, "<"
, "+"
, "["
, "("
, "|"
, " "
, "\n"
, "\t"
, ","
, "."
, "true"
, "and"
, "define-fun"
, ... throw an exception.
The benefit of this approach is a SMT-LIB compliant interface for all solvers, including the possibility to dump and parse all formulas.
Proposal 2:
Another possibility is to report this behaviour as a bug for SMTInterpol or write a wrapper especially for SMTInterpol, which replaces invalid chars and reserved keywords with some other tokens. Then we might lose SMT-LIB compliance for "dumped" formulas, but get a nicer interface for the user, because every String (including reserved keywords and also special chars like brackets, unicode-chars,...) is just a "String" and can be used as name without the fear of interfering with any solver-internals. While it might be obvious for a user to not use "+"
or "="
as function name (this is possible in all solvers except SMTInterpol), denying unicode characters might feel outdated.
With this approach, dumping a formula might not result in valid SMT-LIB in some cases and the dump might be unusable for further usage, but as long as the user just uses our API and does not dump or exchange data between different solvers, it should work fine.
I am in favour of the second proposal.