java-smt icon indicating copy to clipboard operation
java-smt copied to clipboard

Blacklist more SMTLIB keywords as variable/uf names and automatically escape all variable names that would be illegal

Open daniel-raffler opened this issue 11 months ago • 2 comments

Hello, this will update the list of forbidden variable/uf names to include all SMTLIB commands, such as check-sat or assert, and all predefined function names, like mod or fp.add. While such symbol names are fine when using the API, they cause issues when printing the formulas as most solvers will fail to add the necessary quotes to the symbols. This will then causing issues in FormulaManager.translateFrom() as the generated output is invalid and can't to be read back by the target solver to translate the formula.

By forbidding such symbol names entirely we circumvent the issues. User should always use FormulaManager.isValidName() to check if their name is on the list, and substitute it with FormulaManager.escape() if necessary.

daniel-raffler avatar Jan 01 '25 16:01 daniel-raffler

This change disallows quite a few names for symbols. Are we sure that we want to go in that direction?

What is the motivation for this PR? Parsing and writing SMTLIB is complex, and solvers need to escape all unexpeted names anyway. This PR seems to forbid such cases upfront. This might be valid, but it might be unfriendly for existing callers.

This was motivated by an issue in CPAchecker (Bitwuzla issues when using a distinct solver for interpolation) were we try to use Bitwuzla as the main solver for CPAchecker. Since interpolation is not supported in Bitwuzla, we need to translate formulas from/to MathSAT whenever CPAchecker wants to calculate interpolants. For this we need to first dump the formulas to SMTLIB and then read them back in with the "target" solver. However, this will often fail with a parse error:

Exception in thread "main" java.lang.IllegalArgumentException: <string>:48:14: expected symbol after 'declare-fun'
	at org.sosy_lab.java_smt.solvers.bitwuzla.api.BitwuzlaNativeJNI.Parser_parse__SWIG_0(Native Method)
	at org.sosy_lab.java_smt.solvers.bitwuzla.api.Parser.parse(Unknown Source)
	at org.sosy_lab.java_smt.solvers.bitwuzla.BitwuzlaFormulaManager.parse(BitwuzlaFormulaManager.java:131)
	at org.sosy_lab.cpachecker.util.predicates.interpolation.SeparateInterpolatingProverEnvironment.convertToMain(SeparateInterpolatingProverEnvironment.java:116)
	at org.sosy_lab.cpachecker.util.predicates.interpolation.SeparateInterpolatingProverEnvironment.getInterpolant(SeparateInterpolatingProverEnvironment.java:92)
	at org.sosy_lab.cpachecker.util.predicates.smt.InterpolatingProverEnvironmentView.getInterpolant(InterpolatingProverEnvironmentView.java:32)
	at org.sosy_lab.cpachecker.util.predicates.interpolation.strategy.ITPStrategy.getInterpolantFromSublist(ITPStrategy.java:166)
	at org.sosy_lab.cpachecker.util.predicates.interpolation.strategy.SequentialInterpolation.getFwdInterpolants(SequentialInterpolation.java:157)
	at org.sosy_lab.cpachecker.util.predicates.interpolation.strategy.SequentialInterpolation.getInterpolants(SequentialInterpolation.java:110)
	at org.sosy_lab.cpachecker.util.predicates.interpolation.InterpolationManager.getInterpolants(InterpolationManager.java:783)
	at org.sosy_lab.cpachecker.util.predicates.interpolation.InterpolationManager$Interpolator.buildCounterexampleTrace(InterpolationManager.java:996)
	at org.sosy_lab.cpachecker.util.predicates.interpolation.InterpolationManager.buildCounterexampleTrace0(InterpolationManager.java:449)
	at org.sosy_lab.cpachecker.util.predicates.interpolation.InterpolationManager.lambda$buildCounterexampleTrace$0(InterpolationManager.java:328)
	at org.sosy_lab.cpachecker.util.predicates.interpolation.InterpolationManager.callWithTimelimit(InterpolationManager.java:337)
	at org.sosy_lab.cpachecker.util.predicates.interpolation.InterpolationManager.buildCounterexampleTrace(InterpolationManager.java:327)
	at org.sosy_lab.cpachecker.cpa.predicate.PredicateCPARefiner.performInterpolatingRefinement(PredicateCPARefiner.java:400)
	at org.sosy_lab.cpachecker.cpa.predicate.PredicateCPARefiner.checkCounterexample(PredicateCPARefiner.java:390)
	at org.sosy_lab.cpachecker.cpa.predicate.PredicateCPARefiner.performRefinementForPath(PredicateCPARefiner.java:281)
	at org.sosy_lab.cpachecker.cpa.predicate.PredicateStaticRefiner.performRefinementForPath(PredicateStaticRefiner.java:183)
	at org.sosy_lab.cpachecker.cpa.arg.AbstractARGBasedRefiner.performRefinementForPath(AbstractARGBasedRefiner.java:158)
	at org.sosy_lab.cpachecker.cpa.arg.AbstractARGBasedRefiner.performRefinement(AbstractARGBasedRefiner.java:104)
	at org.sosy_lab.cpachecker.core.algorithm.CEGARAlgorithm.refine(CEGARAlgorithm.java:310)
	at org.sosy_lab.cpachecker.core.algorithm.CEGARAlgorithm.run(CEGARAlgorithm.java:256)
	at org.sosy_lab.cpachecker.core.algorithm.ParallelAlgorithm.runParallelAnalysis(ParallelAlgorithm.java:390)
	at org.sosy_lab.cpachecker.core.algorithm.ParallelAlgorithm.lambda$createParallelAnalysis$4(ParallelAlgorithm.java:317)
	at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:131)
	at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:76)
	at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:82)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
	at java.base/java.lang.Thread.run(Thread.java:1583)

The issue is that many of the programs use the C function exit() and CPAchecker will model this with a UF that has the same name. When the formula is then printed we get output that looks like this:

(declare-fun exit ((_ BitVec 32) ) (_ BitVec 8))

Since exit is already a SMTLIB command, this is not valid input and that's what leads to the parse error. From what I can see this mostly seems to happen for exit and true. The latter is already a reserved keyword in AbstractFormulaManager and the easiest solution would be to just add exit to the list. However, I'm somewhat concerned that we will eventually run into the same problem for other keywords, and that's why I've tried to include as many of them as possible.

As @PhilippWendler pointed out there is already an old JavaSMT issues here, but it's never been entirely resolved. The proper fix would be to escape such names transparently, but this would require us to write our own parser and printer for SMTLIB formulas, which still seem out of reach. Blacklisting the reserved symbols for now at least makes sure that the user gets an error immediately. Once the parser/printer is ready we can then allow them again.

Note that I added some tests in supported_symbols_test to see what symbol names are actually allowed by the solvers. You can run ParserSymbolsTest.dumpTest() to see the results. If the test fails for the assertion, it means that the (quoted) symbol can be parsed, but the solver fails to add quotes when printing it. If there is a different exception, it means that the solver won't parse the symbol at all, even when properly quoted.

From the tests it seems that most solvers don't quote reserved words when printing formulas and then fail to read back their own output. It's even worse for Bitwuzla, which fails to parse any reserved word, even when quotes are used. This means that just fixing formula printing won't solve the issue either.

daniel-raffler avatar Jan 02 '25 13:01 daniel-raffler

I've added transparent escaping/unescapig with a43cb6dc2528b851afdb0d2d759f3393dff46a3a. This means that all variables are now allowed, and the escaping is done automatically when needed.

We still have to figure out what to do about all the tests that expect makeVariable to fail for certain illegal names. And there are couple of parsing bugs in the solvers that should be reported.

daniel-raffler avatar Jan 05 '25 17:01 daniel-raffler

I've added transparent escaping/unescapig with a43cb6d. This means that all variables are now allowed, and the escaping is done automatically when needed.

We still have to figure out what to do about all the tests that expect makeVariable to fail for certain illegal names. And there are couple of parsing bugs in the solvers that should be reported.

Well we have to update the tests AND we have to test this new feature.

baierd avatar Jul 21 '25 14:07 baierd