gapt
TPTP: THF and TFF parsing
- TPTP = problem and derivation files in TPTP syntax
- THF = typed higher-order formula (a formula type in TPTP)
- TFF = typed first-order formula (a formula type in TPTP)
At the moment, TptpParser is limited to first-order formulas. However, to support e.g. many-sorted problems in the Vampire interface, we need to export and parse at least the TFF fragment of TPTP.
Regarding parsing, I see three possibilities:
- Create (mostly) first-order formulas during parsing, just as we do now. In a separate step, parse the type declarations, and replace the constants by constants of the correct type. Think about how to handle
  ![X1: $someType]: X1 = X1
  and
  tff(f_type, type, f: $int > $int).
- Keep track of the type declarations during parsing and construct the correctly typed terms straight away. This probably requires mutable state in the parser. In addition, we would need to evaluate include directives during parsing (as they may contain type declarations, I believe).
- Construct Babel AST terms; we could relatively faithfully parse type annotations, etc. this way. Then convert to LambdaExpression in a separate step. The main problem here is that the TPTP data structures such as AnnotatedFormula contain LambdaExpressions, and I would prefer not to duplicate them.
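The first option can be sketched roughly as follows. This is a hypothetical, heavily simplified term representation (Ty, Ti, TArr, Term, Const, App, retype are all made-up names; the real gapt LambdaExpression hierarchy differs): parse everything at a default individual type first, then, once the type declarations are known, rewrite every constant to a constant of the declared type.

```scala
// Hypothetical, simplified types and terms -- not the real gapt classes.
sealed trait Ty
case object Ti extends Ty                    // default individual type
case class TArr(from: Ty, to: Ty) extends Ty // function type, e.g. $int > $int

sealed trait Term { def ty: Ty }
case class Const(name: String, ty: Ty) extends Term
case class App(fn: Term, arg: Term) extends Term {
  def ty: Ty = fn.ty match { case TArr(_, to) => to; case t => t }
}

// Second pass: replace every constant by a constant of the declared type,
// defaulting to the individual type for undeclared constants.
def retype(t: Term, decls: Map[String, Ty]): Term = t match {
  case Const(n, _) => Const(n, decls.getOrElse(n, Ti))
  case App(f, a)   => App(retype(f, decls), retype(a, decls))
}
```

For example, after parsing f(c) untyped, retype(App(Const("f", Ti), Const("c", Ti)), Map("f" -> TArr(Ti, Ti))) yields a well-typed application.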
I think we should only support the TF0 type system (monomorphic types), and not TF1 (polymorphic types). THF support should come pretty much for free, I think.
I always avoided mutable state because the parser may backtrack. I haven't found a good way to get around this yet.
[...] the parser may backtrack. I haven't found a good way to get around this yet.
Ah, that's easy: you just make a cut, which prevents backtracking. In parboiled2 that's ~!~, in fastparse ~/, and in the old scala-parser-combinators library you write ~!.
Cutting reduces the expressiveness of the parser; are the grammars all in the non-backtracking fragment?
I'm not sure what you mean precisely. However in TPTP files there is a very nice point to cut, namely after each TPTP_input production, i.e. a dot-delimited top-level form:
fof(foo, axiom, ![X]: X=X).
I think that there is no need to backtrack after we've seen the dot at the end of the line. Fortunately enough, TPTP only supports one type declaration in each line. :-)
Imagine a rule (a ~!~ b) | c - if I see it correctly, then c will not be tried when b fails, since that branch has been cut off. I can't tell if such a pattern occurs in the THF parser. In the prover9 code I could circumvent some disjunctions by converting them into regular expressions, but that is not always possible.
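The effect described above can be demonstrated with a toy combinator sketch (self-contained, hypothetical names; this is not parboiled2): once the left side of a cut sequence has succeeded, a failure of the right side aborts the whole parse instead of letting the alternative branch run.

```scala
// Toy PEG combinators, only to illustrate cut semantics.
case class CutFailure() extends Exception
type P = String => Option[String] // returns remaining input on success

def lit(s: String): P = in =>
  if (in.startsWith(s)) Some(in.drop(s.length)) else None
def seq(p: P, q: P): P = in => p(in).flatMap(q)
// Cut sequence: once p succeeds, a failure of q aborts the whole parse.
def cutSeq(p: P, q: P): P = in =>
  p(in).map(rest => q(rest).getOrElse(throw CutFailure()))
def alt(p: P, q: P): P = in => p(in).orElse(q(in))

// (a ~!~ b) | c  with a = "a", b = "b", c = "ac"
val g = alt(cutSeq(lit("a"), lit("b")), lit("ac"))
// g("ab")  => Some("")          -- first branch succeeds
// g("ac")  => throws CutFailure -- c is never tried; the cut removed that branch
```

Without the cut, i.e. alt(seq(lit("a"), lit("b")), lit("ac")), the input "ac" would parse via the second branch.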
Ah, I see what you mean. Of course, just replacing sequential composition with cutting composition everywhere changes the language the grammar accepts. In the TPTP case I think we can get away by cutting after we've parsed the dot, something along the lines of the following:
def thf_type = rule {
  "thf(" ~ atomic_word ~ ", type, " ~ type_declaration ~ ")." ~!~ "" ~>
    (/* process and store type declaration */)
}
I don't really know the pitfalls of this approach though.
I believe the cut is safe for the whole thf( ... ). production. I think binders can also be realized (the bound variable is first parsed as a string and only converted to a Var when the type is known), but I don't know if there is some other place where the parser needs to update the type assignment table.
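The binder idea above can be sketched like this (hypothetical, simplified AST; Expr, Var, All, Eq, FreeName, bind are made-up names, not gapt classes): occurrences of the bound name are first parsed as untyped placeholders, and once the binder's type annotation is known, they are resolved to typed variables - no mutable parser state needed.

```scala
// Hypothetical, simplified AST for illustrating binder handling.
sealed trait Expr
case class Var(name: String, ty: String) extends Expr
case class All(v: Var, body: Expr) extends Expr
case class Eq(l: Expr, r: Expr) extends Expr
case class FreeName(name: String) extends Expr // placeholder until its type is known

// After parsing "![X1: $someType]: X1 = X1" the binder tells us X1's type,
// so placeholder occurrences in the body can be resolved to typed Vars.
def bind(name: String, ty: String, body: Expr): All = {
  def resolve(e: Expr): Expr = e match {
    case FreeName(n) if n == name => Var(n, ty)
    case Eq(l, r)                 => Eq(resolve(l), resolve(r))
    case All(v, b)                => All(v, resolve(b))
    case other                    => other
  }
  All(Var(name, ty), resolve(body))
}
```

For instance, bind("X1", "$someType", Eq(FreeName("X1"), FreeName("X1"))) produces the fully typed quantified equation.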