gapt
TPTP: THF and TFF parsing
- TPTP = problem and derivation files in TPTP syntax
- THF = typed higher-order formula (a formula type in TPTP)
- TFF = typed first-order formula (a formula type in TPTP)
At the moment, TptpParser is limited to first-order formulas. However, to support e.g. many-sorted problems in the Vampire interface, we need to export and parse at least the TFF fragment of TPTP.
Regarding parsing, I see three possibilities:
- Create (mostly) first-order formulas during parsing, just as we do now. In a separate step, parse the type declarations, and replace the constants by constants of the correct type. Think about how to handle
  ![X1: $someType]: X1 = X1
  and
  tff(f_type, type, f: $int > $int).
- Keep track of the type declarations during parsing and construct the correctly typed terms straight away. This probably requires mutable state in the parser. In addition, we would need to evaluate include directives during parsing (as they may contain type declarations, I believe).
- Construct Babel AST terms; we could relatively faithfully parse type annotations, etc. this way. Then convert to LambdaExpression in a separate step. The main problem here is that the TPTP data structures such as AnnotatedFormula contain LambdaExpressions, and I would prefer not to duplicate them.
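The first option can be sketched roughly as follows. This is a hypothetical, heavily simplified term representation (Ty, Ti, TArr, Term, Const, App, retype are all made-up names; the real gapt LambdaExpression hierarchy differs): parse everything at a default individual type first, then, once the type declarations are known, rewrite every constant to a constant of the declared type.

```scala
// Hypothetical, simplified types and terms -- not the real gapt classes.
sealed trait Ty
case object Ti extends Ty                    // default individual type
case class TArr(from: Ty, to: Ty) extends Ty // function type, e.g. $int > $int

sealed trait Term { def ty: Ty }
case class Const(name: String, ty: Ty) extends Term
case class App(fn: Term, arg: Term) extends Term {
  def ty: Ty = fn.ty match { case TArr(_, to) => to; case t => t }
}

// Second pass: replace every constant by a constant of the declared type,
// defaulting to the individual type for undeclared constants.
def retype(t: Term, decls: Map[String, Ty]): Term = t match {
  case Const(n, _) => Const(n, decls.getOrElse(n, Ti))
  case App(f, a)   => App(retype(f, decls), retype(a, decls))
}
```

For example, after parsing f(c) untyped, retype(App(Const("f", Ti), Const("c", Ti)), Map("f" -> TArr(Ti, Ti))) yields a well-typed application.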
I think we should only support the TF0 type system (monomorphic types), and not TF1 (polymorphic types). THF support should come pretty much for free, I think.
I always avoided mutable state because the parser may backtrack. I haven't found a good way to get around this yet.
[...] the parser may backtrack. I haven't found a good way to get around this yet.
Ah, that's easy: you just make a cut, which prevents backtracking. In parboiled2 that's ~!~, in fastparse ~/, and in the old scala-parser-combinators library you write ~!.
Cutting reduces the expressiveness of the parser; are the grammars all in the non-backtracking fragment?
I'm not sure what you mean precisely. However in TPTP files there is a very nice point to cut, namely after each TPTP_input production, i.e. a dot-delimited top-level form:
fof(foo, axiom, ![X]: X=X).
I think that there is no need to backtrack after we've seen the dot at the end of the line. Fortunately enough, TPTP only supports one type declaration in each line. :-)
Imagine a rule (a ~!~ b) | c - if I see it correctly, then c will not be tried when b fails, since that branch has been cut off. I can't tell if such a pattern occurs in the THF parser. In the prover9 code I could circumvent some disjunctions by converting them into regular expressions, but that is not always possible.
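The effect described above can be demonstrated with a toy combinator sketch (self-contained, hypothetical names; this is not parboiled2): once the left side of a cut sequence has succeeded, a failure of the right side aborts the whole parse instead of letting the alternative branch run.

```scala
// Toy PEG combinators, only to illustrate cut semantics.
case class CutFailure() extends Exception
type P = String => Option[String] // returns remaining input on success

def lit(s: String): P = in =>
  if (in.startsWith(s)) Some(in.drop(s.length)) else None
def seq(p: P, q: P): P = in => p(in).flatMap(q)
// Cut sequence: once p succeeds, a failure of q aborts the whole parse.
def cutSeq(p: P, q: P): P = in =>
  p(in).map(rest => q(rest).getOrElse(throw CutFailure()))
def alt(p: P, q: P): P = in => p(in).orElse(q(in))

// (a ~!~ b) | c  with a = "a", b = "b", c = "ac"
val g = alt(cutSeq(lit("a"), lit("b")), lit("ac"))
// g("ab")  => Some("")          -- first branch succeeds
// g("ac")  => throws CutFailure -- c is never tried; the cut removed that branch
```

Without the cut, i.e. alt(seq(lit("a"), lit("b")), lit("ac")), the input "ac" would parse via the second branch.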
Ah, I see what you mean. Of course, just replacing sequential composition with cutting composition everywhere changes the language the grammar accepts. In the TPTP case I think we can get away by cutting after we've parsed the dot, something along the lines of the following:
def thf_type = rule {
  "thf(" ~ atomic_word ~ ", type, " ~ type_declaration ~ ")." ~!~ "" ~>
    (/* process and store type declaration */)
}
I don't really know the pitfalls of this approach though.
I believe the cut is safe for the whole thf( ... ). production. I think binders can also be realized (the bound variable is first parsed as a string and only converted to a Var when the type is known), but I don't know if there is some other place where the parser needs to update the type assignment table.
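The binder idea above can be sketched like this (hypothetical, simplified AST; Expr, Var, All, Eq, FreeName, bind are made-up names, not gapt classes): occurrences of the bound name are first parsed as untyped placeholders, and once the binder's type annotation is known, they are resolved to typed variables - no mutable parser state needed.

```scala
// Hypothetical, simplified AST for illustrating binder handling.
sealed trait Expr
case class Var(name: String, ty: String) extends Expr
case class All(v: Var, body: Expr) extends Expr
case class Eq(l: Expr, r: Expr) extends Expr
case class FreeName(name: String) extends Expr // placeholder until its type is known

// After parsing "![X1: $someType]: X1 = X1" the binder tells us X1's type,
// so placeholder occurrences in the body can be resolved to typed Vars.
def bind(name: String, ty: String, body: Expr): All = {
  def resolve(e: Expr): Expr = e match {
    case FreeName(n) if n == name => Var(n, ty)
    case Eq(l, r)                 => Eq(resolve(l), resolve(r))
    case All(v, b)                => All(v, resolve(b))
    case other                    => other
  }
  All(Var(name, ty), resolve(body))
}
```

For instance, bind("X1", "$someType", Eq(FreeName("X1"), FreeName("X1"))) produces the fully typed quantified equation.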