
Trusted Binary Decision Diagrams (Proof Generation)


A problem in SAT solving is trusting the solver when it claims a formula to be unsatisfiable. To this end, the solver can emit an LRAT (or just a DRAT) proof, but doing so is not a trivial task. Yet, BDDs naturally give rise to these very proofs, as shown in [TACAS 21, arXiv 21]. The underlying idea is to store within each node the associated extension variable u and the four proof steps that encode the formula 'u <-> ITE(x,u1,u0)'.
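
For intuition, the per-node bookkeeping could look something like the sketch below; this is not Adiar's actual node type, and all names are purely illustrative.

    #include <cstddef>

    // Hypothetical per-node bookkeeping for proof generation: next to the
    // usual BDD node (x, u0, u1), we remember the extension variable 'u' and
    // the identifiers of the four clauses defining 'u <-> ITE(x, u1, u0)'
    // (HD, LD, HU, and LU; see the Reduce Sweep section below).
    struct tbdd_node_info
    {
      int         extension_var;   // the proof variable 'u' for this node
      std::size_t def_clauses[4];  // clause ids of HD(u), LD(u), HU(u), LU(u)
    };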

This was added to a fork of the BuDDy BDD library, called TBuDDy. We want to add the same features to Adiar. The files tbdd.h and tbdd.cpp of TBuDDy might be useful reference material.

API

Proof Output

In TBuDDy, the user specifies the desired proof format and output file as part of a tbdd_init function (which also initialises BuDDy itself). We can do the same. But, I would prefer if we could somehow support multiple output files at once (thinking of later concurrent/thread-safe usages of Adiar). For thread-safety, we might want to take a note or two from the exec_policy design (#548). A rough sketch of such an output stream class is given after the list below.

  • [x] Add a tbdd_ostream class to handle outputting the proof steps (LRAT or DRAT), creating fresh variables, and creating the clause identifiers.
    • [ ] Each basic TBDD creation (tbdd_clause and tbdd_validate below) has the output stream as its last argument (default: global proof_writer).
    • [ ] For thread-safety the tbdd_ostream in question needs to have a write-lock (using RAII).
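
A minimal sketch of what such a class could look like, assuming LRAT/DRAT output onto a std::ostream and a std::mutex-based write-lock; all member names below are assumptions for illustration, not settled API.

    #include <cstddef>
    #include <mutex>
    #include <ostream>
    #include <vector>

    // Hypothetical interface of the proof output stream.
    class tbdd_ostream
    {
    public:
      enum class format { LRAT, DRAT };

      tbdd_ostream(std::ostream &os, format f = format::LRAT);

      // Create a fresh extension variable for a new TBDD node.
      int fresh_var();

      // Write a proof clause (DIMACS-style literals, optional LRAT hints) and
      // return the clause identifier assigned to it.
      std::size_t emit(const std::vector<int> &literals,
                       const std::vector<std::size_t> &hints = {});

      // Write the deletion of previously emitted clauses.
      void remove(const std::vector<std::size_t> &clause_ids);

      // RAII write-lock such that concurrent Adiar calls can share one proof.
      std::unique_lock<std::mutex> lock()
      { return std::unique_lock<std::mutex>(_mutex); }

    private:
      std::mutex _mutex;
    };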

TBDD Operations

  • [ ] The TBDD u for the input CNF clause with the given id.

    tbdd tbdd_clause(const generator<int> &xs, const size_t clause_id);
    

    This is similar to bdd_or (with #556) and adds the associated proof of Cid |= u. In TBuDDy this is the tbdd_create(bdd f, clause_id i) function. That is, we may consider just using bdd_or(xs) and then converting it into a tbdd afterwards. A usage sketch combining the operations of this list is given right after it.

    This may need an (optional?) last argument for a shared proof_writer.

  • [ ] The TBDD w for u∧v with the associated proof that u∧v → w.

    tbdd tbdd_and(tbdd u, tbdd v);
    

    Also add & and &= overloads.

  • [ ] Upgrades f to a TBDD u' by proving that u → u'.

    tbdd tbdd_validate(tbdd u, bdd f);
    

    This may need an (optional?) last argument for a shared proof_writer.
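
To see how these operations could fit together, here is a small usage sketch for the unsatisfiable CNF (x1 ∨ x2) ∧ ¬x1 ∧ ¬x2. How the literal lists are wrapped into a generator<int> and the explicit tbdd_ostream argument are assumptions for illustration only.

    #include <cassert>
    #include <vector>

    void prove_unsat(tbdd_ostream &out)
    {
      // Clause ids follow the order of the clauses in the input CNF.
      const std::vector<int> cls1 {  1,  2 }; // x1 ∨ x2
      const std::vector<int> cls2 { -1 };     // ¬x1
      const std::vector<int> cls3 { -2 };     // ¬x2

      // 'as_generator' stands in for whatever helper turns a container into
      // a generator<int>.
      const tbdd c1 = tbdd_clause(as_generator(cls1), 1u, out);
      const tbdd c2 = tbdd_clause(as_generator(cls2), 2u, out);
      const tbdd c3 = tbdd_clause(as_generator(cls3), 3u, out);

      // Each conjunction also emits the proof that its operands imply it.
      const tbdd acc = tbdd_and(tbdd_and(c1, c2), c3);

      // The conjunction collapses to ⊥; hence, the emitted proof certifies
      // the unsatisfiability of the input CNF.
      assert(tbdd_isbot(acc));
    }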

TBDD Information

Next to tbdd_clause(xs, id) to create a clause, we may also want the following basic building blocks:

  • [ ] The tautological clause (reuse bdd_top)

    tbdd tbdd_top();
    

    This may need an (optional?) last argument for a shared proof_writer.

We should also bring over the following predicates and accessors from bdd (together with #536).

  • [ ] Whether a TBDD is a tautology (reuse bdd_istrue)

    bool tbdd_istop(tbdd u);
    
  • [ ] Whether a TBDD is a contradiction (reuse bdd_isfalse)

    bool tbdd_isbot(tbdd u);
    
  • [ ] Whether a TBDD is a single literal (reuse bdd_isvar)

    bool tbdd_isliteral(tbdd u);
    
  • [ ] Whether a TBDD is a positive literal (reuse bdd_isithvar)

    bool tbdd_ispositive(tbdd u);
    
  • [ ] Whether a TBDD is a negative literal (reuse bdd_isnithvar)

    bool tbdd_isnegative(tbdd u);
    

Finally, for debugging and statistics we also need

  • [ ] The number of BDD nodes in the TBDD (reuse bdd_nodecount)

    size_t tbdd_nodecount(tbdd u);
    
  • [ ] The number of literals (i.e. the number of levels) present in the TBDD (reuse bdd_varcount)

    size_t tbdd_varcount(tbdd u);
    
  • [ ] Print the underlying BDD to a file (reuse bdd_printdot but maybe the output should also include the proof variables? If so, then this may not be trivial)

    void tbdd_printdot(tbdd u, std::ostream &out);
    

Other TBDD Operations (not mandatory)

In [CADE 21], more operations are described. There, the proof system is used both to prove a formula to be true and to prove it to be false (since that is necessary for QBF problems). What we have noted above for tbdd_and(u, v) is only the truth-preserving version. See [CADE 21] for more details.

  • [ ] The TBDD w for u∨v with the associated (truth-preserving) proof that u → w and v → w.

    tbdd tbdd_or(tbdd u, tbdd v);
    

    Also add | and |= overloads.

  • [ ] The TBDD w for u∧l where l is a single literal. This creates the associated (truth preserving) proof of w∧l → u.

    tbdd tbdd_restrict(tbdd u, int l);
    
  • [ ] The TBDD w for ∃x : u.

    tbdd tbdd_exists(tbdd u, int x);
    

    In [CADE 21] this is just the combination of tbdd_restrict and tbdd_or (a naive sketch follows this list). Yet, one should be able to (1) combine it into a single quantification sweep for each literal and then maybe (2) incorporate multiple literals at once.

  • [ ] The TBDD w for ∀x : u.

    tbdd tbdd_forall(tbdd u, int x);
    

    Use tbdd_restrict and tbdd_and together. Similar to tbdd_exists, future work would be to improve performance with a dedicated quantification algorithm.
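
For reference, a naive version of tbdd_exists in terms of the two operations above could look as follows (assuming a negative literal in tbdd_restrict denotes the false co-factor):

    // Naive existential quantification, as described in [CADE 21]:
    //   (∃x : u)  =  u[x := 0]  ∨  u[x := 1]
    // Both restrictions and the disjunction emit their own proof steps.
    tbdd tbdd_exists(tbdd u, int x)
    {
      const tbdd u_lo = tbdd_restrict(u, -x); // co-factor with x := false
      const tbdd u_hi = tbdd_restrict(u,  x); // co-factor with x := true
      return tbdd_or(u_lo, u_hi);
    }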

tbdd_and (Manipulation Algorithms)

I'll focus on tbdd_and for now...

Since the proof steps are built bottom-up, most of the work happens in the Reduce Sweep; the preceding top-down sweep only needs to be extended such that it also stores the relevant association between nodes and proof variables.

Top-Down Sweeps

During the top-down sweep, forward a mapping between the nodes of the unreduced intermediate output and the original nodes (and/or their variables).

Algorithmic Extension

  1. Make the prod2_policy a stateful policy (similar to nested_sweep).
  2. Add hooks to prod2 (see the sketch after this list):
    • Start processing of pair (u,v). Here, the policy knows how far to go forward in its list of proof clauses.
    • Forwarding (u,v) across a level. Here, the policy knows to forward proof clauses too in its own priority queue. Yet, unlike the product construction, it only needs to forward information for the node and not all of its incoming arcs. Hence, the maximum size of this priority queue is the product of the inputs' widths.
    • Resolved (u,v) to w. Here, the policy can output all of the relevant proof clauses for later.
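
A rough sketch of what these hooks could look like on such a stateful policy; the class name, hook names, and the node type are illustrative and do not match Adiar's actual prod2_policy interface.

    // Hypothetical hooks on a stateful product-construction policy.
    class tbdd_and_policy
    {
    public:
      // (1) Start processing the pair (u,v): advance through the list of
      //     proof clauses up to this pair.
      void on_request(const node &u, const node &v);

      // (2) (u,v) is forwarded across a level: also forward its proof clauses
      //     in the policy's own priority queue. Only per-node information is
      //     needed, so its size is bounded by the product of the inputs'
      //     widths.
      void on_forward(const node &u, const node &v);

      // (3) (u,v) has been resolved to the (unreduced) node w: output the
      //     relevant proof clauses for use in the later Reduce sweep.
      void on_resolve(const node &u, const node &v, const node &w);
    };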

Reduce Sweep

Preliminaries

The end-goal is to describe each BDD node u = (x, u[0], u[1]) as the following four clauses that collectively encode u↔(x ? u[1] : u[0]).

| Notation | Clause     |
|----------|------------|
| HD(u)    | -x -u u[1] |
| LD(u)    | x -u u[0]  |
| HU(u)    | -x -u[1] u |
| LU(u)    | x -u[0] u  |

Here, L and H are abbreviations for the Low and High edges, respectively, while U is Up and D is Down (with respect to the direction of the implication). Notice that this fits perfectly with the information flow of the Reduce algorithm.

After having defined the BDD node in the proof, we also want to prove that it is the correct result. To this end, we need to combine the defining clauses of original BDD nodes u, v, and the resulting BDD node w.

| BDD Node | Clauses      |
|----------|--------------|
| u        | HD(u), LD(u) |
| v        | HD(v), LD(v) |
| w        | HU(w), LU(w) |

Next to these six clauses, we also need the following two clauses which claim that the children of w correspond to the conjunction of the children of u and v.

| Notation | Clause           |
|----------|------------------|
| ANDH     | -u[1] -v[1] w[1] |
| ANDL     | -u[0] -v[0] w[0] |

Knowing the clause ids of all eight of these, we can combine them into two RUP resolution chains that together prove -u -v w, i.e. u∧v→w.
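
As a concrete sketch, emitting the four defining clauses of a node could look as follows, reusing the hypothetical tbdd_ostream::emit from the API section; encoding extension variables directly as (signed, DIMACS-style) integers is an assumption for illustration.

    #include <cstddef>

    struct defining_clauses { std::size_t hd, ld, hu, lu; };

    // Emit HD/LD/HU/LU for the node w = (x, w0, w1), where x is the literal
    // of the level variable and w, w0, w1 are the extension variables of the
    // node and its two children.
    defining_clauses emit_node_definition(tbdd_ostream &out,
                                          int x, int w, int w0, int w1)
    {
      defining_clauses ids;
      ids.hd = out.emit({ -x, -w,  w1 }); // HD(w):  x ∧ w    → w[1]
      ids.ld = out.emit({  x, -w,  w0 }); // LD(w): ¬x ∧ w    → w[0]
      ids.hu = out.emit({ -x, -w1, w  }); // HU(w):  x ∧ w[1] → w
      ids.lu = out.emit({  x, -w0, w  }); // LU(w): ¬x ∧ w[0] → w
      return ids;
    }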

Algorithmic Extension

Again, we should be able to implement the above by adding hooks to a stateful reduce_policy (a sketch of these hooks follows the list below). Alternatively, if this becomes too messy/expensive, then we may want to just implement a separate tbdd_reduce. The four places where the algorithm needs to be extended are:

  • Node w is suppressed: Store this mapping, similar to the red1_mapping.

  • Outputting new node w as w': Push LD(w'), LU(w'), HD(w'), and HU(w') to the proof and push their ids to the clause id file. Furthermore, push these clause ids into a red2_mapping sorter that puts them back in order with w.

  • Outputting duplicate node w as w': Also place this mapping in the red2_mapping clause sorter.

With all of the defining clauses of w' available, we now only need to combine them with the input's clauses. Yet, to do so, they need to be back in the order of the unreduced product graph. Hence, we need the clause variants of red1_mapping and red2_mapping. Finally, knowing all of the relevant clauses and having identified the proof case in question, we can output the proofs during the last hook:

  • Forwarding w' to parents of w: Here, we need to provide the identifiers of the defining and correctness-proving clauses of w'. These then need to be forwarded to their parents.

For all of this, the priority queue needs to be extended with w[_] and the clauses proving their correctness (this is akin to #412).
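
In the same spirit as for the top-down sweep, the hooks on a stateful reduce policy could be sketched as follows; again, all names are illustrative only.

    // Hypothetical hooks mirroring the four extension points listed above.
    class tbdd_reduce_policy
    {
    public:
      // Node w is suppressed (reduction rule 1): store the mapping, similar
      // to the red1_mapping.
      void on_suppressed(const node &w);

      // A new node w is output as w': emit LD/LU/HD/HU(w'), push their ids to
      // the clause id file and into the red2_mapping clause sorter.
      void on_output(const node &w, const node &w_prime);

      // A duplicate node w is merged into w' (reduction rule 2): also place
      // this mapping in the red2_mapping clause sorter.
      void on_duplicate(const node &w, const node &w_prime);

      // w' is forwarded to the parents of w: attach the defining and
      // correctness-proving clause ids so that the parents can build their
      // own RUP chains.
      void on_forward(const node &w, const node &w_prime);
    };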

tbdd_validate (Validation Algorithm)

This follows the idea described above for tbdd_and:

  1. In a top-down sweep, create the clause mapping of the product graph between u and f (based on the implication operator).
  2. In the bottom-up sweep, instead of ANDL and ANDH create the relevant correctness proving clauses.

To make this faster, we would want the proof to relate to the BDD nodes of f rather than to new TBDD nodes of u' (since we can then reuse the node file for f). To do so, we should create a custom tbdd_reduce which does not output any new nodes and skips all of the computations for applying the reduction rules. Instead, it traverses the transposed version of f, declares new variables for u', and proves it correct with the clauses of u. A rough decomposition into the two sweeps is sketched after the list below.

  • Sort the product graph clauses of step 1 to match the transposition of f.
  • Forward relevant clauses bottom-up with a (levelized) priority queue.
  • For each node of f, combine the clauses from the priority queue to create new extension clauses for the nodes of f and use the product graph clauses to prove the implication.
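
Putting the two sweeps together, tbdd_validate could decompose roughly as sketched below; the helper names mirror the ones proposed under Architecture, while the signatures and the intermediate result type are assumptions.

    // Hypothetical two-phase structure of tbdd_validate.
    tbdd tbdd_validate(tbdd u, bdd f, tbdd_ostream &out)
    {
      // 1. Top-down: build the clause mapping of the product graph between
      //    u and f with respect to the implication operator.
      auto mapping = tbdd_validate__down(u, f, out);

      // 2. Bottom-up: traverse the transposition of f, declare extension
      //    variables for u', and prove each node of f correct by combining
      //    the forwarded clauses of u with the product graph clauses.
      return tbdd_validate__up(mapping, f, out);
    }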

Architecture

  • [ ] tbdd class

    • Conversion from tbdd to bdd should be trivial (since we may want to do some BDD operations, e.g. bdd_exists, and then get back to TBDDs with tbdd_validate). That is, it should merely be forgetting the associated variables. Hence, extension variables should not be included in the node or levelized_file<node> class but rather be placed in a separate file together with the associated proof steps. In fact, we would want tbdd to inherit from the bdd class (a rough sketch of this layout is given after this list).
    • To keep the number of active clauses low, all freed TBDD nodes need to be deleted in the proof. Since Adiar does not share subtrees, this merely amounts to making the tbdd destructor output a deletion of all its extension variables. Hence, each tbdd needs to carry around a reference to its proof writer (assuming there is not only one).
      • Since deletions should not happen before the final result is computed, we do not need a __tbdd class (at least in the public API). To save a little bit of disk space, we can also move the clause file(s) from the input into __tbdd.
      • The proof consists of (a) four RAT clauses that define each BDD node and (b) three RUP steps to prove correctness. We need the defining clauses of (a) later to prove the correctness of the next algorithm. The clauses of (b) need to be stored to delete them later.
  • [ ] tbdd_reduce

    Add the hooks and logic described above.

  • [ ] tbdd_and

    Add the hooks and logic described above.

  • [ ] tbdd_validate

    Implement tbdd_validate__down and tbdd_validate__up as described above.
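
A rough sketch of how the tbdd class could be laid out under the points above; the in-memory vectors stand in for Adiar's file-based storage, and every name is an assumption rather than Adiar's actual internals.

    #include <cstddef>
    #include <memory>
    #include <vector>

    // Hypothetical layout: a tbdd is a bdd plus its proof-related bookkeeping.
    class tbdd : public bdd
    {
    private:
      // Ids of the (RAT) clauses defining each node; needed later to prove
      // the correctness of the next algorithm.
      std::vector<std::size_t> _def_clauses;

      // Ids of the correctness-proving (RUP) clauses; only stored so that
      // they can be deleted again.
      std::vector<std::size_t> _rup_clauses;

      // The proof writer to emit deletions to (assuming there may be several).
      std::shared_ptr<tbdd_ostream> _proof;

    public:
      // Conversion to bdd is trivial: simply forget the members above.
      const bdd& as_bdd() const { return *this; }

      ~tbdd()
      {
        // Since Adiar does not share subtrees, freeing this TBDD means that
        // all of its extension variables and clauses can be deleted in the
        // proof.
        if (_proof)
        {
          _proof->remove(_def_clauses);
          _proof->remove(_rup_clauses);
        }
      }
    };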

References

  • [TACAS 21] Randal E. Bryant and Marijn J. H. Heule. “Generating extended resolution proofs with a BDD-based SAT solver”. In: Tools and Algorithms for the Construction and Analysis of Systems (2021)

  • [arXiv 21] Randal E. Bryant and Marijn J. H. Heule. “Generating extended resolution proofs with a BDD-based SAT solver (extended version)”. In: arXiv Preprint (2021)

  • [CADE 21] Randal E. Bryant and Marijn J. H. Heule. “Dual Proof Generation for Quantified Boolean Formulas with a BDD-based Solver”. In: International Conference on Automated Deduction (2021)
