Deep learning: why, how

siddhartha-gadgil opened this issue 6 years ago · 7 comments

Optimization vs deep learning

  • We can optimize generation when it is expressed in terms of known elements and equations.
  • We can extend this by smoothing goals, for example.
  • However, we can instead try to train deep learning models (a sketch follows this list):
    • directly from goals,
    • possibly relative to givens, into which we can plug discovered lemmas.
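
For concreteness, a minimal sketch of what training directly from goals might look like. ProvingGround itself is in Scala; this PyTorch sketch is only to fix ideas, and all names and dimensions in it are hypothetical: a network mapping an embedding of the goal, together with an embedding of the givens, to weights over generator nodes.

```python
import torch
import torch.nn as nn

class GoalToGeneratorWeights(nn.Module):
    """Map a goal embedding (optionally with a givens embedding)
    to a probability distribution over generator nodes."""

    def __init__(self, embed_dim: int, num_generators: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * embed_dim, 128),
            nn.ReLU(),
            nn.Linear(128, num_generators),
        )

    def forward(self, goal: torch.Tensor, givens: torch.Tensor) -> torch.Tensor:
        # The softmax output gives the weights that drive term generation.
        return torch.softmax(self.net(torch.cat([goal, givens], dim=-1)), dim=-1)

# Hypothetical usage: 64-dimensional embeddings, 10 generator nodes.
model = GoalToGeneratorWeights(embed_dim=64, num_generators=10)
weights = model(torch.randn(1, 64), torch.randn(1, 64))
```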

siddhartha-gadgil commented on Jul 16 '19

One may also want deep learning simply for speed, as (smoothing and) optimization can take a long time.

siddhartha-gadgil commented on Jul 16 '19

Why representation learning

  • A form of representation learning is just passing through a low-dimensional layer, perhaps straightaway.
  • However, the point of explicit representation learning is to use synthetic problems, for which "correct" answers are easier to set up.
  • In our situation, since we want embeddings of distributions, our representation may not be a single embedding layer (see the sketch below).
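
A minimal sketch of one way to embed a distribution rather than a single term, assuming finite support: a probability-weighted, permutation-invariant pooling of per-term embeddings (DeepSets-style). The class and all dimensions here are hypothetical.

```python
import torch
import torch.nn as nn

class DistributionEmbedding(nn.Module):
    """Embed a finite distribution over terms as the probability-weighted
    sum of per-term embeddings passed through a shared encoder, so the
    result does not depend on the order of the support."""

    def __init__(self, num_terms: int, embed_dim: int):
        super().__init__()
        self.term_embed = nn.Embedding(num_terms, embed_dim)
        self.encoder = nn.Sequential(nn.Linear(embed_dim, embed_dim), nn.ReLU())

    def forward(self, term_ids: torch.Tensor, probs: torch.Tensor) -> torch.Tensor:
        # term_ids: (n,) indices of the support; probs: (n,) their weights.
        encoded = self.encoder(self.term_embed(term_ids))
        return (probs.unsqueeze(-1) * encoded).sum(dim=0)

emb = DistributionEmbedding(num_terms=1000, embed_dim=32)
vec = emb(torch.tensor([3, 17, 42]), torch.tensor([0.5, 0.3, 0.2]))
```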

siddhartha-gadgil commented on May 07 '20

Transformer architecture, position data

Many useful ideas are in the BERT work and what builds on it. Adapting these to our situation:

  • We can use position vectors rather than linearizing our generation trees (one possible encoding is sketched after this list).
  • We are trying to predict the terms on the RHS of equations, which may be either nodes or built up by substitution in nodes.
  • In the longer run we learn to use attention, based on both positions and actual terms.
  • Using terms: a crucial difference in our case is that our model should allow the use of a term from the statement as-is, without necessarily recognizing it.
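
A possible form for such position vectors, under the assumption that a node is addressed by its path from the root; everything here (names, bounds on depth and arity) is illustrative, replacing the linear position indices of standard BERT inputs.

```python
import torch
import torch.nn as nn

class TreePositionalEncoding(nn.Module):
    """Position vector for a node of a generation tree: the node is
    addressed by its path from the root (child index at each depth),
    and its position vector is the sum of learned (depth, child)
    slot embeddings."""

    def __init__(self, embed_dim: int, max_depth: int = 16, max_children: int = 8):
        super().__init__()
        self.slot = nn.Embedding(max_depth * max_children, embed_dim)
        self.max_children = max_children

    def forward(self, path: list[int]) -> torch.Tensor:
        # One slot per (depth, child-index) pair along the path.
        idx = torch.tensor([d * self.max_children + c for d, c in enumerate(path)])
        return self.slot(idx).sum(dim=0)

pos = TreePositionalEncoding(embed_dim=64)
vec = pos([0, 2, 1])  # the node reached via children 0, then 2, then 1
```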

Base level: Premise selection

  • For us a premise is a term on the right-hand side of an equation, perhaps obtained after substitution (resolving).
  • Since we resolve to different depths, we have premises at different depths.
  • Similarly, the ingredients of the statement are terms and types at different depths.
  • Crucially, we can use ingredients of the statement in the proof as they are, so our model includes probabilities for this.
  • Further, we predict not just individual premises but groups that go together (a sketch follows this list).
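
A sketch of what such a premise-selection head could look like: an independent sigmoid probability per candidate (so groups of premises that go together can all be selected), with the statement's own ingredients appended among the candidates so that they can be used as-is. The bilinear scoring form is an assumption.

```python
import torch
import torch.nn as nn

class PremiseSelector(nn.Module):
    """Multi-label premise selection: one independent probability per
    candidate premise, scored against the statement embedding."""

    def __init__(self, embed_dim: int):
        super().__init__()
        self.premise_score = nn.Bilinear(embed_dim, embed_dim, 1)

    def forward(self, statement: torch.Tensor, candidates: torch.Tensor) -> torch.Tensor:
        # statement: (d,) embedding; candidates: (n, d) premise embeddings,
        # including statement ingredients appended as copyable candidates.
        s = statement.expand(candidates.size(0), -1)
        return torch.sigmoid(self.premise_score(s, candidates)).squeeze(-1)

sel = PremiseSelector(embed_dim=32)
probs = sel(torch.randn(32), torch.randn(5, 32))  # independent probabilities
```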

Refined model

  • The input data includes positions for terms, and these are used by the model.
  • Use multi-head attention.
  • The output should have not just premises but also predicted positions (see the sketch below).
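
Putting the pieces together, a hedged sketch of the refined model: multi-head attention over summed term-plus-position vectors, with separate output heads for premise scores and predicted positions. Dimensions, head counts, and the number of position classes are placeholders.

```python
import torch
import torch.nn as nn

class RefinedModel(nn.Module):
    """Multi-head attention over (term + position) vectors, with one head
    scoring premises and another predicting the position each should occupy."""

    def __init__(self, embed_dim: int = 64, num_heads: int = 4, num_positions: int = 32):
        super().__init__()
        self.attn = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)
        self.premise_head = nn.Linear(embed_dim, 1)
        self.position_head = nn.Linear(embed_dim, num_positions)

    def forward(self, x: torch.Tensor):
        # x: (batch, seq, embed_dim), already the sum of term and position vectors.
        h, _ = self.attn(x, x, x)
        return self.premise_head(h).squeeze(-1), self.position_head(h)

m = RefinedModel()
premise_logits, position_logits = m(torch.randn(2, 10, 64))
```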

siddhartha-gadgil commented on Sep 09 '20

  • It is better not to view the output as a translation, even one with predicted positions.
  • Instead we view it as similar to a game, with an outcome (score) and a value; moves are the policy.
  • A basic score is obtained by comparing the weights of terms and types; if we have formal equations, everything has a positive weight (one reading is sketched below).
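
One possible reading of this score, as a sketch: a proved type scores well when its weight among types is high but the weight of its proof among terms is low, so the result is non-trivial. The argument names and the exact formula are assumptions.

```python
import math

def basic_score(type_weights: dict[str, float], term_weights: dict[str, float],
                proofs: dict[str, str]) -> float:
    """Compare the weight of each proved type with the weight of its proof
    term; heavy statements with light proofs contribute most."""
    score = 0.0
    for typ, proof in proofs.items():
        p_type = type_weights.get(typ, 0.0)
        p_proof = term_weights.get(proof, 0.0)
        if p_type > 0 and p_proof > 0:  # positive weights, as with formal equations
            score += p_type * (math.log(p_type) - math.log(p_proof))
    return score
```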

siddhartha-gadgil commented on Sep 14 '20

  • Unlike a game, we can make multiple moves - indeed the Bot framework is exactly for this.
  • To take advantage of this, one should have multiple rewards (a sketch follows this list), for example rewarding both
    • gathering: proving a lot of results.
    • hunting: proving hard results.
  • This may, for example, lead to lemmas chosen based on non-triviality rather than just on simplicity of statement.
  • We have a further choice, for example in our base-versus-tangent split as well as in base-mixin: some terms are used to generate recursively, while other terms are used only in combination with those generated from the base.
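
A toy version of such a two-part reward, with "gathering" as total proved weight and "hunting" as a bonus for low-weight (hence presumably hard) statements; the split and the coefficient are assumptions.

```python
import math

def combined_reward(proved: set[str], type_weights: dict[str, float],
                    hunt_coeff: float = 1.0) -> float:
    """'gathering' credits the total weight of proved types (many results);
    'hunting' adds -log(weight) per proved type, so rare statements earn
    a larger bonus (hard results)."""
    gathering = sum(type_weights.get(t, 0.0) for t in proved)
    hunting = sum(-math.log(type_weights[t]) for t in proved
                  if type_weights.get(t, 0.0) > 0)
    return gathering + hunt_coeff * hunting
```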

siddhartha-gadgil commented on Sep 15 '20

  • Outcome: we can measure outcomes by efficiency of proving - the (weighted) fraction of types that are proved/disproved for a given number of terms (with various cutoffs).
  • Moves may not lead to an immediate improvement in the outcome, but we can follow a few moves ahead.
  • Eventually we can have policy and value networks; to start with, we carry out all moves but with resources allocated based on weights, and look directly at outcomes.
  • Layered outcomes: we can give extra credit for types not proved/disproved with an earlier collection of terms (see the sketch below).
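
This outcome measure, together with the layered extra credit, might be computed as follows; the weights, names, and bonus factor are placeholders.

```python
def outcome(type_weights: dict[str, float], proved: set[str],
            previously_proved: set[str], bonus: float = 0.5) -> float:
    """Outcome at a fixed term budget: the weighted fraction of types that
    are proved/disproved, plus extra credit ('layered outcome') for types
    not settled by an earlier collection of terms."""
    total = sum(type_weights.values())
    if total == 0:
        return 0.0
    base = sum(w for t, w in type_weights.items() if t in proved)
    extra = sum(w for t, w in type_weights.items()
                if t in proved and t not in previously_proved)
    return (base + bonus * extra) / total
```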

siddhartha-gadgil commented on Nov 05 '20

Predicting outcomes, not moves

  • Rather than trying to learn good moves directly, we should try to predict the outcomes of our moves and then search for the best moves (sketched below).
  • This is similar to games, where we learn the value of a position, i.e., the expected outcome.
  • We should also learn to predict other auxiliary quantities.
  • In games we do predict moves to some extent, which gives our policy; this is used in non-monotonic searches. We may need some similar learning here too.
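
A sketch of this outcome-prediction setup: a value network scoring states, with the move chosen by a one-step greedy search; deeper lookahead would expand the search without changing the network. All names, dimensions, and the representation of moves are hypothetical.

```python
import torch
import torch.nn as nn

class ValueNetwork(nn.Module):
    """Predict the expected outcome (value) from an embedding of the
    current state; moves are chosen by search over predicted outcomes
    rather than predicted directly."""

    def __init__(self, state_dim: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.net(state).squeeze(-1)

def best_move(value: ValueNetwork, state: torch.Tensor, moves) -> int:
    """Greedy one-step search: `moves` is a list of (hypothetical) callables
    mapping a state embedding to the embedding after the move."""
    with torch.no_grad():
        return max(range(len(moves)),
                   key=lambda i: value(moves[i](state)).item())
```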

siddhartha-gadgil commented on Nov 18 '20