ProvingGround
Deep learning: why, how
Optimization vs deep learning
- We can optimize generation when it is expressed in terms of known elements and equations.
- We can extend this by smoothing goals, for example.
- However, we can instead try to train deep learning models:
  - directly from goals
  - possibly relative to givens, for which we can plug in discovered lemmas.
One may also want deep learning just for speed, as (smoothing and) optimization can take a long time.
Why representation learning
- A form of representation learning is just passing through a low-dimensional layer, perhaps straightaway (see the sketch after this list).
- However, the point of explicit representation learning is that we can use synthetic problems, which are easier to set up with "correct" answers.
- In our situation, since we want embeddings of distributions, our representations may not be a single embedding layer.
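
A minimal sketch of the first option, a low-dimensional bottleneck layer: the names and the plain linear layers below are hypothetical, not the ProvingGround implementation; training such an encoder/decoder pair on synthetic problems with known answers is one way to set up explicit representation learning.

```scala
import scala.util.Random

// Hypothetical sketch: a linear bottleneck, not the ProvingGround implementation.
object BottleneckSketch {
  type Vec = Vector[Double]

  // Dense linear layer given a weight matrix (rows = output dimension).
  def dense(weights: Vector[Vec], input: Vec): Vec =
    weights.map(row => row.zip(input).map { case (w, x) => w * x }.sum)

  def randomMatrix(rows: Int, cols: Int, rnd: Random): Vector[Vec] =
    Vector.fill(rows)(Vector.fill(cols)(rnd.nextGaussian() * 0.1))

  def main(args: Array[String]): Unit = {
    val rnd     = new Random(0)
    val highDim = 128 // dimension of raw features
    val lowDim  = 8   // the low-dimensional representation layer

    val encode = randomMatrix(lowDim, highDim, rnd)
    val decode = randomMatrix(highDim, lowDim, rnd)

    val features       = Vector.fill(highDim)(rnd.nextGaussian())
    val representation = dense(encode, features)       // low-dimensional embedding
    val reconstruction = dense(decode, representation) // used to define a training loss

    println(s"representation dimension: ${representation.size}")
  }
}
```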
Transformer architecture, position data
Many useful ideas are in the BERT work and the work building on it. Adapting these to our situation:
- We can use position vectors rather than linearizing our generation trees (see the sketch after this list).
- We are trying to predict the terms on the RHS of equations, which may be either nodes or built up by substitution in nodes.
- In the longer run we learn to use attention, based on both positions and actual terms.
- Using terms: a crucial difference in our case is that our model should allow using a term from the statement as-is, without necessarily recognizing it.
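
As a sketch of using position vectors instead of linearizing generation trees, the code below (hypothetical names, not the ProvingGround API) records each term together with its root-to-node path; these paths can then be embedded and fed to attention alongside the terms themselves.

```scala
// Hypothetical sketch: positions as root-to-node paths in a generation tree.
object PositionSketch {
  sealed trait GenTree
  case class Leaf(term: String)                             extends GenTree
  case class Node(term: String, children: Vector[GenTree])  extends GenTree

  // Collect every term together with its position vector (indices of children
  // followed from the root), so attention can use positions without linearizing.
  def withPositions(tree: GenTree, path: Vector[Int] = Vector()): Vector[(String, Vector[Int])] =
    tree match {
      case Leaf(t) => Vector((t, path))
      case Node(t, children) =>
        (t, path) +: children.zipWithIndex.flatMap {
          case (child, i) => withPositions(child, path :+ i)
        }
    }

  def main(args: Array[String]): Unit = {
    val tree = Node("f(a, g(b))", Vector(Node("g(b)", Vector(Leaf("b"))), Leaf("a")))
    withPositions(tree).foreach { case (term, pos) => println(s"$term at position $pos") }
  }
}
```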
Base level: Premise selection
- For us, a premise is a term on the right-hand side of an equation, perhaps obtained after substitution (resolving).
- Since we resolve to different depths, we have premises at different depths.
- Similarly, the ingredients of the statement are terms and types at different depths.
- Crucially, we can use ingredients of the statement in the proof as they are, so our model includes probabilities for this.
- Further, we predict not just individual premises but groups that go together (a sketch of the output shape follows this list).
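
A sketch of the shape of the premise-selection output under the assumptions above (all names hypothetical, not the ProvingGround model): candidates may be resolved to different depths or taken as-is from the statement, and raw scores from a model are normalized into probabilities; grouped premises would be scored jointly rather than independently.

```scala
// Hypothetical sketch of premise-selection output, not the ProvingGround model.
object PremiseSelectionSketch {
  // A candidate premise: either resolved from equations to some depth,
  // or an ingredient of the statement used as-is.
  case class Candidate(term: String, depth: Int, fromStatement: Boolean)

  // Softmax turns raw scores into a probability distribution over candidates.
  def softmax(scores: Vector[Double]): Vector[Double] = {
    val exps  = scores.map(s => math.exp(s - scores.max))
    val total = exps.sum
    exps.map(_ / total)
  }

  def main(args: Array[String]): Unit = {
    val candidates = Vector(
      Candidate("lemma1", depth = 1, fromStatement = false),
      Candidate("hypothesis", depth = 0, fromStatement = true), // used as-is
      Candidate("lemma2(x)", depth = 2, fromStatement = false)
    )
    // Scores would come from the learned model; here they are placeholders.
    val scores = Vector(0.5, 1.2, -0.3)
    candidates.zip(softmax(scores)).foreach {
      case (c, p) => println(f"${c.term}%-12s p=$p%.3f (fromStatement=${c.fromStatement})")
    }
  }
}
```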
Refined model
- The data includes positions for terms, and these are used as inputs.
- We use multi-head attention.
- The output should include not just premises but also predicted positions.
- It is better not to view the output as a translation (even one with predicted positions).
- Instead, we view it as similar to a game, with an outcome (score) and a value; the moves are given by a policy.
- A basic score is obtained by comparing the weights of terms and types; if we have formal equations, everything has a positive weight.
- Unlike a game, we can make multiple moves - indeed the Bot framework is exactly for this.
- To take advantage of this, one should have multiple rewards, for example rewarding both
  - gathering: proving a lot of results.
  - hunting: proving hard results.
- This may, for example, lead to lemmas being chosen based on non-triviality rather than just on having simple statements.
- We have a further choice, for example in our base versus tangent split as well as in base-mixin: some terms are used to generate recursively, while other terms are used only in combination with those generated from the base.
- Outcome: We can measure outcomes by efficiency of proving - the (weighted) fraction of types that are proved/disproved for a given number of terms (with various cutoffs); see the sketch after this list.
- Moves may not lead to an immediate improvement in outcome, but we can look a few moves ahead.
- Eventually we can have policy and value networks; to start with we carry out all moves but with resources allocated based on weights, and look directly at outcomes.
- Layered outcomes: We can give extra credit for types not proved/disproved with an earlier collection of terms.
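
A minimal sketch of the outcome measure described above (types, weights and the novelty bonus are hypothetical placeholders, not the actual scoring code): the weighted fraction of target types that are proved/disproved within a budget of terms, with extra credit for types not settled by an earlier collection of terms.

```scala
// Hypothetical sketch of an outcome measure, not the ProvingGround implementation.
object OutcomeSketch {
  case class Target(tp: String, weight: Double)

  // Weighted fraction of targets proved/disproved, with a bonus for targets
  // that were not already settled by an earlier collection of terms.
  def outcome(
      targets: Vector[Target],
      settledNow: Set[String],
      settledEarlier: Set[String],
      noveltyBonus: Double = 0.5
  ): Double = {
    val totalWeight = targets.map(_.weight).sum
    val credit = targets.collect {
      case Target(tp, w) if settledNow.contains(tp) =>
        if (settledEarlier.contains(tp)) w else w * (1.0 + noveltyBonus)
    }.sum
    credit / totalWeight
  }

  def main(args: Array[String]): Unit = {
    val targets = Vector(Target("A", 1.0), Target("B", 2.0), Target("C", 1.0))
    val score   = outcome(targets, settledNow = Set("A", "B"), settledEarlier = Set("A"))
    println(f"outcome: $score%.3f") // B gets extra credit since it is newly settled
  }
}
```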
Predicting outcomes, not moves
- Rather than trying to learn good moves directly, we should try to predict outcomes of our moves, then search for the best moves (see the sketch after this list).
- This is similar to games, where we learn the value of a position, which is the expected outcome.
- We should also learn to predict other auxiliary quantities.
- In games, we do predict moves to some extent, which gives our policy; this is used in non-monotonic searches. We may need similar learning here too.
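
A minimal sketch (hypothetical names, with a placeholder in place of a learned value network) of searching over predicted outcomes: each available move is scored by the predicted value of the resulting state, and the best one is chosen, rather than predicting the move directly.

```scala
// Hypothetical sketch: choosing moves by predicted outcome (value), not by a move model.
object ValueSearchSketch {
  case class State(terms: Set[String])
  case class Move(name: String, step: State => State)

  // In practice this would be a learned value network predicting the outcome measure.
  def predictedValue(state: State): Double = state.terms.size.toDouble // placeholder

  // Score each move by the value of the state it leads to and pick the best.
  def bestMove(state: State, moves: Vector[Move]): Move =
    moves.maxBy(m => predictedValue(m.step(state)))

  def main(args: Array[String]): Unit = {
    val state = State(Set("a"))
    val moves = Vector(
      Move("add-b", s => s.copy(terms = s.terms + "b")),
      Move("noop", s => s)
    )
    println(s"chosen move: ${bestMove(state, moves).name}")
  }
}
```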