leo icon indicating copy to clipboard operation
leo copied to clipboard

[Proposal] [Leo RFC] Avoid and detect errors via symbolic reasoning

Open acoglio opened this issue 4 years ago • 0 comments

💥 Proposal - Leo RFC

Summary

Extend the Leo compiler with symbolic reasoning capabilities (theorem proving, model checking, SMT/SAT solving, sound static analysis, etc.) to avoid and detect errors involving non-constant values.

This is a broad issue that requires further research, but it can be fairly easily experimented with by adding optional passes to the Leo compiler, without having to make any change to the language.

Motivation

Certain Leo operations are considered erroneous when applied to certain values. Examples are division by zero, out-of-bound array indexing, overflowing arithmetic (assuming that the consensus on this is confirmed), etc.

When these operations are applied to values that are known at compile time (via constant propagation and folding), it is easy to detect these errors statically, or to ensure that they do not happen.

However, there are cases in which the values depend on external inputs. In these cases, a traditional computational model could defensively check for the errors at run time; there may be some R1CS equivalent of that that the Leo compiler produces or could produce. Nonetheless, a Leo program that may exhibit these behaviors at run time should be arguably considered erroneous. The problem can be normally attributed to inadequate input validation (cf. https://cwe.mitre.org/data/definitions/20.html). The solution is to properly validate input data prior to, or in the course of, operating on it. (All of this is, of course, not specific to Leo, but applicable to any programming language.)

In the interest of assurance, we would like to ensure, statically (i.e. at compile time), that those error situations never occur, for every possible inputs to the program. The approach to achieve that is to perform symbolic reasoning, which consists in techniques like theorem proving, model checking, SMT/SAT solving, sound static analysis, etc. This is no silver bullet, because this brings in all the known difficulties with formal program verification; however, due to the relatively small size of Leo programs, and the restrictions such as bounded loops, this may be a fruitful target for symbolic reasoning.

Design

The design is fairly open. In a way, this RFC could be a bit of a "meta" RFC for more specific RFCs for particular symbolic reasoning techniques and tools.

In general, symbolic reasoning (of Leo or other programs) may return three answers, for each possible error at each point in the code:

  1. The error definitely never occurs. (Proof of correctness.)
  2. The error may definitely occur, for one or more specific inputs. (Counterexamples found.)
  3. The error may or may not occur. (The analysis was inconclusive.)

While in principle the problem is decidable due to Leo's finite field, bounded loops, etc., it is still "practically undecidable" due to the large size of possible computation states. So we expect the 3rd answer above to be possible in general.

But given the relatively small size of Leo programs, the requirement of bounded loops, etc., there is hope to obtain the 1st or 2nd answer often. This will at least rule out (after the user has fixed the code to avoid the 2nd answer) many errors from many points in the code, which will not be denied by the 3rd answer being given for other errors at other points in the code. Without any symbolic reasoning capability, we implicitly always get the 3rd answer for every error at every point in the code.

Leo users should not be program verification experts. Symbolic reasoning often succeeds automatically. Nonetheless, we may allow users to state certain precondition, postconditions, invariants, and assertions that help the symbolic reasoning. It is hoped that users who strive to write correct code will be willing and able to write at least some of these expected facts about their code.

The symbolic reasoning capabilities envisioned here are complementary to the formal proofs that we are generating for the Leo compilation process. This RFC concerns proofs about the behavior of the Leo code, while the compilation proofs ensure that such behavior is faithfully achieved in the R1CS circuit.

The symbolic reasoning capabilities envisioned by this RFC would provide Leo with much higher assurance than typical programming languages. It would be a rarely found capability.

Drawbacks

None really, in the sense that it is an extra capability that may be optionally used. However, if users do use it, they'll be facing, in the worst case, the formal program verification "wall".

Effect on Ecosystem

None?

Alternatives

No alternatives as such. We either do/try this or we do not.

acoglio avatar Feb 08 '21 03:02 acoglio