Examples icon indicating copy to clipboard operation
Examples copied to clipboard

Add proof checking time to manifest details

Open ahelwer opened this issue 1 year ago • 5 comments

Currently we manually exclude specific modules from proof checking in the CI because they run too long. Given that proof checking time can have as high of variability as model checking, it makes sense to include data about proof check time in the manifest then use that to determine whether to check the proof in the CI. Possibly we could have a CI workflow scheduled to run weekly which checks these long-running proofs along with the macOS workflow.

Failing proofs can continue to be manually excluded because ideally this repo would not have or accept any specs with failing proofs.

ahelwer avatar Mar 02 '24 13:03 ahelwer

I'm catching up on today's community call, where @muenchnerkindl mentioned that the proof-checking time for ByzPaxos would be too long for our GitHub CI. One potential solution could be to use Github Action caching for TLAPS fingerprints. This would allow TLAPS to only prove new or updated obligations, rather than rechecking all obligations each time. The cache ID could be derived from TLAPS's version that the CI is running with.

@IamfromSpace @ahelwer I know very little about nix. Could you expand how nix would solve the problem of proof time?

lemmy avatar Mar 11 '25 21:03 lemmy

Nix could be another possible caching mechanism. If it finds a prebuilt result in a binary cache, it will skip the build entirely and pull from cache.

The only other thing that struck me later is that if we have fully deterministic dependencies (which Nix could offer) and algorithms, it might hypothetically possible to know how many steps a proof takes to complete, regardless of time. But I’m not sure we have the ability to utilize such an approach.

IamfromSpace avatar Mar 11 '25 23:03 IamfromSpace

I'm not sure what you mean by "steps" here. TLA+ proofs explicitly list all user-level steps, but each of those "steps" relies on a check by the back-end provers, and we have no way of knowing how many steps they require internally. Moreover, many automatic provers use some form of randomization, so the number of internal steps may differ between runs.

muenchnerkindl avatar Mar 12 '25 07:03 muenchnerkindl

Back to the original question, note that TLAPM has the option --stretch that allows one to multiply all timeouts by some factor. That is a crude mechanism to account for slow hardware, not sure we want to use it for the CI.

muenchnerkindl avatar Mar 12 '25 07:03 muenchnerkindl

we have no way of knowing how many steps they require internally. Moreover, many automatic provers use some form of randomization, so the number of internal steps may differ between runs.

Right, this is the kind of “steps,” I was getting at, sorry, should have been more clear. Logical timeouts on deterministic executions rather than physical timeouts. And agreed, not sure that it’s anything we could ever leverage in practice.

IamfromSpace avatar Mar 12 '25 17:03 IamfromSpace