Dedukti icon indicating copy to clipboard operation
Dedukti copied to clipboard

Heavy generated object files

Open EmilieGrienenberger opened this issue 5 years ago • 20 comments

Generated object files become exceedingly heavy and long to load when working on a large library with many interdependencies and proof sharing. Ex : in the translation of the HOL light standard library, the .dko files become even heavier than the .dk files.

A solution would be to add a way of identifying objects which should not be added to the generated object file (ex : all shared proof steps which are not "main" theorems, and so are not reused later in the library) : maybe a keyword "local" ?

EmilieGrienenberger avatar Jun 24 '19 09:06 EmilieGrienenberger

What's the point of keeping a proof of a 'theorem' in memory once it is checked? The dko could simply store the type but not the proof.

Le 24/06/2019 à 11:31, EmilieGrienenberger a écrit :

Generated object files become exceedingly heavy and long to load when working on a large library with many interdependencies and proof sharing. Ex : in the translation of the HOL light standard library, the .dko files become even heavier than the .dk files.

A solution would be to add a way of identifying objects which should not be added to the generated object file (ex : all shared proof steps which are not "main" theorems, and so are not reused later in the library) : maybe a keyword "local" ?

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/Deducteam/Dedukti/issues/177?email_source=notifications&email_token=ACSZRGPBILVLQOVKAMSF4HDP4CH63A5CNFSM4H24YEW2YY3PNVWWK3TUL52HS4DFUVEXG43VMWVGG33NNVSW45C7NFSM4G3HZSGA, or mute the thread https://github.com/notifications/unsubscribe-auth/ACSZRGNSX35QCAVA77ULAELP4CH63ANCNFSM4H24YEWQ.

fblanqui avatar Jun 24 '19 10:06 fblanqui

The issue is that proofs are shared between main theorems, so some intermediate steps of the proofs are defined as theorems by themselves. In a file containing the proofs of 10 main theorems, there may be 10000 of these intermediate theorems, which are never reused. They accumulate in the object files.

EmilieGrienenberger avatar Jun 24 '19 10:06 EmilieGrienenberger

What do you mean by "shared"? The 'theorem' keyword has been introduced in Dedukti to denote an opaque definition: its actual content is irrelevant and can be erased; only the type is useful.

fblanqui avatar Jun 24 '19 11:06 fblanqui

@EmilieGrienenberger if your object is "opaque", meaning that it is defined with the thm keyword, then its proof should not appear in the dko: see here.

In general, I am not surprised that a dko is larger than a dk, since the marshalization of the file requires also storing OCaml informations. In your case, with what you said, it is strange...

Can you check if the same behavior occurs with lambdapi?

francoisthire avatar Jun 24 '19 12:06 francoisthire

I'm afraid that, currently, LP stores proofs... @rlepigre ?

fblanqui avatar Jun 24 '19 13:06 fblanqui

@fblanqui: if the object is defined with theorem, then the proof term is not recorded after the qed (see here). Obviously, partial proofs are stored until the proof term is complete however.

rlepigre avatar Jun 24 '19 13:06 rlepigre

Good!

fblanqui avatar Jun 24 '19 13:06 fblanqui

Thank you for your quick answers. However, I already use the thm keyword. I will try to make my issue clearer with an example. The generated dk files may look like this:

thm thm_1 := ... thm thm_2 := ... ... thm thm_500 := ... thm addition1 := ... thm addition2 := ...

Here, the 500 first theorems are just used to prove the main theorems addition1 and addition2, and are neither important results from the library nor useful to the checking of other library files. However, both main theorems may use them many times each, and they may be used many times in each other, that is what I meant by proof sharing. As a consequence, defining them as separate results saves a lot of time and space, but they make the object files really heavy. As a consequence, I was wondering if there was a way to indicate in the dk files not to take them into account to generate the object files.

EmilieGrienenberger avatar Jun 24 '19 14:06 EmilieGrienenberger

If my understanding is correct, the current behavior is to export only the statement of thm_1, thm_2, ... But this behavior is not the one you want. Instead, since thm_1, thm_2, are internal to the current file, you don't need to export them at all since addition_1 is a thm and its proof is not exported.

In that case, this is a feature request that we should discuss. In particular, I would like to see the size of both the dk and dko files.

francoisthire avatar Jun 24 '19 15:06 francoisthire

Exactly, I would only need addition1 and addition2 to be recorded in the object file.

I have more than 300 files of generated proofs (total of a bit less than 2G), dko files are I would say 10% heavier than original dk files.

The last files in the library are dependent on a large number of previous files, which makes the loading time for dependencies extremely long. In the more extreme cases, when disabling the dependencies (declare axioms instead of directly using already proven theorems), a file checks in 10 seconds, and when enabling dependencies it checks in more than 20min (on my current computer).

I could for the moment do the checking in reasonable time on a more powerful computer, but this issue is only going to get worse when trying to translate and check additional parts of this library, and will maybe be encountered when translating other large libraries (with proof sharing at least).

EmilieGrienenberger avatar Jun 24 '19 16:06 EmilieGrienenberger

Couldn't you use local let-definitions to embed the intermediate result in the proof of the final results? This would probably not change much the size of the initial file if the intermediate results are not used across several theorems. And this would otherwise solve your problem.

rlepigre avatar Jun 24 '19 16:06 rlepigre

@francoisthire, there's something I don't understand. If, indeed, in Dedukti, proofs are not recorded in dko files, why are dko files as big as dk files? @EmilieGrienenberger, could you please try to use LP to see if there is a difference (LP understands dk files, so you have nothing to change, just use lambdapi instead of dkcheck)?

fblanqui avatar Jun 25 '19 07:06 fblanqui

I did some tests.

  • The thm keyword works : I tried using the def keyword to define my theorems, the dko files are as expected heavier.

  • It explodes faster using lambdapi instead of dkcheck. I don't know how .lpo files are handled, so tell me if I am wrong: the need to also have the file A.dk itself (and not only the object file A.lpo) to check another file B.dk (which requires A) would maybe indicate that some further checks are done on A.dk ? This leading to more checking time ?

EmilieGrienenberger avatar Jun 25 '19 13:06 EmilieGrienenberger

I am not surprised at all that Lambdapi yields bigger object files (this has always been the case). This is due to the fact that they contain closures (Bindlib binders).

The presence of a .dk file is necessary to check that the corresponding .lpo file is up to date (i.e., its creation time is after that of the source file). However, the source file is not read if an up to date object file is present.

By the way @EmilieGrienenberger, have you tried the let-binding trick?

rlepigre avatar Jun 25 '19 14:06 rlepigre

@rlepigre I have no guarantee that an intermediate theorem is not used in two different main theorems. A way to get around this would be to directly generate one file per theorem though. The risk is that some proof sharing will be lost, making the library heavier.. I can try it on some of the bigger proofs.

EmilieGrienenberger avatar Jun 25 '19 15:06 EmilieGrienenberger

Hello Emilie. Could you please give us more context and data:

  • Where are the files coming from?
  • What is the size in Mo (du -h) of a tar.gz of your dk files?
  • Could you please put the files somewhere and give us an URL so that we could download them? Various libraries are already available in https://deducteam.github.io/data/libraries/, except zenon_modulo which is too big. For instance, iProverModulo is 39 Mo once compressed. The limit on github is 50 Mo I believe. zenon_modulo, 336 Mo, is available on http://deducteam.gforge.inria.fr/lib/.

fblanqui avatar Jun 26 '19 09:06 fblanqui

In particular, how many thm do you have in average in each file?

fblanqui avatar Jun 26 '19 11:06 fblanqui

The feature that is requested, that is, being able to indicate that some intermediate lemmas are local to the module and should not appear in the object file, would probably also be useful for files produced from SAT traces. Similarly to what Emilie does, I define (with thm) a new theorem each time an intermediate clause is given. As files tend to be quite big (up to 55GB without compression), it would be interesting to have smaller object files, in particular because the goal is to integrate a proof obtained by a SAT solver in a bigger development.

That being said, I did not tried to generate an object file for the moment, and for technical reasons I do not have access to the files for the moment. So I cannot tell what size the object files have in my case.

By the way, is this not related to the private definitions (#141) ? Should not a module declare an interface of what can be used from the outside, and thus should be in the object file ?

NB: I did not try the local let-definitions suggested by Rodolphe yet. One reason is that they did not exist when I started lrat2dk, a second one is that I was afraid that the parser would struggle with one big declaration of 55GB.

gburel avatar Jun 26 '19 12:06 gburel

I uploaded the files on my directory in the deducteam InriaForge repo (deducteam/member/grienenberger/). It should be a good version, don't hesitate to tell me if there is an issue.

EmilieGrienenberger avatar Jun 28 '19 13:06 EmilieGrienenberger

I moved it to http://deducteam.gforge.inria.fr/lib/hollight2dk.tar.gz so that it is accessible to non Deducteam members (and also because we usually do not put big generated files under git).

fblanqui avatar Jun 28 '19 14:06 fblanqui