cakeml icon indicating copy to clipboard operation
cakeml copied to clipboard

misc theory shouldn't define constants

Open mn200 opened this issue 7 years ago • 12 comments

If misc defines constants, you have to have its grammar in scope to see them cleanly (the alternative is having to write misc$constname). But if you do that, you also bring in everything else that misc imports, which is pretty well all of HOL (theories of the reals, integers, floating-point numbers, paths, bags, lazy lists, etc).

This is disastrous for isolation of concerns.

mn200 avatar Dec 04 '18 04:12 mn200

To make progress on this issue, I think we will need to have a discussion on what miscTheory is for and related design decisions.

I do not agree with the issues as it stands, i.e., I think miscTheory should be allowed to define constants, and furthermore, I think all of HOL should be assumed to be in scope throughout CakeML.

But I also think miscTheory should be upstreamed out of existence, so any rules about it should make sense given that transience.

xrchz avatar Dec 04 '18 13:12 xrchz

  1. I don't know where you think all of misc is going to be upstreamed to, because there are theorems in it that are never going to make it to HOL

  2. Developing with all of HOL in scope is a major pain and a maintainability disaster. If I understand you right, you'd advocate for set_grammar_ancestry ["misc", ...] (bringing misc into scope grammatically). If you do this, you can't use the variable name last (for example) because that's defined in path, even if, conceptually, your script has nothing to do with paths. A change to HOL's theory of bags can break a script file that never mentions or uses bags.

    Are you really in favour of making every developer of every script file aware of every change to every theory in HOL, and to have them change their variable names when new constants are added to theories about which they have never needed to be concerned?

mn200 avatar Dec 04 '18 21:12 mn200

  • I don't want CakeML developers to have to worry about changes to HOL (or other) theories that have no direct bearing on the theory they are working on. We agree on that goal.

  • I believe there's little (though admittedly non-zero) harm in having theories open. The harm, as you've rightly pointed out, comes from the parser (and possibly other) state. I'll get to the parser state below. Modulo state, I've been in favour of treating all HOL theories as if they are collected together in one giant import. I think this is conceptually clean if you view HOL as providing non-CakeML logical support. However, I can also see the appeal of the view that every theory is an independent module no matter which repository it lives in. I'd say a disadvantage of that view is giant, unwieldy open lines, but policy design for highly dependent theories is an under-explored area and I wouldn't be surprised if we could come up with a nice clean modular approach (perhaps in a separate discussion).

  • I think the parser state issue is nicely solved by set_grammar_ancestry. As you say, this doesn't work if you include "misc" in the list and miscTheory defines constants. Hence this issue. Preventing miscTheory from defining constants is a reasonable solution. Another solution that comes to mind is to temp_overload_on the relevant constants where needed, in addition to a set_grammar_ancestry without "misc".

So why did I initially express disagreement? I will try to explain more below. Note that I'm using this discussion to better understand the issue and come to an informed decision: I appreciate your input on this :)

  • I view miscTheory as something transient and ultimately unnecessary. Therefore, I am averse to the idea of policy mentioning it, such as "miscTheory shouldn't define constants". Of course, while miscTheory is very real and here right now, perhaps we do need some policies; but I'd rather find a way to speed up its disappearance. In the interim, I would suggest prioritising moving out constant definitions (in line with this issue's suggested policy), and using the overloading solution I mentioned above otherwise.

  • In response to your point (1), those theorems should be removed somehow. If they can't be moved to HOL, I presume it's because they're too ugly or specific. Are there other reasons? If too ugly, can a cleaner/nicer version be proved and used instead? If too specific, can they be moved back as Q.prove lemmas near their site of use (and if there's more than one site of use, are they really too specific)? (I have previously been against all uses of Q.prove; if it turns out there is a strong need for highly specific non-reusable lemmas, I could be convinced this is a job for Q.prove.)

  • (In response to your point (2), I hope it's clear from the above that I do not advocate set_grammar_ancestry ["misc", ...].)

  • I think it's important to distinguish between the ideal state and the current state. When I say I think miscTheory should be allowed to define constants, I mean that for the current state where miscTheory is still around and being used as a clearinghouse. In the ideal state (or my version of it), miscTheory should not define constants, as per this issue, because it should not exist at all.

xrchz avatar Dec 04 '18 23:12 xrchz

I see two uses for miscTheory. First and less importantly, it serves a role similar to bossLib for those who can't be bothered to start every theory with an opening salvo like

open HolKernel bossLib boolLib boolSimps lcsymtacs Parse libTheory
open bagTheory optionTheory combinTheory dep_rewrite listTheory pred_setTheory finite_mapTheory alistTheory rich_listTheory llistTheory arithmeticTheory pairTheory sortingTheory relationTheory totoTheory comparisonTheory bitTheory sptreeTheory wordsTheory wordsLib set_sepTheory indexedListsTheory stringTheory ASCIInumbersLib machine_ieeeTheory

Second, as a clearing house for things that don't fit anywhere else and that should probably be upstreamed or extirpated eventually. But I disagree that having such clearing house is something transient or unnecessary; the contents of the clearing house are of course transient, but I don't imagine a future when the need for the clearing house itself disappears. It will probably always be the case that there are things we want to upstream but can't be bothered with right now. Having an intermediate point such as miscTheory, to which they can be halfway-upstreamed, strikes me as a way nicer workflow than always going directly to HOL: that might lead to the same kind of chaos we've had last week all the time instead of once in a blue moon.

IlmariReissumies avatar Dec 05 '18 00:12 IlmariReissumies

Indeed, it is only the parsing issue that bothers me. I am mostly happy with the use of misc to act as a simple open (though, strictly, it is preamble that you open) for the purpose of giving us a logical baseline that removes the need for many open declarations.

Note how misc has already been hacked with in an attempt to try to let people use its parsing context: it starts with a whole bunch of calls to explicitly remove names that are coming in from a bunch of real theories (i.e., theories about the real numbers). When I see calls like that, or conversely, uses of misc$foo, it seems to me that these are a symptom of a problem. The cure for this problem is to never use the misc parsing context, and to not let misc define constants.


The wider question of misc and its future:

If you have misc, then you will continue to have misc until you make a concerted effort to get rid of it. Having a transient policy that encourages it to grow is only going to ensure that we never get rid of it.

(Johannes suggests that we have it as a clearing-house that acts as a moving target, but I personally think that's pretty ugly. In particular, I can do a recursive grep for TODO: HOL-move across the CakeML directories as easily as page through a miscScript.sml looking for much the same. Doing that at set intervals seems like it shouldn't be any more disruptive than pulling things out of misc. Indeed, it would be less disruptive because only affected theories and their descendants would need recompiling, instead of everything.)

As for things that aren't going to get adopted by HOL:

  • read_bytearray. This constant is just not going to cut it.

    • It's not sufficiently interesting
    • It's too close to being a simple combination of existing constants (GENLIST and OPT_MMAP)
    • To my mind, if it's to be defined at all, it clearly belongs to a ..common.. background theory about abstract treatments of memory. That's where you might also put all_words and asm_write_bytearray. If you combine all those into such a theory, and convince yourself that you have a nice, general-purpose abstraction that isn't irrevocably tied to CakeML-specific choices, that whole theory might be usefully transferred to HOL.
  • IMAGE SUC x SUBSET IMAGE SUC y UNION IMAGE SUC z <=> x SUBSET y UNION z. Never, never, never. Use of SUC makes it hopelessly specific, and it should clearly be a combination of two theorems: one about distribution of IMAGE over UNION and one about IMAGE over SUBSET. Both should be about arbitrary injective functions rather than just SUC.

  • MAP3. Why should HOL be a home to something that is never used?

  • fromList2 l = SND (FOLDL (\(i,t) a. (i + 2,insert i a t)) (0,LN) l) Clearly this really is irrevocably tied to weird CakeML choices about maps that only want to define entries for even-numbered keys. Clearly this should be defined with a GENLIST call to number the elements of the list, followed by a call to fromAList. But then, the one lemma about this in miscTheory becomes quite trivial. Clearly, no-one would find this an interesting addition to HOL.

  • EVERY_o - as per my comment (recently added on the cleanup branch), this is the GSYM of an existing theorem. Unless you can make a strong argument that this way round should be an automatic rewrite, there's no need for this in HOL.

  • anub is somewhat interesting. It doesn't need to be defined with an accumulator, though the version without has a more interesting termination proof and needs a removekey function or similar to be used as well. The version that's there is probably pragmatically better (though both are quadratic in worst case). I'm not keen on accumulator-versions for HOL, so when I do define the other, there will presumably be mass breakage. (Or I define a version in HOL called alist_norm and leave anub completely untouched.)

mn200 avatar Dec 05 '18 05:12 mn200

So, to answer one of @xrchz's questions: yes, I do believe some of the stuff in misc should be pushed back to the theories that use it. If there are multiple theories using things, then those things could be put into a ancestor theory that just those theories inherit from. (We have backend_common as an example of such a theory already.)

mn200 avatar Dec 05 '18 05:12 mn200

@mn200 are you suggesting by "if it's to be defined at all" that read_bytearray should be an overloading as follows?

read_bytearray a c gb = OPT_MMAP gb (GENLIST (λi. a + n2w i) c)

I don't think read_bytearray should be an overloading.

But I do agree that it seems like a strangely specific definition to move to HOL. I think it fits best in compiler/backend/semantics/backendPropsScript.sml in CakeML, which is a file that has the following fitting readme comment:

General definitions and theorems that are useful within the proofs
about the compiler backend.

Similarly, in my mind fromList2 also belongs there.

myreen avatar Dec 05 '18 05:12 myreen

miscTheory in the ideal state

Should we be aiming to remove miscTheory altogether? I think this is a question to answer first, because most other decisions depend on it.

  • I originally said I want miscTheory to disappear and view it only as a temporary measure.
  • @IlmariReissumies makes the good point that although the contents of a clearinghouse may continually change, the house itself can persist (and therefore can be regulated).
  • @mn200 makes the good point that miscTheory will only disappear if we try to get rid of it. I think #549 is an example of such trying, but I presume he means more.
  • All of us acknowledge the usefulness of a shortcut/abbreviation for opening "the HOL libraries". However, as @mn200 noted, preamble serves this purpose and does not require miscTheory.

Aggregating these points, I believe miscTheory ought to go. I would be happy to hear further contrary views on this matter. (I have one minor such point at the end of this message.) If there aren't any, I suppose we all agree that we are trying to get rid of miscTheory and should double down on #549 and related efforts.

Where to put the contents of miscTheory

Question for @mn200: is the list of exceptions above exhaustive? (Relative to the contents of miscTheory on the cleanup branch; there will be more decisions for things still in the rest of the repo.) For each, I have a straightforward opinion (always about the ideal/target state; there may be intermediate waypoints to getting there):

  • read_bytearray: go to backendProps as @myreen suggests
  • SUC-based lemmas: delete them and do the proofs with the more general theorems in HOL instead. If the proofs are made much worse thereby, and the SUC-based lemmas are not used more than once, move them to their use site. Otherwise we return to discussion.
  • MAP3 - MAP3 is used in compilationLib.sml. I think it should go to listTheory.
  • fromList2 - switch to the better definition suggested by @mn200, and move to backend_common or similar (depending on its exact use sites).
  • EVERY_o - delete it and fix proofs that use it to use the version in HOL.
  • anub - move to HOL in the same theory as a non-accumulator version (cf. REVERSE and REV).

Policy in the meantime

Since it's supposed to just be for a short time, I'd rather leave it unregulated. However, if people still want rules about what goes into miscTheory, I have remaining disagreements about what those rules should be. I will voice them if they become relevant.

CakeML cleanup in a world without miscTheory

Currently, a lot of work goes into keeping the CakeML code base somewhat reasonable (and it's still a sprawling mess, as code bases are wont to be). The rough workflow for this seems to be:

  1. Move generic things to the top of the script file, with a TODO: move comment.
  2. Move things with a TODO: move comment to miscTheory
  3. Upstream things in miscTheory

At every step, there are opportunities for tidying / improving lemmas, reconciling duplicates, removing unused experiments, choosing a better destination than miscTheory, etc. Without miscTheory, steps 2 and 3 need to be combined into one. I worry that this makes the already daunting job even more daunting. However, I'm happy to give it a go and would love to learn that this worry is unfounded.

xrchz avatar Dec 05 '18 09:12 xrchz

This conversation reminds me of byteTheory as proposed in #521.

myreen avatar Dec 05 '18 10:12 myreen

My list wasn't exhaustive; it was meant to demonstrate the variety of ways in which things are not right for moves to HOL. Another example would be the theorem that turns the iff that defines SUBSET into one direction of the implication. I'm mostly in agreement with how you'd handle the other cases (still don't like MAP3; what's next MAPs 4 through 10?)

I wouldn't seriously suggest using an overload for read_bytearray, but it strikes me as akin to putting a constant into HOL to represent adding one to the length of a list.

mn200 avatar Dec 05 '18 10:12 mn200

The problem with having misc at all is that it becomes a dumping ground for random crap with the implicit suggestion that I have to sort through it looking for the good stuff. Now, I admit I'm being hyperbolic: the ratio of good stuff to crap is actually very high. I think having TODOs in the original files is better practice though: energy can be focused on problems in the context of their use.

mn200 avatar Dec 05 '18 10:12 mn200

@mn200 do you still agree with this?

xrchz avatar Dec 09 '24 12:12 xrchz