refactor listTheory
listTheory is cluttered with stuff that has moved in over time from rich_listTheory, and depends on probably too many things (like pred_setTheory). There is arguably a "core" theory about lists, that could serve as listTheory, and the remainder can be moved elsewhere (e.g. to rich_list, or refactored further into different pieces). This issue comes in part from fallout discussion on the HOL developers mailing list after #52 was first closed.
Want to back this issue? Post a bounty on it! We accept bounties via Bountysource.
least impact option might be to have a new core_listTheory, and build a listTheory with roughly the same stuff in it as currently on top (and possibly merge it with rich_list).
A. Yes, rich_list (and related conversions) needs to be revised or retired. B. I like Magnus' point about having the list theory be "functional programming" in nature. It might be worth looking at SML's list structure for what ought to be in a "core" list theory:
http://www.standardml.org/Basis/list.html
Also, I remember that Isabelle/HOL went through a big re-org of their
list theory many years ago. Might be some hints there
Konrad.
On Tue, Oct 30, 2012 at 12:17 PM, Ramana Kumar [email protected]:
least impact option might be to have a new core_listTheory, and build a listTheory with roughly the same stuff in it as currently on top (and possibly merge it with rich_list).
— Reply to this email directly or view it on GitHubhttps://github.com/mn200/HOL/issues/98#issuecomment-9914455.
It’s not obvious to me that it is overly cluttered (and removing MEM made it less cluttered of course :-). The complete list of constants is
ALL_DISTINCT :α list -> bool
APPEND :α list -> α list -> α list
CONS :α -> α list -> α list
DROP :num -> α list -> α list
EL :num -> α list -> α
EVERY :(α -> bool) -> α list -> bool
EVERY2 :(α -> β -> bool) -> α list -> β list -> bool
EVERY2_tupled :α list # β list -> bool
EXISTS :(α -> bool) -> α list -> bool
FILTER :(α -> bool) -> α list -> α list
FLAT :α list list -> α list
FOLDL :(β -> α -> β) -> β -> α list -> β
FOLDL2 :(α -> β -> γ -> α) -> α -> β list -> γ list -> α
FOLDL2_tupled :(α -> β -> γ -> α) # α # β list # γ list -> α
FOLDR :(α -> β -> β) -> β -> α list -> β
FRONT :α list -> α list
GENLIST :(num -> α) -> num -> α list
GENLIST_AUX :(num -> α) -> num -> α list -> α list
HD :α list -> α
LAST :α list -> α
LEN :α list -> num -> num
LENGTH :α list -> num
LIST_REL :(α -> β -> bool) -> α list -> β list -> bool
LIST_TO_SET :α list -> α -> bool
LRC :(α -> α -> bool) -> α list -> α -> α -> bool
LUPDATE :α -> num -> α list -> α list
MAP :(α -> β) -> α list -> β list
MAP2 :(β -> γ -> α) -> β list -> γ list -> α list
MAP2_tupled :(β -> γ -> α) # β list # γ list -> α list
NIL :α list
NULL :α list -> bool
PAD_LEFT :α -> num -> α list -> α list
PAD_RIGHT :α -> num -> α list -> α list
REV :α list -> α list -> α list
REVERSE :α list -> α list
SET_TO_LIST :(α -> bool) -> α list
SNOC :α -> α list -> α list
SUM :num list -> num
TAKE :num -> α list -> α list
TL :α list -> α list
UNZIP :(α, β) alist -> α list # β list
ZIP :α list # β list -> (α, β) alist
isPREFIX :α list -> α list -> bool
listRel :(α -> β -> bool) -> α list -> β list -> bool
list_case :β -> (α -> α list -> β) -> α list -> β
list_size :(α -> num) -> α list -> num
If you were going to remove things because they were too obscure, I’d go for all those from “ListPair” (i.e., EVERY2, FOLDL2, MAP2, ZIP and UNZIP), and also LRC. The rest all look pretty reasonable to me. (The fact that we have two copies of LIST_REL (that and listRel) is not good of course...)
SET_TO_LIST is also pretty weird and wild, so I’d feel OK with deleting that.
Finally, I’ve just had a look at the Isabelle library, and I see that they have a ListMem constant, defined inductively, and for which there are no further theorems. Their theorems whose names include the substring “mem” typically have x ∈ set l sub-terms.
Having a separate ListPair theory makes sense to me.
I would actually propose creating a completely new directory under src/ and starting a clean list development from scratch. That way users can keep on using the existing theories for now and wait until the new one stabilises (there's agreement/maturity). In time, the old theories can be abandoned en masse. Of course, the old theories can be raided for proofs -- hopefully with a fair bit of cleaning up. Mucking around with the existing theories in situ is probably going to be too disruptive.
I don't see why we can't simply mimic the Standard ML libraries (almost verbatim style). In particular, things like "GENLIST" should be called "tabulate". An separate theory can be used for set/relational stuff. I find mapPartial to be a useful SML function; zip and unzip are also pretty useful (I wouldn't describe them as being that obscure).
I didn't mean to denigrate ZIP; it would just be better in ListPair (as per the Basis).
I also like mapPartial, so would be more than happy to see it in listTheory. I think that setting up tabulate as an alias for GENLIST is a pretty good idea, though SML’s List.tabulate takes a pair and GENLIST is curried. I think I prefer the curried version. Other incompatibilities appear in the type of FOLDL for example. I don’t think the pain of fixing that one would be worth it.
(I can’t help but note that the basis library doesn’t have a MEM function at all.)
Finally, I don’t want to put users into the situation of deciding that they should be doing
open interimListTheory
and then having to later switch back to
open listTheory
and what if they end up including/open-ing both?
We want listTheory to be the go-to place for theorems about lists, with all the standard notions in place. Users get a pretty reasonable version of that right now. They even get a notion of membership. It doesn’t seem at all obvious to me that there is any need for a dramatic reworking of what’s there.
There are a few other differences:
nth, EL all, EVERY concat, FLAT
SML has "find" (which is useful) and "getItem" (which I don't use), "SNOC" and "FRONT" don't appear in the Basis either.
I agree that re-implementing something as widely used as the list theory would be a fairly radical move, so there would have to be a clear imperative. A couple of previous bit-vector theories have been deprecated/removed., This approach works best when something isn't being heavily used, or if porting to the new development is very easy or highly beneficial. I think a much cleaner/better version of listTheory could be developed, especially if one isn't shackled by supporting/maintaining the existing version. However, I admit that the benefits would be largely aesthetic. The state of the current version is somewhat embarrassing.
If refactoring work is to be done on listTheory then perhaps it is best if a new branch is created for that purpose?
Yes, a features/better-lists branch or something would be the way to go. But what’s the plan of work?
Is it just:
- move pair-ish constants to new
listPairTheory - add
mapPartial - provide “tabulate” alias for
GENLIST - provide “concat” alias for
FLAT - move
LRCsomewhere else, maybe alist_relationTheory
These all seem to be sufficiently low-impact that they’d be fine as small commits on master.
- Move SNOC somewhere else
- Move PAD_{LEFT,RIGHT} somewhere else
- Add a useful ordering on lists (lex, colex)?
Speculation: mem not in SML List structure because it has an equality type. And maybe because the designers wanted to promote using other datastructures for set-like applications.
On Tue, Oct 30, 2012 at 7:53 PM, Michael Norrish [email protected]:
Yes, a features/better-lists branch or something would be the way to go. But what’s the plan of work?
Is it just:
- move pair-ish constants to new listPairTheory
- add mapPartial
- provide “tabulate” alias for GENLIST
- provide “concat” alias for FLAT
- move LRC somewhere else, maybe a list_relationTheory
These all seem to be sufficiently low-impact that they’d be fine as small commits on master.
— Reply to this email directly or view it on GitHubhttps://github.com/mn200/HOL/issues/98#issuecomment-9928841.
Totally agree about providing a LEX relation. What would be a good name for a theory that included PAD_LEFT? And where would SNOC go? ((Half-seriously) We could just make SNOC an alias for _ ++ [_] ...)
I think it's important to have reasonably well-founded reasons for selecting the locations for each of the constants. In particular, I don't think being "infrequently used" or "not used by me" is a good criteria for pushing something out of the core theory. It's too cheap to have extra constants around and too annoying to have "hidden / less well known" theories around. Frequency of use is too fragile/confusing a criteria.
Distinctions can be made on the basis of dependencies and hopefully some unambiguous taxonomical criteria. So having PairLists and "Equality Type / Set Based" lists seems reasonable. Whereas a dumping ground for SNOC, PAD_{LEFT/RIGHT} seems pointless.
I'd prefer it if SNOC where renamed to something like append1. An overload is an interesting proposal but it could be hard to pull off.
How about listSegments or something similar for some of those things in rich_list? I agree with your comments about trying to be a little bit principled, and that constants are cheap. Particularly now that most people (I hope) are able to use heaps, the fact that listTheory is perhaps rather large on disk doesn't really matter. (On Moscow ML, the larger the theory, the slower it will be to load.)
Yes, something for list segments sounds reasonable. Having a consistent naming convention would be good, e.g. something like: listPair, listSet (or ?), listSegment.
A decision would have to be made about what gets loaded with bossLib. As a user of the heaps feature, I have a bias towards the "load everything"' approach but I agree that Mosml users won't be happy with that. (PolyML 5.5 is now pretty good at managing memory.) In any case, a library should be added as a shortcut to loading everything (users can then simply add this to their heap).
I agree. Loading listLib should load everything. Hopefully this will help assuage the irritation that comes with separating out different sub-theories like listPair, etc.
Konrad.
On Wed, Oct 31, 2012 at 5:16 AM, acjf3 [email protected] wrote:
Yes, something for list segments sounds reasonable. Having a consistent naming convention would be good, e.g. something like: listPair, listSet (or ?), listSegment.
A decision would have to be made about what gets loaded with bossLib. As a user of the heaps feature, I have a bias towards the "load everything"' approach but I agree that Mosml users won't be happy with that. (PolyML 5.5 is now pretty good at managing memory.) In any case, a library should be added as a shortcut to loading everything (users can then simply add this to their heap).
— Reply to this email directly or view it on GitHubhttps://github.com/mn200/HOL/issues/98#issuecomment-9939368.
I’ve started branch features/list-refactor for work on this idea. We can give up on the whole idea if it proves too ugly, or for whatever other reason. What I’ve committed so far (0d14a132417f82b4) creates a minimal core_listTheory (without any dependency on pred_setTheory I’m sure you’ll be pleased to see).
The resulting, gutted listTheory then builds on top of this successfully. At the moment, rich_listTheory fails.
Looks good. I'd vote to get rid of SUM from core_listTheory. Does it get used anywhere?
I'd be inclined to argue that core_listTheory should contain basic functional programming on lists (so it's similar to the SML Basis). Needs a better name than core_list, though.
Konrad.
On Wed, Oct 31, 2012 at 11:06 PM, Michael Norrish [email protected]:
I’ve started branch features/list-refactor for work on this idea. We can give up on the whole idea if it proves too ugly, or for whatever other reason. What I’ve committed so far (0d14a13https://github.com/mn200/HOL/commit/0d14a132417f82b4) creates a minimal core_listTheory (without any dependency on pred_setTheory I’m sure you’ll be pleased to see).
The resulting, gutted listTheory then builds on top of this successfully. At the moment, rich_listTheory fails.
— Reply to this email directly or view it on GitHubhttps://github.com/mn200/HOL/issues/98#issuecomment-9970439.
Yeah, I agree that SUM could probably go. What do you think needs to be there to support “functional programming à la Basis”? I guess perhaps GENLIST should be there. TAKE and DROP are Basis functions that could probably move there.
I'm not sure what you meant by "Does it get used anywhere?", but I certainly use SUM in some of my theories. I don't think it's being removed in this proposal, just moved, so that's fine. But I should probably reiterate the question I sent to the dev mailing list about testing external example theories (reply there).
LIST_REL and EVERY2 are the same constant.
LIST_REL is an important constant from the quotient library. It is parallel to OPTION_REL and the infix operators for pairs and sums, ### and +++. All of these are polytypic examples of operators that take as arguments partial equivalence relations for the arguments of the type, and return as results the corresponding partial equivalence relation for the aggregate type.
Peter
On Sat, Mar 30, 2013 at 12:21 PM, Ramana Kumar [email protected]:
LIST_REL and EVERY2 are the same constant.
— Reply to this email directly or view it on GitHubhttps://github.com/mn200/HOL/issues/98#issuecomment-15677100 .
"In Your majesty ride prosperously because of truth, humility, and righteousness; and Your right hand shall teach You awesome things." (Psalm 45:4)
Speaking of which, OPTION_REL is the same as OPTREL in optionTheory. I've found the latter to have a much more useful definition theorem.
I merely suggest that one day we remove one of each pair of duplicated constants [(LIST_REL,EVERY2),(OPTION_REL,OPTREL)], keeping all the unique theorems and probably keeping both names with overloads.
Yes. Having two constants is unnecessary. Having two names for the same constant is fine.
Speaking of these constants like LIST_REL, there is an even more important constant in the quotient library, that perhaps might be of value defined earlier in the HOL system.
This is the infix constant ===>, defined in quotientScript.sml as
val FUN_REL =
new_infixr_definition
("FUN_REL",
(--$===> (R1:'a->'a->bool) (R2:'b->'b->bool) f g = !x y. R1 x y ==> R2 (f x) (g y)--),
490);
Given two partial equivalence relations R : 'a -> 'a -> bool and S : 'b -> 'b -> bool, R ===> S is the corresponding partial equivalence relation on the type 'a -> 'b. This is one of the core notions that makes higher order quotients possible.
Peter
On Sun, Mar 31, 2013 at 2:56 AM, Michael Norrish [email protected]:
Yes. Having two constants is unnecessary. Having two names for the same constant is fine.
— Reply to this email directly or view it on GitHubhttps://github.com/mn200/HOL/issues/98#issuecomment-15687906 .
"In Your majesty ride prosperously because of truth, humility, and righteousness; and Your right hand shall teach You awesome things." (Psalm 45:4)