HOL refactor listTheory

listTheory is cluttered with stuff that has moved in over time from rich_listTheory, and depends on probably too many things (like pred_setTheory). There is arguably a "core" theory about lists, that could serve as listTheory, and the remainder can be moved elsewhere (e.g. to rich_list, or refactored further into different pieces). This issue comes in part from fallout discussion on the HOL developers mailing list after #52 was first closed.

Want to back this issue? Post a bounty on it! We accept bounties via Bountysource.

Oct 30 '12 17:10 xrchz

least impact option might be to have a new core_listTheory, and build a listTheory with roughly the same stuff in it as currently on top (and possibly merge it with rich_list).

Oct 30 '12 17:10 xrchz

A. Yes, rich_list (and related conversions) needs to be revised or retired. B. I like Magnus' point about having the list theory be "functional programming" in nature. It might be worth looking at SML's list structure for what ought to be in a "core" list theory:

    http://www.standardml.org/Basis/list.html

Also, I remember that Isabelle/HOL went through a big re-org of their

list theory many years ago. Might be some hints there

Konrad.

On Tue, Oct 30, 2012 at 12:17 PM, Ramana Kumar [email protected]:

least impact option might be to have a new core_listTheory, and build a listTheory with roughly the same stuff in it as currently on top (and possibly merge it with rich_list).

— Reply to this email directly or view it on GitHubhttps://github.com/mn200/HOL/issues/98#issuecomment-9914455.

Oct 30 '12 17:10 konrad-slind

It’s not obvious to me that it is overly cluttered (and removing MEM made it less cluttered of course :-). The complete list of constants is

 ALL_DISTINCT :α list -> bool
 APPEND :α list -> α list -> α list
 CONS :α -> α list -> α list
 DROP :num -> α list -> α list
 EL :num -> α list -> α
 EVERY :(α -> bool) -> α list -> bool
 EVERY2 :(α -> β -> bool) -> α list -> β list -> bool
 EVERY2_tupled :α list # β list -> bool
 EXISTS :(α -> bool) -> α list -> bool
 FILTER :(α -> bool) -> α list -> α list
 FLAT :α list list -> α list
 FOLDL :(β -> α -> β) -> β -> α list -> β
 FOLDL2 :(α -> β -> γ -> α) -> α -> β list -> γ list -> α
 FOLDL2_tupled :(α -> β -> γ -> α) # α # β list # γ list -> α
 FOLDR :(α -> β -> β) -> β -> α list -> β
 FRONT :α list -> α list
 GENLIST :(num -> α) -> num -> α list
 GENLIST_AUX :(num -> α) -> num -> α list -> α list
 HD :α list -> α
 LAST :α list -> α
 LEN :α list -> num -> num
 LENGTH :α list -> num
 LIST_REL :(α -> β -> bool) -> α list -> β list -> bool
 LIST_TO_SET :α list -> α -> bool
 LRC :(α -> α -> bool) -> α list -> α -> α -> bool
 LUPDATE :α -> num -> α list -> α list
 MAP :(α -> β) -> α list -> β list
 MAP2 :(β -> γ -> α) -> β list -> γ list -> α list
 MAP2_tupled :(β -> γ -> α) # β list # γ list -> α list
 NIL :α list
 NULL :α list -> bool
 PAD_LEFT :α -> num -> α list -> α list
 PAD_RIGHT :α -> num -> α list -> α list
 REV :α list -> α list -> α list
 REVERSE :α list -> α list
 SET_TO_LIST :(α -> bool) -> α list
 SNOC :α -> α list -> α list
 SUM :num list -> num
 TAKE :num -> α list -> α list
 TL :α list -> α list
 UNZIP :(α, β) alist -> α list # β list
 ZIP :α list # β list -> (α, β) alist
 isPREFIX :α list -> α list -> bool
 listRel :(α -> β -> bool) -> α list -> β list -> bool
 list_case :β -> (α -> α list -> β) -> α list -> β
 list_size :(α -> num) -> α list -> num

If you were going to remove things because they were too obscure, I’d go for all those from “ListPair” (i.e., EVERY2, FOLDL2, MAP2, ZIP and UNZIP), and also LRC. The rest all look pretty reasonable to me. (The fact that we have two copies of LIST_REL (that and listRel) is not good of course...)

SET_TO_LIST is also pretty weird and wild, so I’d feel OK with deleting that.

Finally, I’ve just had a look at the Isabelle library, and I see that they have a ListMem constant, defined inductively, and for which there are no further theorems. Their theorems whose names include the substring “mem” typically have x ∈ set l sub-terms.

Oct 30 '12 22:10 mn200

Having a separate ListPair theory makes sense to me.

I would actually propose creating a completely new directory under src/ and starting a clean list development from scratch. That way users can keep on using the existing theories for now and wait until the new one stabilises (there's agreement/maturity). In time, the old theories can be abandoned en masse. Of course, the old theories can be raided for proofs -- hopefully with a fair bit of cleaning up. Mucking around with the existing theories in situ is probably going to be too disruptive.

I don't see why we can't simply mimic the Standard ML libraries (almost verbatim style). In particular, things like "GENLIST" should be called "tabulate". An separate theory can be used for set/relational stuff. I find mapPartial to be a useful SML function; zip and unzip are also pretty useful (I wouldn't describe them as being that obscure).

Oct 30 '12 22:10 acjf3

I didn't mean to denigrate ZIP; it would just be better in ListPair (as per the Basis).

I also like mapPartial, so would be more than happy to see it in listTheory. I think that setting up tabulate as an alias for GENLIST is a pretty good idea, though SML’s List.tabulate takes a pair and GENLIST is curried. I think I prefer the curried version. Other incompatibilities appear in the type of FOLDL for example. I don’t think the pain of fixing that one would be worth it.

(I can’t help but note that the basis library doesn’t have a MEM function at all.)

Finally, I don’t want to put users into the situation of deciding that they should be doing

open interimListTheory

and then having to later switch back to

open listTheory

and what if they end up including/open-ing both?

We want listTheory to be the go-to place for theorems about lists, with all the standard notions in place. Users get a pretty reasonable version of that right now. They even get a notion of membership. It doesn’t seem at all obvious to me that there is any need for a dramatic reworking of what’s there.

Oct 30 '12 23:10 mn200

There are a few other differences:

nth, EL all, EVERY concat, FLAT

SML has "find" (which is useful) and "getItem" (which I don't use), "SNOC" and "FRONT" don't appear in the Basis either.

I agree that re-implementing something as widely used as the list theory would be a fairly radical move, so there would have to be a clear imperative. A couple of previous bit-vector theories have been deprecated/removed., This approach works best when something isn't being heavily used, or if porting to the new development is very easy or highly beneficial. I think a much cleaner/better version of listTheory could be developed, especially if one isn't shackled by supporting/maintaining the existing version. However, I admit that the benefits would be largely aesthetic. The state of the current version is somewhat embarrassing.

If refactoring work is to be done on listTheory then perhaps it is best if a new branch is created for that purpose?

Oct 31 '12 00:10 acjf3

Yes, a features/better-lists branch or something would be the way to go. But what’s the plan of work?

Is it just:

move pair-ish constants to new listPairTheory
add mapPartial
provide “tabulate” alias for GENLIST
provide “concat” alias for FLAT
move LRC somewhere else, maybe a list_relationTheory

These all seem to be sufficiently low-impact that they’d be fine as small commits on master.

Oct 31 '12 00:10 mn200

Move SNOC somewhere else
Move PAD_{LEFT,RIGHT} somewhere else
Add a useful ordering on lists (lex, colex)?

Speculation: mem not in SML List structure because it has an equality type. And maybe because the designers wanted to promote using other datastructures for set-like applications.

On Tue, Oct 30, 2012 at 7:53 PM, Michael Norrish [email protected]:

Yes, a features/better-lists branch or something would be the way to go. But what’s the plan of work?

Is it just:

move pair-ish constants to new listPairTheory

add mapPartial

provide “tabulate” alias for GENLIST

provide “concat” alias for FLAT

move LRC somewhere else, maybe a list_relationTheory

These all seem to be sufficiently low-impact that they’d be fine as small commits on master.

— Reply to this email directly or view it on GitHubhttps://github.com/mn200/HOL/issues/98#issuecomment-9928841.

Oct 31 '12 03:10 konrad-slind

Totally agree about providing a LEX relation. What would be a good name for a theory that included PAD_LEFT? And where would SNOC go? ((Half-seriously) We could just make SNOC an alias for _ ++ [_] ...)

Oct 31 '12 04:10 mn200

I think it's important to have reasonably well-founded reasons for selecting the locations for each of the constants. In particular, I don't think being "infrequently used" or "not used by me" is a good criteria for pushing something out of the core theory. It's too cheap to have extra constants around and too annoying to have "hidden / less well known" theories around. Frequency of use is too fragile/confusing a criteria.

Distinctions can be made on the basis of dependencies and hopefully some unambiguous taxonomical criteria. So having PairLists and "Equality Type / Set Based" lists seems reasonable. Whereas a dumping ground for SNOC, PAD_{LEFT/RIGHT} seems pointless.

I'd prefer it if SNOC where renamed to something like append1. An overload is an interesting proposal but it could be hard to pull off.

Oct 31 '12 08:10 acjf3

How about listSegments or something similar for some of those things in rich_list? I agree with your comments about trying to be a little bit principled, and that constants are cheap. Particularly now that most people (I hope) are able to use heaps, the fact that listTheory is perhaps rather large on disk doesn't really matter. (On Moscow ML, the larger the theory, the slower it will be to load.)

Oct 31 '12 09:10 mn200

Yes, something for list segments sounds reasonable. Having a consistent naming convention would be good, e.g. something like: listPair, listSet (or ?), listSegment.

A decision would have to be made about what gets loaded with bossLib. As a user of the heaps feature, I have a bias towards the "load everything"' approach but I agree that Mosml users won't be happy with that. (PolyML 5.5 is now pretty good at managing memory.) In any case, a library should be added as a shortcut to loading everything (users can then simply add this to their heap).

Oct 31 '12 10:10 acjf3

I agree. Loading listLib should load everything. Hopefully this will help assuage the irritation that comes with separating out different sub-theories like listPair, etc.

Konrad.

On Wed, Oct 31, 2012 at 5:16 AM, acjf3 [email protected] wrote:

Yes, something for list segments sounds reasonable. Having a consistent naming convention would be good, e.g. something like: listPair, listSet (or ?), listSegment.

A decision would have to be made about what gets loaded with bossLib. As a user of the heaps feature, I have a bias towards the "load everything"' approach but I agree that Mosml users won't be happy with that. (PolyML 5.5 is now pretty good at managing memory.) In any case, a library should be added as a shortcut to loading everything (users can then simply add this to their heap).

— Reply to this email directly or view it on GitHubhttps://github.com/mn200/HOL/issues/98#issuecomment-9939368.

Oct 31 '12 16:10 konrad-slind

I’ve started branch features/list-refactor for work on this idea. We can give up on the whole idea if it proves too ugly, or for whatever other reason. What I’ve committed so far (0d14a132417f82b4) creates a minimal core_listTheory (without any dependency on pred_setTheory I’m sure you’ll be pleased to see).

The resulting, gutted listTheory then builds on top of this successfully. At the moment, rich_listTheory fails.

Nov 01 '12 04:11 mn200

Looks good. I'd vote to get rid of SUM from core_listTheory. Does it get used anywhere?

I'd be inclined to argue that core_listTheory should contain basic functional programming on lists (so it's similar to the SML Basis). Needs a better name than core_list, though.

Konrad.

On Wed, Oct 31, 2012 at 11:06 PM, Michael Norrish [email protected]:

I’ve started branch features/list-refactor for work on this idea. We can give up on the whole idea if it proves too ugly, or for whatever other reason. What I’ve committed so far (0d14a13https://github.com/mn200/HOL/commit/0d14a132417f82b4) creates a minimal core_listTheory (without any dependency on pred_setTheory I’m sure you’ll be pleased to see).

The resulting, gutted listTheory then builds on top of this successfully. At the moment, rich_listTheory fails.

— Reply to this email directly or view it on GitHubhttps://github.com/mn200/HOL/issues/98#issuecomment-9970439.

Nov 01 '12 04:11 konrad-slind

Yeah, I agree that SUM could probably go. What do you think needs to be there to support “functional programming à la Basis”? I guess perhaps GENLIST should be there. TAKE and DROP are Basis functions that could probably move there.

Nov 01 '12 05:11 mn200

I'm not sure what you meant by "Does it get used anywhere?", but I certainly use SUM in some of my theories. I don't think it's being removed in this proposal, just moved, so that's fine. But I should probably reiterate the question I sent to the dev mailing list about testing external example theories (reply there).

Nov 01 '12 11:11 xrchz

LIST_REL and EVERY2 are the same constant.

Mar 30 '13 16:03 xrchz

LIST_REL is an important constant from the quotient library. It is parallel to OPTION_REL and the infix operators for pairs and sums, ### and +++. All of these are polytypic examples of operators that take as arguments partial equivalence relations for the arguments of the type, and return as results the corresponding partial equivalence relation for the aggregate type.

Peter

On Sat, Mar 30, 2013 at 12:21 PM, Ramana Kumar [email protected]:

LIST_REL and EVERY2 are the same constant.

— Reply to this email directly or view it on GitHubhttps://github.com/mn200/HOL/issues/98#issuecomment-15677100 .

"In Your majesty ride prosperously because of truth, humility, and righteousness; and Your right hand shall teach You awesome things." (Psalm 45:4)

Mar 31 '13 01:03 homeier

Speaking of which, OPTION_REL is the same as OPTREL in optionTheory. I've found the latter to have a much more useful definition theorem.

I merely suggest that one day we remove one of each pair of duplicated constants [(LIST_REL,EVERY2),(OPTION_REL,OPTREL)], keeping all the unique theorems and probably keeping both names with overloads.

Mar 31 '13 06:03 xrchz

Yes. Having two constants is unnecessary. Having two names for the same constant is fine.

Mar 31 '13 06:03 mn200

Speaking of these constants like LIST_REL, there is an even more important constant in the quotient library, that perhaps might be of value defined earlier in the HOL system.

This is the infix constant ===>, defined in quotientScript.sml as

val FUN_REL = new_infixr_definition ("FUN_REL", (--$===> (R1:'a->'a->bool) (R2:'b->'b->bool) f g = !x y. R1 x y ==> R2 (f x) (g y)--), 490);

Given two partial equivalence relations R : 'a -> 'a -> bool and S : 'b -> 'b -> bool, R ===> S is the corresponding partial equivalence relation on the type 'a -> 'b. This is one of the core notions that makes higher order quotients possible.

Peter

On Sun, Mar 31, 2013 at 2:56 AM, Michael Norrish [email protected]:

Yes. Having two constants is unnecessary. Having two names for the same constant is fine.

— Reply to this email directly or view it on GitHubhttps://github.com/mn200/HOL/issues/98#issuecomment-15687906 .

"In Your majesty ride prosperously because of truth, humility, and righteousness; and Your right hand shall teach You awesome things." (Psalm 45:4)

Mar 31 '13 12:03 homeier