Gradualizer What is the semantics of improper lists anyway?

I suppose this is more of an Erlang Questions kind of topic, but I'm afraid I would get five people with incompatible answers each claiming that theirs is the obvious answer. Also, we have the opportunity to decide what the interpretation of improper lists should be, since dialyzer will complain as soon as you use them at all.

My interpretation

Anyway, this got triggered by a discussion in #109, where I realised I'm not actually sure of what the improper-lists types really mean. I always thought they meant this:

-type maybe_improper_list(A, B)    :: B | nonempty_improper_list(A, B).
-type nonempty_improper_list(A, B) :: cons(A, maybe_improper_list(A, B)).

In other words, the type maybe_improper_list(a, b) has elements

b
[a | b]
[a, a | b]
[a, a, a | b]
...

and nonempty_improper_list(a, b) would be the same except without b. We would also have [a] == maybe_improper_list(a, []) and [a, ...] == nonempty_improper_list(a, []).

This isn't the only possible interpretation though. I guess the two key questions are (but see Dialyzer below for more subtleties):

b :: maybe_improper_list(a, b)?
[] :: maybe_improper_list(a, b)?

which I have down as a Yes and a No, respectively (henceforth referred to as the YesNo position).

The documentation

Let's see what the documentation says:

       | maybe_improper_list(Type1, Type2)    %% Type1=contents, Type2=termination
       | nonempty_improper_list(Type1, Type2) %% Type1 and Type2 as above

That's not very specific but seems to suggest a No on the second question. This is further reinforced by the definition of iolist which explicitly lists [] as a possible terminator:

-type iolist() :: maybe_improper_list(byte() | binary() | iolist(), binary() | [])

Then there's this gem:

...
nonempty_maybe_improper_list() :: nonempty_maybe_improper_list(any(), any())
nonempty_improper_list(Type1, Type2)
nonempty_maybe_improper_list(Type1, Type2)
where the last two types define the set of Erlang terms one would expect.

Ah, yes. The terms one would expect. Thanks. 🙄

I guess we can put the docs in the MehNo camp.

Dialyzer

So, what does dialyzer think?

-spec f(maybe_improper_list(a, b)) -> ok.
f(_) -> ok.

-spec g(nonempty_improper_list(a, b)) -> ok.
g(_) -> ok.

works1() -> f([a    | b]).
works2() -> f([a, a | b]).
works3() -> g([a | b]).

fails1() -> f(b).
fails2() -> f([a]).
fails3() -> f([]).
fails4() -> g(b).
fails5() -> g([a, a]).

foo.erl:16: The call foo:f('b') breaks the contract (maybe_improper_list('a','b')) -> 'ok'
foo.erl:17: The call foo:f(['a',...]) breaks the contract (maybe_improper_list('a','b')) -> 'ok'
foo.erl:18: The call foo:f([]) breaks the contract (maybe_improper_list('a','b')) -> 'ok'
foo.erl:19: The call foo:g('b') breaks the contract (nonempty_improper_list('a','b')) -> 'ok'
foo.erl:20: The call foo:g(['a',...]) breaks the contract (nonempty_improper_list('a','b')) -> 'ok'

So that's a NoNo.

Does that make maybe_improper_list and non_empty_improper_list the same? Not quite:

-spec h(maybe_improper_list(a, b | [])) -> ok.
h(_) -> ok.

-spec i(nonempty_improper_list(a, b | [])) -> ok.
i(_) -> ok.

works4() -> h([]).
fails6() -> i([]).

This seems to be implemented as a special case for [] though because

-spec j(maybe_improper_list(a, [b])) -> ok.
j(_) -> ok.

works6() -> j([]).
works7() -> j([a]).
fails7() -> j([b]).
fails8() -> j([a, b]).

It looks like dialyzer's definition is something like

-type maybe_improper_list(A, B)    :: glb([], B) | nonempty_improper_list(A, B).
-type nonempty_improper_list(A, B) :: cons(A, B | maybe_improper_list(A, B)).

Gradualizer

Gradualizer currently takes the safe approach and doesn't admit any closed values of improper list types, failing all the improper lists above.

Conclusion

We should implement something for improper lists. I'd argue that the YesNo position is the sensible one (it's certainly the easiest to write down), but the fact that dialyzer does something different makes it less obviously the right thing to do.

Dec 12 '18 16:12 UlfNorell

I think of it this way:

A singly-linked list is a list of Cons Cells.

A Cons cell is just a 2-tuple, thus [a | b] is pretty identical to {a, b} (internally it can represent it more efficiently because of assumptions it makes, falling back to essentially a tuple representation).

The empty list is it's own type known as nil, so let's say that [] == {} for this example.

So pretending a Cons Cell is just a 2-tuple (which it really is), let's use tuple syntax, so a list of [1, 2, 3, 4] is just {1, {2, {3, {4, {}}}}}, so a proper list is just Cons cells where the first element is any term() and the second element is either a Cons cell or nil, and improper list is one where each element in a Cons cell is any term(), so you could compose a 'list' like {{{{{}, 4}, 3}, 2}, 1}, which is like a normal list except it uses the other element, and in you can represent this in normal Cons syntax too via [[[[[] | 4] | 3] | 2] | 1].

A list on the BEAM is just a Cons cell, where a proper list is a Cons cell of the type proper_list() :: [term() | proper_list()] | nil() (remember nil() :: []), and an improper list is a Cons cell of the type improper_list() :: [term() | term()], and that's all it is. :-)

So for any thinking about improper lists, just pretend an improper list is a 2-tuple where the second element is 'usually' an improper list but can be anything.

Dec 12 '18 16:12 OvermindDL1

Do note, that means that list is not nor should not be a language type, but rather list should be defined in-language from the base types of Cons cells (which the BEAM does model).

Dec 12 '18 16:12 OvermindDL1

So for any thinking about improper lists, just pretend an improper list is a 2-tuple where the second element is 'usually' an improper list but can be anything.

This is overly simplistic. My question is about maybe_improper_list(A, B) for some given types A and B, and saying that the tail is "usually an improper list but can be anything" is not a reasonable typing rule.

Do note, that means that list is not nor should not be a language type, but rather list should be defined in-language from the base types of Cons cells (which the BEAM does model).

It's certainly tempting to introduce a cons type and model the various (improper) lists types in terms of this. This is how I wrote down the various semantics after all. However, the fact is that list is a language type and we've decided to (try to) work with the existing type language. Hence, the worries about what the documentation says and what dialyzer does.

Dec 12 '18 16:12 UlfNorell

and saying that the tail is "usually an improper list but can be anything" is not a reasonable typing rule.

The typing rule for an improper list is quite literally [term() | term()] by default, so something like maybe_improper_list(A, B) would be a recursive definition of something like blah(A, B) :: nil() or [A | (blah(A, B) or B or nil()) where | is the Cons cell element separator and or is the usual | in typespecs.

However, the fact is that list is a language type and we've decided to (try to) work with the existing type language. Hence, the worries about what the documentation says and what dialyzer does.

In language it is mostly syntax sugar on top of the Cons cell syntax though, [1, 2, 3, 4] == [1, [2, [3, [4, []]]]] and the Cons cell type is the more generic version of it. All Lists can be decomposed to Cell Consing, not all Cons Cells can be sugared to lists however.

Dec 12 '18 17:12 OvermindDL1

The typing rule for an improper list is quite literally [term() | term()]

Where are you getting this from? There is no official type system for Erlang, only the type language that we are currently discussing. The documentation states that maybe_improper_list() = maybe_improper_list(any(), any()) but it's quite coy about what exactly maybe_improper_list/2 means.

something like blah(A, B) :: nil() or [A | (blah(A, B) or B or nil())

So, you are saying that [] :: maybe_improper_list(a, b)? If so, what are you basing that on? As I read the documentation this should not be the case, and dialyzer agrees.

The question here isn't about the Erlang term representation of lists (which I think we're all familiar with), but about how to interpret the types for improper lists. Both semantically (what is the set of Erlang values belonging to the type maybe_improper_list(A, B)?) and as typing rules that we can actually implement in the type checker.

Dec 12 '18 17:12 UlfNorell

My interpretation: The "maybe" is referring to the improperness. Thus, for the maybe_improper list types, the list may also be proper.

Properness: proper | improper | maybe improper
Emptiness: nonempty | maybe empty | nil only

	Proper only	Improper only	Proper or improper
Empty only	`[]`	N/A	N/A
Non-empty only	`nonempty_list`	`nonempty_improper_list`	`nonempty_maybe_improper_list`
Empty or non-empty	`list`	N/A	`maybe_improper_list`

Terminology:

proper means that the terminator is []
improper means that the terminator is not []
maybe improper means proper or improper
empty list is the singleton []
nonempty means that [] is not a valid value of the type

There is no possibly-empty improper_list(A, B) because if [] would be a value, it would not be improper.

-type maybe_improper_list(A, B) :: [] | nonempty_maybe_improper_list(A, B).
-type nonempty_maybe_improper_list(A, B) :: cons(A, B | maybe_improper_list(A, B)).
-type nonempty_improper_list(A, B) :: cons(A, B | nonempty_improper_list(A, B)).

Dec 12 '18 18:12 zuiderkwast

I have a few arguments against the "maybe" meaning it can be []:

Dialyzer says no

The definition of iolist explicitly includes []:

-type iolist() :: maybe_improper_list(byte() | binary() | iolist(), binary() | [])

nonempty_improper_list(a, []) contains only proper lists

Neither are of course very strong arguments. Trying to divine some deeper meaning in the type language docs can be quite futile, and dialyzer would rather no one used improper lists at all. I would lean towards picking a semantics that doesn't clash too badly with people's expectations and making it the official interpretation.

I would not be violently opposed to the interpretation you gave. Note that you only need nonempty_improper_list as a primitive:

-type maybe_improper_list(A, B) :: [] | nonempty_maybe_improper_list(A, B).
-type nonempty_maybe_improper_list(A, B) :: non_empty_improper_list(A, [] | B).
-type nonempty_improper_list(A, B) :: cons(A, B | nonempty_improper_list(A, B)).

Dec 12 '18 18:12 UlfNorell

Where are you getting this from?

Tradition from other languages. I know Erlang doesn't have an official typing system so I have to relate to others. :-)

So, you are saying that [] :: maybe_improper_list(a, b)? If so, what are you basing that on? As I read the documentation this should not be the case, and dialyzer agrees.

Ah is it not allowed to be empty? It's not clear based on the name alone.

The question here isn't about the Erlang term representation of lists (which I think we're all familiar with), but about how to interpret the types for improper lists. Both semantically (what is the set of Erlang values belonging to the type maybe_improper_list(A, B)?) and as typing rules that we can actually implement in the type checker.

Yep, I'm primarily focusing on the semantics mostly related to the ML style languages (as erlang is very conceptually close to).

My interpretation: The "maybe" is referring to the improperness. Thus, for the maybe_improper list types, the list may also be proper.

Makes sense then, so no nil as an initial value.

I have a few arguments against the "maybe" meaning it can be []:

I'd agree here. It would be nice if Erlang were explicit about its types, but untyped languages tend to be rather ill-defined so it's expected.

I would not be violently opposed to the interpretation you gave. Note that you only need nonempty_improper_list as a primitive:

Good point.

Dec 12 '18 19:12 OvermindDL1

thanks, Ulf, for starting this thread to clarify this hazy topic. I also only had gut feelings about "maybe" referring to properness, as Viktor described. I like the definitions basing it all on nonempty_improper_list(A, B) when strictly not [] :: B (and not nonempty_proper_list(any()) :: B).

Dec 12 '18 21:12 gomoripeti

Thanks for an amazingly well-written breakdown, Ulf!

I think your arguments for the YesNo interpretation are quite compelling. Personally, I've always assumed the YesYes interpretation that Victor is arguing for.

I think it would be fairly safe to use an interpretation which is more lenient than Dialyzer, i.e. YesNo or YesYes. (I'm not even going to consider NoYes as an alternative.) We wouldn't introduce any new kinds of error that people haven't encountered before. The flip side is that we wouldn't be able to catch certain errors but given that the waters are so muddy here it seems to me that it would be better to lean towards leniency. So I'm currently in favor of YesYes with YesNo being a reasonable alternative.

Dec 12 '18 22:12 josefs

I agree, very good summary of docs and dialyzer behaviour Ulf!

I think your arguments for the YesNo interpretation are quite compelling. Personally, I've always assumed the YesYes interpretation that Victor is arguing for.

I'm was actually arguing for NoYes. That's what I intended anyway. That is, b is not a list of any kind, not even a maybe improper one. Only cons and nil are lists.

In the docs, we have

maybe_improper_list() :: maybe_improper_list(any(), any()).

Assuming b :: maybe_improper_list(a, b), we have any() :: maybe_improper_list(), making this 0-ary type quite meaningless.

The definition of iolist explicitly includes []

Yes, that's an indication of YesNo, although it could have been included just for clarity. A counter-indication is that iodata() explicitly includes binary(). With the YesNo (or YesYes) interpretation, binary() would already be part of iolist(). Then, there is no point to the existence of iodata(). It would be a synonym to iolist().

%% For reference
-type iodata() :: binary() | iolist().
-type iolist() :: maybe_improper_list(byte() | binary() | iolist(), binary() | [])`.

@josefs, may I ask why you don't want to consider NoYes as an alternative?

Why is there no type called just improper_list(A, B)?

Dec 12 '18 23:12 zuiderkwast

To me NoYes feels nonsensical. What is the purpose of the second type parameter to maybe_improper_list(A, B) if we don't allow, say, b :: maybe_improper_list(a, b)? Also, the list wouldn't be improper if we didn't allow b :: maybe_improper_list(a, b), right?

Dec 13 '18 00:12 josefs

Aha, I understand how you are thinking.

The purpose of the second parameter is to be the type of the "terminator". [a | b] is an improper list according to all definitions, even if b alone isn't. I would say an improper list has to be a cons. It cannot be empty.

I guess you could consider b the empty improper list. Would you?

Dec 13 '18 00:12 zuiderkwast

Ok, I think I see what you mean now :-) Yes, b would be the empty improper list. And you want to disallow b :: maybe_improper_list(a, b) but not [a | b] :: maybe_improper_list(a, b) or any kind of non-empty improper list. Is that a fair characterization of your standpoint?

I still disagree with this standpoint because we also have the type nonempty_maybe_improper_list which I think should behave like your NoYes suggestion (if I understand you correctly). I think it makes sense to allow b :: maybe_improper_list(a, b) but disallow b :: nonempty_maybe_improper_list(a, b).

Dec 13 '18 01:12 josefs

Under NoYes maybe_improper_list(a, b) contains [], so under YesYes you'd need [] | nonempty_maybe_improper_list(a, b) to get the same behaviour.

I'm warming up to NoYes. The mathematician in me still prefers the elegance of YesNo, but I find the maybe_improper_list() argument compelling, and requiring that you have to be a nil or a cons to be of list type (improper or otherwise), makes a lot of sense.

Dec 13 '18 06:12 UlfNorell

Is that a fair characterization of your standpoint?

Yes, exactly. Regarding nonempty_maybe_improper_list, it may be proper but it may not be empty.

-type nonempty_maybe_improper_list (a, b) :: nonempty_improper_list(a, b) | nonempty_list(a).

Dec 13 '18 14:12 zuiderkwast

More evidence...

1> is_list([a|b]).
true
2> is_list([a]).  
true
3> is_list([]). 
true
4> is_list(b). 
false

Dec 14 '18 00:12 zuiderkwast

Well just as note, is_list is defined in the C source as:

((x) & (_TAG_PRIMARY_MASK-TAG_PRIMARY_LIST))

So all it checks is that the element directly passed in is of list type (proper or improper), which is essentially just a C array where the first N-1 elements is the list elements and the last is the tail element (which is nil when a proper list or non-nill when an improper list). ^.^

Dec 14 '18 15:12 OvermindDL1

[off topic] @OvermindDL1 It seems what you have found is the is_list C macro, which is checking if a term is a cons cell, nothing else. The Erlang BIF erlang:is_list/1 is implemented in the C function enif_is_list().

Dec 15 '18 18:12 zuiderkwast

@zuiderkwast fwiw I don't find is_list(b) == false convincing argument for the NoYes interpretation. After all, is_list/1 is only concerned about proper lists. It would have been an other matter if there had been a predicate is_improper_list which would have produced the same result.

Dec 17 '18 08:12 josefs

The question is how we should come to a decision on this issue. Given that the documentation is unclear and the existing tools don't agree with the documentation means that we'll just have to take our own stance.

In order to move forward I'd like to gather everyones opinion and see if we can find some common ground. My current position is that I prefer YesYes, YesNo and NoYes in that order. @UlfNorell prefers YesNo but is warming to NoYes. @zuiderkwast prefers NoYes. Is there any other interpretation that you can see yourself supporting @zuiderkwast? @gomoripeti, can you clarify what interpretation you support?

Dec 17 '18 08:12 josefs

I support NoYes.

(I base this intuition on the definitions of iolist and iodata - which I think are the most used and very widely used maybe-improper lists - and the existing list type variant names, as summarised in a table by @zuiderkwast )

Dec 17 '18 09:12 gomoripeti

I think a consensus would be nice, rather than a fast decision. Of course we can take a preliminary decision if we want to, but there is no hurry to implement the improper stuff IMO.

I suggest we contact Kostis and Tobias who wrote the page about Types and their syntax. The text is taken almost unchanged from EEP8: http://erlang.org/eeps/eep-0008.html . Perhaps they have an idea of what "one would expect" means. We can also check with the community in erlang-questions.

Meanwhile, I suggest we focus on implementing other features for the beta release. I have the impression that most of the remaining stuff is strait-forward. Pattern matching on maps for example. Self-gradualizing reveals quite some bugs (e.g. tuples not matching term()).

Dec 17 '18 10:12 zuiderkwast

@zuiderkwast fwiw I don't find is_list(b) == false convincing argument for the NoYes interpretation. After all, is_list/1 is only concerned about proper lists

Of course is_list/1 is only looking at the head. It would be a very stupid function if it would go through the whole list. This was meant mostly as a joke. (Better arguments for NoYes are the ones Peter mentioned; my first post in this thread.)

Dec 17 '18 10:12 zuiderkwast

Were Kostis and Tobias contacted?

Jan 20 '20 21:01 Zalastax

Gradualizer Gradualizer copied to clipboard

What is the semantics of improper lists anyway?

My interpretation

The documentation

Dialyzer

Gradualizer

Conclusion

Gradualizer
Gradualizer copied to clipboard