otp
otp copied to clipboard
lists: enable zip functions to work on lists of different lengths
This PR extends the zip
family of lists
functions so that they can work with lists of different lengths. The additional How
parameter specifies how to handle this.
-
strict
: The call will fail if not all lists are of equal length. This is the same as the current behavior, and the default if noHow
parameter is given. -
inner
: The operation will stop when one of the given lists is exhausted, eglists:zip([a, b], [c], inner)
as well aslists:zip([a], [c, d], inner)
will result in[{a, c}]
. -
{outer, Defaults}
: If a list is exhausted, the corresponding value in theDefaults
tuple will be used in that place instead, eglists:zip([a, b], [c], {outer, {x, y}})
will result in[{a, c}, {b, y}]
andlists:zip([a], [c, d], {outer, {x, y}})
will result in[{a, c}, {x, d}]
.
CT Test Results
2 files 87 suites 38m 46s :stopwatch: 1 844 tests 1 796 :heavy_check_mark: 48 :zzz: 0 :x: 2 134 runs 2 084 :heavy_check_mark: 50 :zzz: 0 :x:
Results for commit b7bf7d15.
:recycle: This comment has been updated with latest results.
To speed up review, make sure that you have read Contributing to Erlang/OTP and that all checks pass.
See the TESTING and DEVELOPMENT HowTo guides for details about how to run test locally.
Artifacts
// Erlang/OTP Github Action Bot
Alternatively, instead of having an additional How
argument, there could be distinct functions for each zip strategy.
Also, instead of {outer, Defaults}
, the {outer, ...}
part could be left off and only the Defaults
tuple could be provided.
If we can find a name for it; I think I like zip_sloppy(ListA, ListB)
and zip_sloppy(ListA, ListB, {DefaultA,DefaultB})
.
zip_lenient/2,3
, zlip/2,3
, zip2/2,3
, zippo/2,3
, zap/2,3
, zip_len/2,3
?
I do not like any of them, and hope someone else can do better... :-(
Maybe something completely different: yoke/2,3
, knit/2,3
?
If we can find a name for it; I think I like
zip_sloppy(ListA, ListB)
andzip_sloppy(ListA, ListB, {DefaultA,DefaultB})
.
slop
, or zlop
:smile_cat:
zip_lenient/2,3
,zlip/2,3
,zip2/2,3
,zippo/2,3
,zap/2,3
,zip_len/2,3
? I do not like any of them, and hope someone else can do better... :-( Maybe something completely different:yoke/2,3
,knit/2,3
?
combine
maybe? It's not taken, but I'm not sure that it captures fully what it does, and it does not carry any hint towards zip
which basically is what it does, and also unzip
to un-do it.
What I was actually thinking of is something like innerzip
or zipinner
to do what the inner
mode does now, and likewise outerzip
or zipouter
to do what {outer, Defaults}
does. It is somewhat inconvenient that the naming convention in the lists
module is to not use underscores as most other modules do, but just put all the words one after another :frowning_woman: innerzipwith3
or zipwith3inner
looks pretty confusing :sweat_smile:
Thanks for your pull request. We will schedule an OTP Technical Board meeting to discuss it.
Do you have any examples of real-world use cases?
Do you have any examples of real-world use cases?
Sadly, none that could be explained in just a few words :sweat: In a nutshell, I use it (or rather, a custom version of it) for dealing with a (custom) text protocol which consists of a command followed by a possible lot of positional arguments with optional ones towards the end, to assign "meanings" to those arguments.
Below is an abstract description of how I am using this, omitting the project-specific details.
Assume Pattern
to be [a, b, c]
for the following.
In some cases, I don't care about any surplus arguments.
> lists:zip(Pattern, [1, 2, 3, 4], inner).
[{a, 1}, {b, 2}, {c, 3}]
> lists:zip(Pattern, [1, 2], inner).
[{a, 1}, {b, 2}].
(In the second example above, there will be fewer arguments than what Pattern
describes; that may be ok or not, depending on circumstances).
In other cases, I need to assign defaults (or rather, indicate that defaults are to be used) for some arguments that may not be given.
> lists:zip(Pattern, [1, 2], {outer, {undefined, default}}).
[{a, 1}, {b, 2}, {c, default}]
> lists:zip(Pattern, [1, 2, 3, 4], {outer, {undefined, default}}).
[{a, 1}, {b, 2}, {c, 3}, {undefined, 4}]
(In the second example above, the tuples with undefined
in the first elements are unexpected arguments; it may be ok to have them in there, otherwise they could be ignored or filtered out in further processing).
In still other cases, I need to do both, which can be done in two passes (and yes, there are probably better ways to do this :sweat_smile: )
> lists:zip(Pattern,
> lists:zipwith(fun(_X, Y) -> Y end, Pattern, [1, 2, 3, 4], {outer, {undefined, default}}),
> inner).
[{a, 1}, {b, 2}, {c, 3}]
> lists:zip(Pattern,
> lists:zipwith(fun(_X, Y) -> Y end, Pattern, [1, 2], {outer, {undefined, default}}),
> inner).
[{a, 1}, {b, 2}, {c, default}]
All this could also be done by pattern matching and/or explicit recursion of course, and sometimes you will still want or even have to resort to those, but the new zip*
variants make this way easier and generic (IMO).
We have scheduled an OTB meeting for next week to discuss this PR.
@Maria-12648430 is currently n/a, I'll be taking this over for a while =^^=
We have had the OTB meeting. Overall, we liked the API and it seems to be cleanest way to implement the given functionality.
The only thing we were not sure about was the option names inner
and outer
.
Those names are short and sound nice and we assumed that the meaning has something to do with inner and outer joins in query languages. However, someone who has never heard about inner and outer joins might find them confusing.
We came up with some alternate names that might be clearer:
-
shortest
/longest
-
truncate
/pad
(We did not discuss the strict
option, but it would probably have to change as well if we would go for one of the alternate names.)
We don't want you to necessarily change the names, but we want to you think some more and give a motivation for your choice of names.
Ok :)
In general, we think that truncate
/ pad
are a better choice than shortest
/ longest
, as they imply what will be done when the lists are not of equal length. truncate
however seems too "harsh", so we suggest crop
(our preference) or clip
or prune
as alternatives.
Things are tricky for strict
however, as there seems to be no good verb alongside crop
(or whatever) and pad
to replace (or keep) it. Alternatives are...
-
equal
- not a verb, and the lists are not required to be equal but of equal length -
equal_length
- more verbose and also not a verb, but expresses that the length should be qual -
enforce
- enforce what? -
enforce_equal
-
enforce_equal_length
- most verbose but a verb and also the most exact
Oh, and yes, you're right, the names inner
and outer
were inspired by SQL inner/outer joins.
In our internal OTP chat, someone suggested trim
instead of crop
.
One approach could be to name the last argument OnDifferentLengths
and accept values trim/crop | extend/pad | raise
, with raise
being the equivalent of strict
- it reads pretty naturally
@michalmuskala I like your suggestion.
It is similar to a suggestion we came up with in the OTP chat. We came up with fail
, but I like raise
better.
@juhlig Do you find trim
| pad
| raise
acceptable?
The OTP team seems to prefer fail
over raise
? Do you find it acceptable?
I too like the suggestions made by @michalmuskala :) trim
/ pad
/ fail
sounds fine :)
Great! Please update the pull request to use trim
/ pad
/ fail
.
With this API the result can be adjusted according to the shortest (trim
) or the longest (pad
) of the two lists A
and B
with a call lists:zip(A,B, OnDifferentLengths)
but there is no way to say that I want to adjust according to the length of A or the length of B. Now you will get the chosen action trim
, pad
or fail
but you might want to do the necessary trim
or pad
according to your choice of input lists (A or B). I don't know if this is a common use case, but I can imagine that it is just as common as to get the adjustment of length according to A or B unknown which (without knowing or checking their length before hand).
Note that the previous comment is my personal one and does not change the approval to go ahead with the PR according to @bjorng s instructions. I brought this up just in case someone recognize the use case where one of the input lists are the one to adjust the length according to.
@bjorng
Great! Please update the pull request to use
trim
/pad
/fail
.
Done :)
@KennethL
I guess someone can provide or think up use cases easily for what you describe. It is surprisingly hard to accomodate for them and provide a mechanism to describe what is to be done, however, especially since what is to be done on one of the lists refers to the other list.
For example, you may want to trim A
if B
is shorter or pad A
if B
is longer, so you would have instructions like {trim_or_pad, a}
. Or instead of padding, you may want to fail if B
is longer, so you would need {trim_or_fail, a}
, or the other way round pad if B
is longer and fail if B
is shorter, so you would need {pad_or_fail, a}
. And vice versa.
It gets real tricky when we get to the family of zip
functions that operate on 3 lists, as then it is not just in relation to the other but to one of or even both of the others.
Anyway... If anybody has a nice suggestion for how to achieve this, ie a good way to provide instructions, I would be happy to add it.
Thanks!
I now see that the documentation has not been updated. That needs to be done before we can merge this PR. Also, it would be nice if you could update the property tests for the lists
module.
@bjorng We wanted to make sure the PR will get accepted and let the implementation settle down before investing more work. Now that is sorted out, I will work on the missing pieces next.
Oh, and for the docs I will need something for the since
tags.
OTP-18318
Yes, I think it is a good idea to check the options for validity when the lists turn out to be of equal length.
@bjorng I added a first draft for docs, please review. Also, the How
parameter is now checked in case the given lists are of equal length.
I will start working on property tests now.
@bjorng the last update adds property based tests for the new zip functions. If the pipeline passes and you are satisfied with them, let me know, and I'll squash the commits.
Thanks! Added to our daily builds.
It looks good in the daily builds. You can squash now.
@bjorng ok, squashed. I also augmented the commit message.