otp icon indicating copy to clipboard operation
otp copied to clipboard

lists: enable zip functions to work on lists of different lengths

Open Maria-12648430 opened this issue 2 years ago • 6 comments

This PR extends the zip family of lists functions so that they can work with lists of different lengths. The additional How parameter specifies how to handle this.

  • strict: The call will fail if not all lists are of equal length. This is the same as the current behavior, and the default if no How parameter is given.
  • inner: The operation will stop when one of the given lists is exhausted, eg lists:zip([a, b], [c], inner) as well as lists:zip([a], [c, d], inner) will result in [{a, c}].
  • {outer, Defaults}: If a list is exhausted, the corresponding value in the Defaults tuple will be used in that place instead, eg lists:zip([a, b], [c], {outer, {x, y}}) will result in [{a, c}, {b, y}] and lists:zip([a], [c, d], {outer, {x, y}}) will result in [{a, c}, {x, d}].

Maria-12648430 avatar Oct 05 '22 10:10 Maria-12648430

CT Test Results

       2 files       87 suites   38m 46s :stopwatch: 1 844 tests 1 796 :heavy_check_mark: 48 :zzz: 0 :x: 2 134 runs  2 084 :heavy_check_mark: 50 :zzz: 0 :x:

Results for commit b7bf7d15.

:recycle: This comment has been updated with latest results.

To speed up review, make sure that you have read Contributing to Erlang/OTP and that all checks pass.

See the TESTING and DEVELOPMENT HowTo guides for details about how to run test locally.

Artifacts

// Erlang/OTP Github Action Bot

github-actions[bot] avatar Oct 05 '22 11:10 github-actions[bot]

Alternatively, instead of having an additional How argument, there could be distinct functions for each zip strategy.

Also, instead of {outer, Defaults}, the {outer, ...} part could be left off and only the Defaults tuple could be provided.

Maria-12648430 avatar Oct 06 '22 06:10 Maria-12648430

If we can find a name for it; I think I like zip_sloppy(ListA, ListB) and zip_sloppy(ListA, ListB, {DefaultA,DefaultB}).

zip_lenient/2,3, zlip/2,3, zip2/2,3, zippo/2,3, zap/2,3, zip_len/2,3? I do not like any of them, and hope someone else can do better... :-( Maybe something completely different: yoke/2,3, knit/2,3?

RaimoNiskanen avatar Oct 06 '22 08:10 RaimoNiskanen

If we can find a name for it; I think I like zip_sloppy(ListA, ListB) and zip_sloppy(ListA, ListB, {DefaultA,DefaultB}).

slop, or zlop :smile_cat:

zip_lenient/2,3, zlip/2,3, zip2/2,3, zippo/2,3, zap/2,3, zip_len/2,3? I do not like any of them, and hope someone else can do better... :-( Maybe something completely different: yoke/2,3, knit/2,3?

combine maybe? It's not taken, but I'm not sure that it captures fully what it does, and it does not carry any hint towards zip which basically is what it does, and also unzip to un-do it.

What I was actually thinking of is something like innerzip or zipinner to do what the inner mode does now, and likewise outerzip or zipouter to do what {outer, Defaults} does. It is somewhat inconvenient that the naming convention in the lists module is to not use underscores as most other modules do, but just put all the words one after another :frowning_woman: innerzipwith3 or zipwith3inner looks pretty confusing :sweat_smile:

Maria-12648430 avatar Oct 06 '22 09:10 Maria-12648430

Thanks for your pull request. We will schedule an OTP Technical Board meeting to discuss it.

Do you have any examples of real-world use cases?

bjorng avatar Oct 11 '22 04:10 bjorng

Do you have any examples of real-world use cases?

Sadly, none that could be explained in just a few words :sweat: In a nutshell, I use it (or rather, a custom version of it) for dealing with a (custom) text protocol which consists of a command followed by a possible lot of positional arguments with optional ones towards the end, to assign "meanings" to those arguments.


Below is an abstract description of how I am using this, omitting the project-specific details. Assume Patternto be [a, b, c] for the following.

In some cases, I don't care about any surplus arguments.

> lists:zip(Pattern, [1, 2, 3, 4], inner).
[{a, 1}, {b, 2}, {c, 3}]
> lists:zip(Pattern, [1, 2], inner).
[{a, 1}, {b, 2}].

(In the second example above, there will be fewer arguments than what Pattern describes; that may be ok or not, depending on circumstances).

In other cases, I need to assign defaults (or rather, indicate that defaults are to be used) for some arguments that may not be given.

> lists:zip(Pattern, [1, 2], {outer, {undefined, default}}).
[{a, 1}, {b, 2}, {c, default}]
> lists:zip(Pattern, [1, 2, 3, 4], {outer, {undefined, default}}).
[{a, 1}, {b, 2}, {c, 3}, {undefined, 4}]

(In the second example above, the tuples with undefined in the first elements are unexpected arguments; it may be ok to have them in there, otherwise they could be ignored or filtered out in further processing).

In still other cases, I need to do both, which can be done in two passes (and yes, there are probably better ways to do this :sweat_smile: )

> lists:zip(Pattern,
>       lists:zipwith(fun(_X, Y) -> Y end, Pattern, [1, 2, 3, 4], {outer, {undefined, default}}),
>       inner).
[{a, 1}, {b, 2}, {c, 3}]
> lists:zip(Pattern,
>       lists:zipwith(fun(_X, Y) -> Y end, Pattern, [1, 2], {outer, {undefined, default}}),
>       inner).
[{a, 1}, {b, 2}, {c, default}]

All this could also be done by pattern matching and/or explicit recursion of course, and sometimes you will still want or even have to resort to those, but the new zip* variants make this way easier and generic (IMO).

Maria-12648430 avatar Oct 13 '22 10:10 Maria-12648430

We have scheduled an OTB meeting for next week to discuss this PR.

bjorng avatar Nov 02 '22 07:11 bjorng

@Maria-12648430 is currently n/a, I'll be taking this over for a while =^^=

juhlig avatar Nov 03 '22 13:11 juhlig

We have had the OTB meeting. Overall, we liked the API and it seems to be cleanest way to implement the given functionality.

The only thing we were not sure about was the option names inner and outer.

Those names are short and sound nice and we assumed that the meaning has something to do with inner and outer joins in query languages. However, someone who has never heard about inner and outer joins might find them confusing.

We came up with some alternate names that might be clearer:

  • shortest / longest
  • truncate / pad

(We did not discuss the strict option, but it would probably have to change as well if we would go for one of the alternate names.)

We don't want you to necessarily change the names, but we want to you think some more and give a motivation for your choice of names.

bjorng avatar Nov 09 '22 04:11 bjorng

Ok :)

In general, we think that truncate / pad are a better choice than shortest / longest, as they imply what will be done when the lists are not of equal length. truncate however seems too "harsh", so we suggest crop (our preference) or clip or prune as alternatives.

Things are tricky for strict however, as there seems to be no good verb alongside crop (or whatever) and pad to replace (or keep) it. Alternatives are...

  • equal - not a verb, and the lists are not required to be equal but of equal length
  • equal_length - more verbose and also not a verb, but expresses that the length should be qual
  • enforce - enforce what?
  • enforce_equal
  • enforce_equal_length - most verbose but a verb and also the most exact

juhlig avatar Nov 10 '22 10:11 juhlig

Oh, and yes, you're right, the names inner and outer were inspired by SQL inner/outer joins.

juhlig avatar Nov 10 '22 10:11 juhlig

In our internal OTP chat, someone suggested trim instead of crop.

bjorng avatar Nov 10 '22 12:11 bjorng

One approach could be to name the last argument OnDifferentLengths and accept values trim/crop | extend/pad | raise, with raise being the equivalent of strict - it reads pretty naturally

michalmuskala avatar Nov 10 '22 13:11 michalmuskala

@michalmuskala I like your suggestion.

It is similar to a suggestion we came up with in the OTP chat. We came up with fail, but I like raise better.

@juhlig Do you find trim | pad | raise acceptable?

bjorng avatar Nov 10 '22 13:11 bjorng

The OTP team seems to prefer fail over raise? Do you find it acceptable?

bjorng avatar Nov 10 '22 14:11 bjorng

I too like the suggestions made by @michalmuskala :) trim/ pad / fail sounds fine :)

juhlig avatar Nov 10 '22 15:11 juhlig

Great! Please update the pull request to use trim / pad / fail.

bjorng avatar Nov 11 '22 06:11 bjorng

With this API the result can be adjusted according to the shortest (trim) or the longest (pad) of the two lists A and B with a call lists:zip(A,B, OnDifferentLengths) but there is no way to say that I want to adjust according to the length of A or the length of B. Now you will get the chosen action trim, pad or fail but you might want to do the necessary trimor pad according to your choice of input lists (A or B). I don't know if this is a common use case, but I can imagine that it is just as common as to get the adjustment of length according to A or B unknown which (without knowing or checking their length before hand).

KennethL avatar Nov 11 '22 06:11 KennethL

Note that the previous comment is my personal one and does not change the approval to go ahead with the PR according to @bjorng s instructions. I brought this up just in case someone recognize the use case where one of the input lists are the one to adjust the length according to.

KennethL avatar Nov 11 '22 07:11 KennethL

@bjorng

Great! Please update the pull request to use trim / pad / fail.

Done :)

juhlig avatar Nov 14 '22 08:11 juhlig

@KennethL

I guess someone can provide or think up use cases easily for what you describe. It is surprisingly hard to accomodate for them and provide a mechanism to describe what is to be done, however, especially since what is to be done on one of the lists refers to the other list.

For example, you may want to trim A if B is shorter or pad A if B is longer, so you would have instructions like {trim_or_pad, a}. Or instead of padding, you may want to fail if B is longer, so you would need {trim_or_fail, a}, or the other way round pad if B is longer and fail if B is shorter, so you would need {pad_or_fail, a}. And vice versa.

It gets real tricky when we get to the family of zip functions that operate on 3 lists, as then it is not just in relation to the other but to one of or even both of the others.


Anyway... If anybody has a nice suggestion for how to achieve this, ie a good way to provide instructions, I would be happy to add it.

juhlig avatar Nov 14 '22 08:11 juhlig

Thanks!

I now see that the documentation has not been updated. That needs to be done before we can merge this PR. Also, it would be nice if you could update the property tests for the lists module.

bjorng avatar Nov 14 '22 08:11 bjorng

@bjorng We wanted to make sure the PR will get accepted and let the implementation settle down before investing more work. Now that is sorted out, I will work on the missing pieces next.

juhlig avatar Nov 14 '22 08:11 juhlig

Oh, and for the docs I will need something for the since tags.

juhlig avatar Nov 14 '22 09:11 juhlig

OTP-18318

Yes, I think it is a good idea to check the options for validity when the lists turn out to be of equal length.

bjorng avatar Nov 14 '22 10:11 bjorng

@bjorng I added a first draft for docs, please review. Also, the How parameter is now checked in case the given lists are of equal length.

I will start working on property tests now.

juhlig avatar Nov 14 '22 11:11 juhlig

@bjorng the last update adds property based tests for the new zip functions. If the pipeline passes and you are satisfied with them, let me know, and I'll squash the commits.

juhlig avatar Nov 15 '22 11:11 juhlig

Thanks! Added to our daily builds.

bjorng avatar Nov 15 '22 15:11 bjorng

It looks good in the daily builds. You can squash now.

bjorng avatar Nov 17 '22 09:11 bjorng

@bjorng ok, squashed. I also augmented the commit message.

juhlig avatar Nov 17 '22 13:11 juhlig