ceylon icon indicating copy to clipboard operation
ceylon copied to clipboard

pull 'Set'-producing operations | & ~ up to 'Collection'

Open xkr47 opened this issue 6 years ago • 32 comments

I was longing for Set operators/operations in Range instances. Namely union, intersection but why not also complement and exclusiveUnion.

Sure, I can do set(3..5).union(set(4..6)), but is there any reason that Range couldn't satisfy Set? I read through #3557 and the current interfaces and their implementations and although I didn't perhaps catch everything, I felt I should ask.

Since all operations except intersection may result in split ranges, the implied return value of Set is actually just fine. One could of course have a more specific Range<Element> intersectionWithRange(Range<Element> other) for convenience.

For these set operations one would naturally not care if the ranges are increasing or decreasing - the result would be the same i.e. the result would be normalized.

As for an actual implementation it would perhaps be beneficial to override the default implementations from the Set interface for more efficient operation (as long as the "other" set is of same type as "this" set), but they would still be "just Sets"...

Thanks for listening..

xkr47 avatar Oct 31 '18 20:10 xkr47

Oh and yes, you could then do (4..8) | (6..10) and get a set containing 4, 5, 6, 7, 8, 9, 10. Or (monday..friday) & (thursday..sunday) and get a set with thursday, friday.

xkr47 avatar Oct 31 '18 20:10 xkr47

I'm not necessarily disagreeing, but the performance characteristics are much more clear when written as set(3..5).union(set(7..9)), which matters for large ranges.

jvasileff avatar Nov 01 '18 16:11 jvasileff

Yeah, you'd probably want these operations to be lazy, especially for something like ~1..2.

CPColin avatar Nov 01 '18 16:11 CPColin

@jvasileff wrote:

I'm not necessarily disagreeing, but the performance characteristics are much more clear when written as set(3..5).union(set(7..9)), which matters for large ranges.

Did you mean that they are unclear when written as (4..8) | (6..10) ? I was thinking one would implement a (probably module-local) MultiRange class that looks like a Set but internally keeps track of the ranges making up the set instead of having individual numbers stored. It would itself be immutable; further set operators would just create new instances.

@CPColin wrote:

Yeah, you'd probably want these operations to be lazy, especially for something like ~1..2.

You mean (0..10) ~ (1..2) ? Afaik ~ is not defined as an unary operator?

But yeah the MultiRange proposed above could well be lazy, but one has to be careful not to kill the performance in case e.g. incremental updates to such a Set are performed. There are several ways to mitigate such problems I guess.. flatten-by-threshold, flatten-manually (something analoguous to .sequence() of Iterable, flatten-on-read-access, ...

xkr47 avatar Nov 02 '18 08:11 xkr47

You mean (0..10) ~ (1..2) ? Afaik ~ is not defined as an unary operator?

Oops, you're right, my bad.

CPColin avatar Nov 02 '18 14:11 CPColin

Sure, I can do set(3..5).union(set(4..6)),

You can do set(3..5) | set(4..6).

but is there any reason that Range couldn't satisfy Set?

Yep, the reason is that Ranges are Lists, and therefore can't be Sets.

So what you're really asking for here is an operator for appending lists, something I've often wondered about adding. For Strings we use + for this. I suppose we could extend that to Lists-of-same-element-type. (List<T> is also a semigroup.) I have not looked closely at that option.

So you would be able to write: (3..5) + (4..6).

I dunno, I've never been in love with the idea of using + to mean append, though it is intellectually consistent.

gavinking avatar Nov 15 '18 14:11 gavinking

I suppose we could extend that to Lists-of-same-element-type. (List<T> is also a semigroup.)

Oh, well, now, I remember, or course the problem here was that Summable<T> is invariant and so if List<Character> were a Summable<List<Character>> then String could not be a Summable<String>.

And nor could Range<T> be a Summable<Range<T>>, since the "sum" of two ranges isn't a range.

gavinking avatar Nov 15 '18 17:11 gavinking

Interestingly, and subversively, nothing's stopping us from making List<T> a Multiplicable<T>, letting you write it using the * operator! (I'm not serious.)

gavinking avatar Nov 15 '18 17:11 gavinking

I suppose we could extend that to Lists-of-same-element-type. (List<T> is also a semigroup.)

Oh, well, now, I remember, or course the problem here was that Summable<T> is invariant and so if List<Character> were a Summable<List<Character>> then String could not be a Summable<String>.

And nor could Range<T> be a Summable<Range<T>>, since the "sum" of two ranges isn't a range.

Ah, excuse me, that's all rubbish, because the append() method is defined not by List, but by Sequential, and Strings are not sequences in Ceylon, but Ranges are.

So I think it would indeed be possible to make Sequential<T> satisfy Summable<T[]>, giving you a + operator for appending immutable sequences (but not possibly-mutable Lists).

That doesn't actually sound like terrible thing to do.

gavinking avatar Nov 15 '18 17:11 gavinking

Oh, I'm wrong again, the real problem is that Sequential is covariant, and Summable is invariant. So it can't be done at that level either.

gavinking avatar Nov 15 '18 17:11 gavinking

So the only thing to do would be, I guess, to factor out Unionable and Intersectable interfaces and make Map and Set implement both (union and intersection for Map are intuitive, since maps are intuitively sets of entries), and make List implement just Unionable.

That's a possible—and reasonable—thing to do. The only question is do we think it's a good idea to let people non-destructively join lists using |?

gavinking avatar Nov 15 '18 18:11 gavinking

What is the proper union of this expression:

map{"key" -> "v1"} | map{"key" -> "v2"}

luolong avatar Nov 15 '18 23:11 luolong

@luolong oops, sorry, you're right, intersection can be defined for Map, but not union.

gavinking avatar Nov 16 '18 01:11 gavinking

@gavinking wrote:

Yep, the reason is that Ranges are Lists, and therefore can't be Sets.

I understand that it is convenient that ranges can be expected to have a predictable order of elements, but aren't they also effectively Sets? I'm of course not asking to actually make them mutually inclusive, just as a thought exercise..

And nor could Range<T> be a Summable<Range<T>>, since the "sum" of two ranges isn't a range.

Could Range instead satisfy Summable<MultiRange<T>> ?

Anyway I would still miss the intersection, exclusiveunion etc. I guess I could have a shot at implementing a separate MultiRange class that would satisfy Set and allow for easy co-operation with Range instances.

I'm closing this now but feel free to keep discussing.

xkr47 avatar Nov 16 '18 06:11 xkr47

I understand that it is convenient that ranges can be expected to have a predictable order of elements, but aren't they also effectively Sets?

Well, sure, every Iterable is effectively a Set.

gavinking avatar Nov 16 '18 12:11 gavinking

One possible reasonable thing to do would be to define | and & for the Collection interface, but have them always produce Sets. That is, simply pull the current union() and intersection() methods up to Collection from Set.

That would be a change with pretty minimal impact on the language, that makes the existing | and & operators much more useful.

For example, [1->"hello", 3->"world"] | map { 1->"goodbye", 2->"sweet" } would be equal to set { 1->"hello", 3->"world", 1->"goodbye", 2->"sweet" }.

gavinking avatar Nov 16 '18 12:11 gavinking

@gavinking wrote:

Well, sure, every Iterable is effectively a Set.

Hmm, through using Java I have empirically developed the (perhaps incorrect) understanding that Sets guarantee uniqueness of entries. So in that context, a [ 1, 1 ] would not be considered to be a Set..?

Your example with a sequence and a map is of course fine because all the elements of the set are unique. But based on my hypothesis above I would expect [ 1, 1, 3 ] | set { 4, 1 } would be equal to set { 1, 1, 3, 4, 1 } which is equal to set { 1, 3, 4 }.

xkr47 avatar Nov 16 '18 12:11 xkr47

So in that context, a [ 1, 1 ] would not be considered to be a Set..?

Well, I mean, mathematically, a set is a value with just one operation, ∊, which in Ceylon we write as in. So as long as an object has a contains() method, it can be considered a set.

In particular, [ 1, 1 ] can be "considered" a Set by applying the set() function to it.

(Note that the definition of set in mathematics doesn't say anything at all about iterating values, or any of the other things that distinguish Sets from Lists.)

But based on my hypothesis above I would expect [ 1, 1, 3 ] | set { 4, 1 } would be equal to set { 1, 1, 3, 4, 1 } which is equal to set { 1, 3, 4 }.

Right, set { 1, 1, 3, 4, 1 } == set { 1, 3, 4 }.

gavinking avatar Nov 16 '18 14:11 gavinking

One possible reasonable thing to do would be to define | and & for the Collection interface, but have them always produce Sets. That is, simply pull the current union() and intersection() methods up to Collection from Set.

The idea sounds really nice.

So one would put default implementations in Collection and assumably override in Range with versions checking whether the "other" collection is also a Range, and fall back to the Collection implementation if not?

(Reopening since there is now light again :)

xkr47 avatar Nov 16 '18 18:11 xkr47

override in Range with versions checking whether the "other" collection is also a Range, and fall back to the Collection implementation if not?

They could be overridden on Sequential to make use of immutability, yes.

gavinking avatar Nov 16 '18 20:11 gavinking

I've implemented this idea (it was really straightforward) and it seems to be working well. I'll push it to a branch later.

gavinking avatar Nov 16 '18 22:11 gavinking

I've pushed my work to the 7435 branch. Now we need to see if there is really a consensus in favor of this. Feedback, please!

gavinking avatar Nov 19 '18 12:11 gavinking

I'm in favor of this.

davidfestal avatar Nov 19 '18 12:11 davidfestal

@gavinking wrote:

Well, sure, every Iterable is effectively a Set.

Hmm, through using Java I have empirically developed the (perhaps incorrect) understanding that Sets guarantee uniqueness of entries. So in that context, a [ 1, 1 ] would not be considered to be a Set..?

Your example with a sequence and a map is of course fine because all the elements of the set are unique. But based on my hypothesis above I would expect [ 1, 1, 3 ] | set { 4, 1 } would be equal to set { 1, 1, 3, 4, 1 } which is equal to set { 1, 3, 4 }.

[ 1, 1 ] is indeed not considered to be a mathematical set, because a set is a collection of distinct objects. Therefore an iterable object may or may not be a set. If Set will still retain its defining feature, that is, having distinct objects, then why not.

fill0llif avatar Nov 19 '18 20:11 fill0llif

[ 1, 1 ] is indeed not considered to be a mathematical set, because a set is a collection of distinct objects.

This isn't really correct, at least not in modern axiomatic set theory, where "set" is a primitive notion. The only operation a mathematical set has is ∊, and one can't distinguish [1] from [1,1] using ∊.

gavinking avatar Nov 19 '18 21:11 gavinking

This isn't really correct, at least not in modern axiomatic set theory, where "set" is a primitive notion. The only operation a mathematical set has is ∊, and one can't distinguish [1] from [1,1] using ∊.

Both of them are theories anyway, that's not what I really care about. The thing is, I believe having a collection that deals with distinct objects is very useful. Though if a Set doesn't specify anymore if it contains distinct objects or not, what's the difference between a Set and a Collection then?

fill0llif avatar Nov 19 '18 22:11 fill0llif

if a Set doesn't specify anymore if it contains distinct objects or not, what's the difference between a Set and a Collection then?

There is no suggestion to change the semantics of Set. The iterator for a Set always returns distinct values.

gavinking avatar Nov 19 '18 23:11 gavinking

Then it's more than ok. That would be very useful.

fill0llif avatar Nov 20 '18 00:11 fill0llif

Sorry I came a bit late to this discussion, but I think I should drop my five cents here.

I don't think | & and ~ are actually part of the semantics for Collection, but just for Set. I'm not saying they are not useful, but just no the right place.

I.e. what's the meaning of the union of two Lists? it is not unreasonable thinking that may retain elements from even if they are repeated (i.o.w. work as a concatenation); but also may just add the "missing" elements to the end of the list; or remove duplicate elements before concatenate...

Instead, I propose creating utility methods in collections, with similar meaning, that can be used to define Set methods. Say:

  • Instead of Collection.unionwe can use Collection.or, Collection.with or Collection.addAll.
  • Instead of Collection.intersection we can use Collection.and,Collection.retainAll or Collection.keep
  • Instead of Collection.exclusiveUnion we can use Collection.xoror Collection.disjoint(none of them really convincing me, but can't find something better).

Those methods can serve as foundations for Set operations, but also can be refined to adapt the results to different collection kinds.

someth2say avatar Nov 23 '18 10:11 someth2say

@someth2say The idea is that arbitrary collections can be trivially viewed as sets, because conceptually a set is just something that tells you whether or not it contains a given element. Taking the union of two lists means that you want to treat them as sets, you don't care about order, and you want to get back a set containing each distinct element that appears in either list. That's a useful thing to do.

I.e. what's the meaning of the union of two Lists? it is not unreasonable thinking that may retain elements from even if they are repeated (i.o.w. work as a concatenation); but also may just add the "missing" elements to the end of the list; or remove duplicate elements before concatenate...

The Set return type of these operations means that it is unreasonable to think that. It's another one of those things where you might get the wrong idea the very first time you encounter it, before you understand what it's supposed to be doing, but you only have to learn it that one time.

gdejohn avatar Nov 24 '18 18:11 gdejohn