facets
facets copied to clipboard
Add Array#remove
Motivation
There are many cases when
- "duplicate" elements in an array are allowed and
- both the number and order of those elements is important (for example,
'racecar'.chars
). (If they weren't important, why aren't you usingSet
?)
So why isn't there an easy way to remove an element from such an array in a way that respects both the order and number (count) of each element? Why do all methods for removing elements from an array assume that you always want to remove all matching elements from the receiver, with no option to do otherwise?
I could not find any way to do this with the standard library. (Let me know if I'm missing something.) You would think this would be possible with Array
s, which do allow you to have duplicate values (as opposed to Set
s, which do not).
One might even think (as I did until I tried it) that this would be the behavior that the built-in Array#-
would give us, but alas it is not.
This is similar to Array#-
and Array#difference
, except that instead of removing all matches, it only removes as many occurrences as you actually ask it to subtract:
> [1, 1, 2].subtract [1]
=> [1, 2]
> [1, 1, 2].subtract [1, 1]
=> [2]
> [1, 1, 2] - [1]
=> [2]
> [1, 1, 2].difference [1]
=> [2]
Name
I'm not sure of the best name for this, but for now I'm calling it subtract
since it's a nice and simple name.
I would prefer to call it Array#-
but that operator is obviously taken by the language itself.
subtract
?
The biggest downside I can think of for subtract
is that one might assume that it is completely synonymous and identical to Array#-
operator. Just like Set#-
is identical to Set#subtract
(technically dup.subtract
). (I was going to point to String#-
as an example (from string/remove.rb / string/op_sub.rb, but I guess that is aliased as remove
rather than subtract
.)
The other downside is that Array#subtract
would be inconsistent with Set#subtract
in that Set#subtract
modifies the receiver but Array#subtract
does not (does a dup
). I would like to have both an in-place modification version and a dup
version, so I was thinking subtract!
and subtract
. But that's not consistent with the analogs in Set
: subtract
(in-place modification) and -
(dup
). I guess that's further evidence that users should rightfully expect subtract
to have the same semantics as -
(the only allowed difference being in-place modification).
So as much as I like the name subtract
, I think I may have reluctantly talked myself out of that name, since I value consistency of semantics with built-in method names more highly.
One thing I like about the name subtract
is that it still has the connotation of the arithmetic operation of subtraction, which "represents the operation of removing objects from a collection", and I believe this implementation behaves more like the intuitive understanding of subtraction than the built-in Array#-
does. If you have 3 apples (['apple']*3
) and you take away 2 of them (['apple']*2
), you should be left with 1 apple (['apple']*1
), not 0 apples. Yet:
> ['apple']*3 - ['apple']*2
=> []
Wat. Therefore, I propose subtract
:
> (['apple']*3).subtract(['apple']*2)
=> ["apple"]
Doesn't read as nicely as an operator would, but what alternative do we have? The only obvious operator, -
, is taken. ... Unless we could abuse -@
somehow to our benefit?
Comparison table
A table might be helpful...
in-place modification | returning a new object | |
---|---|---|
Set : operation to remove all occurrences of a single element/condition |
delete , delete_if |
reject |
Set : operation to remove all occurrences of every element in object |
subtract |
- |
Array : operation to remove all occurrences of a single element/condition |
delete , delete_at , delete_if |
reject |
Array : operation to remove all occurrences of every element in object |
delete_values (Facets) |
- , difference , without (ActiveSupport) |
Array : operation to remove only first/n occurrences of a single element/condition |
delete_first (proposed) |
(none) |
Array : operation to remove only first/n occurrences of every element in object |
delete_first_each! (proposed) |
delete_first_each (proposed) |
I guess the analog to Set#subtract
in Array
is delete_values
(Facets).
So I propose adding subtract
as alias to delete_values
and a new method, perhaps subtract_preserving_duplicates
for the new operation I am proposing.
The semantics of generic operations like -
are open to interpretation
The problem with generic-sounding methods like subtract
is that everyone may have their own assumptions about what it will/should do. Hash#-
is a good example of this. But unlike Hash#-
, which probably has dozens of ways it could behave, I can only think of 2 main plausible variations of Array#-
: either remove all matches or only one of each el
in other_array
.
Other names considered
-
subtract_first_occurrence
- Con: Too long. Unclear.
-
subtract_first
/remove_first
- Con: Unclear.
-
subtract_once
/delete_each_once
/remove_once
- Con: The "once" part is unclear.
-
delete_only
/remove_only
/subtract_only
(to contrast with the implied all (delete_all
,subtract_all
) implied bydelete
,-
, etc.)- Con: It's hard to make clear that we're referring to elements of the
other_array
argument, and not theother_array
argument itself. We would almost have to change the method signature todelete_only(*elements)
instead of accepting an array/enumerable, in order for this name to make sense.
- Con: It's hard to make clear that we're referring to elements of the
-
elementwise_delete_once
- Makes it clear that we're deleting elements of object, but ... too long.
-
subtract_respecting_counts
- Con: Too long. A bit ambiguous.
-
subtract_preserving_duplicates
/difference_preserving_duplicates
- Pro: One of the clearest names.
- Con: Too long.
-
subtract_occurrences
-
subtract_n
(as in, "only n occurrences of each element inother
") -
subtract_allowing_for_dups
/diff_allowing_for_dups
-
delete_each_from
- Con: Implies that it uses
delete
, butdelete
removes all copies ofel
, so this name is ambiguous)
- Con: Implies that it uses
-
delete_each_first
- I like the "each" part, but implies
delete
semantics...
- I like the "each" part, but implies
-
delete_first_each
- If we move the "each" after
delete_first
, then it may make it clearer that it uses thedelete_first
operation for each, rather thandelete
for each.
- If we move the "each" after
-
remove
- Pro: Short. Doesn't imply equivalence to
-
likesubtract
does. - Pro: Analogous to
String#remove
- Con: May sound like it's going to remove them in-place (like
delete
does). But as a counterexample,String#remove
doesn't modify the receiver (there is a#remove!
variant that does). - Con: If we're intentionally being analogous to
String#remove
, then that suffers from the same problem that thesubtract
<=>-
analogy suffers from:String#remove
removes all occurrences (gsub
).
- Pro: Short. Doesn't imply equivalence to
-
Looking at other synonyms for "subtract"...
-
whittle
-
downsize
Perhaps you can think of a better name?
How can we make the semantics clear without being wordy?
As an aside: The docs for Array#-
say:
If you need set-like behavior, see the library class Set.
but I don't understand what it means by that. Do you?
Array#-
(just like Array#|
, #delete
) already has rather set-like behavior: it removes all copies of any element in self that appear in other
(almost as if you had called to_set
first to get rid of
duplicate items, except that it doesn't de-dupe any copies of elements that you aren't deleting).
Crazy idea: abuse -@
to give different semantics for subtraction?
It looks like the Array#-@
operator is not currently used for anything. Would it be a terrible idea to add a -@
operator that returns a Subtrahend
object and override the built-in +
to accept this Subtrahend
object (delegating anything else to super
(the existing, built-in behavior)?
Then we could write:
['apple']*3 + -['apple']*2
#=> ["apple"]
Is that any better? It's obviously still not as nice as ['apple']*3 - ['apple']*2
.
And subtrahend
is technically the number being subtracted, so semantically it should be passed to the -
operator... but that wouldn't read correctly because 2 -
make a +
:
['apple']*3 - -['apple']*2
(if it existed) would be expected to be equivalent to:
['apple']*3 + ['apple']*2
How about a suffix/flag on the array to indicate count should be preserved?
Since Ruby's default seems to be to de-duplicate during operations such as Array#-
and Array#difference
and (obviously, because it's "set union") Array#|
, maybe we need a flag to be able to explicitly "opt out" of that behavior.
How about something like...
(['apple']*3).preserve_duplicates - (['apple']*2)
?
Maybe that would wrap the array in something like a ArrayPreservingDuplicates
object... which seems redundant since arrays already preserve duplicate elements (at rest) — just not through -
operation.
If we did that, would the .preserve_duplicates
suffix go on the minuend
or on the subtrahend
? Or maybe putting it on either operand would be enough to effect this behavior...
As an aside: The docs for Array#- say:
If you need set-like behavior, see the library class Set.
but I don't understand what it means by that. Do you?
It probably means or refers on how is implemented Set#-
# File lib/set.rb, line 462
def -(enum)
dup.subtract(enum)
end
# File lib/set.rb, line 441
def subtract(enum)
do_with_enum(enum) { |o| delete(o) }
self
end
Looks familiar?
Thanks for the reply, @esparta!
It probably means or refers on how is implemented Set#-
# File lib/set.rb, line 462 def -(enum) dup.subtract(enum) end # File lib/set.rb, line 441 def subtract(enum) do_with_enum(enum) { |o| delete(o) } self end
Looks familiar?
Ah, okay. I admit, I didn't look closely enough at Set
to even notice that it already had a subtract
method. That seems to put the final nail in the coffin of the idea of using subtract
for this new operation. If anything, Array#subtract
should be an alias for Array#delete_values
. I've modified my proposal to reflect this.
Haha, I guess it looks familiar in the sense that it looks remarkably like the implementation I came up with for Array#subtract!
(not sure if that's what you were alluding to, or something else):
def subtract!(other)
other.each do |el|
delete_at(index(el))
end
self
end
So to clarify, what I find confusing about them mentioning "If you need set-like behavior, see the library class Set" in the documentation for Array#-
is that that statement is generally true, and makes sense as a note in the Array
class in general — but as a note specific to Array#-
, it's a bit misleading because Array#-
already has rather set-like behavior. Array#-
(and friends) are kind of half-way array
-like (in that it leaves intact duplicate occurrences of elements that you aren't deleting) and half-way set
-like (in that it deletes duplicates of elements that you are deleting) in their behavior.
If example in Array#-
had been this:
[ 1, 1, 2, 2, 3, 3, 4, 5 ] - [ 1, 2, 4 ] #=> [ 1, 2, 3, 3, 5 ]
instead of this:
[ 1, 1, 2, 2, 3, 3, 4, 5 ] - [ 1, 2, 4 ] #=> [ 3, 3, 5 ]
, then I could see making a call-out to Set
as providing contrasting behavior.
But since their example was:
[ 1, 1, 2, 2, 3, 3, 4, 5 ] - [ 1, 2, 4 ]
#=> [ 3, 3, 5 ]
, the behavior this would be contrasted with in Set
would be:
[ 1, 1, 2, 2, 3, 3, 4, 5 ].to_set - [ 1, 2, 4 ].to_set
# => #<Set: {3, 5}>
But that result is remarkably similar! The only difference is that Set#-
results in the duplicate occurrences of 3
being collapsed into just 1 occurrence. The other duplicate occurrences are treated the same in both cases.
But that difference in behavior seems due more to the inherent differences in the data structures in general (Array
allows duplicate elements at rest, Set
doesn't) than a result of a difference in behavior between Array#-
and Set#-
specifically. Right?
Maybe it's just me, but the rb_ary_includes_by_eql
in the implementation of Array#-
already seems set-like to me:
if (RARRAY_LEN(ary1) <= SMALL_ARRAY_LEN || RARRAY_LEN(ary2) <= SMALL_ARRAY_LEN) {
for (i=0; i<RARRAY_LEN(ary1); i++) {
VALUE elt = rb_ary_elt(ary1, i);
if (rb_ary_includes_by_eql(ary2, elt)) continue;
rb_ary_push(ary3, elt);
}
return ary3;
}
(Sorry this went long...)
Haha, I guess it looks familiar in the sense that it looks remarkably like the implementation I came up with for Array#subtract! (not sure if that's what you were alluding to, or something else
:) that's what I meant. When I did lookup for the implementation of Set#-
I saw a remarkable similar implementation.
So to clarify, what I find confusing about them mentioning "If you need set-like behavior, see the library class Set" in the documentation for Array#- is that that statement is generally true, and makes sense as a note in the Array class in general — but as a note specific to Array#-, it's a bit misleading because Array#- already has rather set-like behavior.
I know what you meant, and if it helps it confused me too. It needs to be changed somehow in order to either be clear enough or remove it if doesn't make sense in the current behavior. There's a on going process of re-write the documentation on ruby core, coincidentally it's nowadays on array.c
, at some point it will be cover, check Burdette Lamar PRs: https://github.com/ruby/ruby/pulls?q=is%3Apr+author%3ABurdetteLamar
About the Set behavior...
[ 1, 1, 2, 2, 3, 3, 4, 5 ].to_set - [ 1, 2, 4 ].to_set
# => #<Set: {3, 5}>
But that result is remarkably similar! The only difference is that Set#- results in the duplicate occurrences of 3 being collapsed into just 1 occurrence. The other duplicate occurrences are treated the same in both cases.
It's tricky to explain without going too deep into mathematics, but I'd say the gist is: the subtraction operator is performed after applying the conversion to Set
, so at the end of this, it's basically the same. Again, is confusing to read.
Maybe it's just me, but the rb_ary_includes_by_eql in the implementation of Array#- already seems set-like to me:
if (RARRAY_LEN(ary1) <= SMALL_ARRAY_LEN || RARRAY_LEN(ary2) <= SMALL_ARRAY_LEN) {
for (i=0; i<RARRAY_LEN(ary1); i++) {
VALUE elt = rb_ary_elt(ary1, i);
if (rb_ary_includes_by_eql(ary2, elt)) continue;
rb_ary_push(ary3, elt);
}
return ary3;
}
IMO, it's kind of different, on both array's branches (for smallers < 16 or big > 16) is the same: add to C if A[i] is not in B; for Set#-
and your proposal is Delete on A[i] if B[n]
I know what you meant, and if it helps it confused me too.
Thanks, glad I'm not the only one :smile:
It needs to be changed somehow in order to either be clear enough or remove it if doesn't make sense in the current behavior. There's a on going process of re-write the documentation on ruby core, coincidentally it's nowadays on
array.c
, at some point it will be cover, check Burdette Lamar PRs: https://github.com/ruby/ruby/pulls?q=is%3Apr+author%3ABurdetteLamar
Wow, Burdette Lamar is doing some excellent work there! Thanks for pointing that out. I hope this part of the docs can be fixed too as part of that effort.
It's tricky to explain without going too deep into mathematics, but I'd say the gist is: the subtraction operator is performed after applying the conversion to
Set
, so at the end of this, it's basically the same. Again, is confusing to read.
Not sure if I quite get your point here, but at least we agree that it is confusing. :smile:
When you say the -
operator is performed after converting to Set
, I assume you're referring to when we explicitly cast to_set
and use Set#-
?:
[ 1, 1, 2, 2, 4, 5 ].to_set - [ 1, 2, 4 ].to_set
# => #<Set: {5}>
So sure, it makes sense to give that result when you explicitly convert the objects to Set
s but it doesn't make as much sense that Array#-
(which doesn't actually convert anything to a Set
) gives an identical (in this example), set-like result:
[ 1, 1, 2, 2, 4, 5 ] - [ 1, 2, 4 ]
#=> [ 5 ]
I think @olivierlacan said it best when he said:
But it is not a set. It’s an array. If I wanted a set, I would to_set.
and:
One may expect
Array#-
to behave like mathematical subtraction or difference when it doesn't. One could be forgiven to expect the following behavior:[1,1,2,2,3,3,4,4] - [1,2,3,4] => [1,2,3,4]
It occurred to me that this would be easier to think about and explain if we actually split this into 2 operations, which I'm tentatively calling:
-
delete_first
: does just what it sounds like: it deletes the given first occurrence of the element from the receiver (destructively).- This name intentionally starts with
delete_
to make it very clear that it behave similarly todelete
. And the_first
part makes it clear in which way it is different fromdelete
- This name intentionally starts with
-
delete_first_each
:- This name tries to make it clear that it uses the
delete_first
operation (rather thandelete
asSet#subtract
does) foreach
element inother
. - The
_each
part makes it clear that it does this operation for each element of the passed enumerable, similar toreverse_each
,each_index
,each_cons
,each_slice
. (Could call iteach_delete_first
but that doesn't read as well in English: as in, "delete each element..." but replacing thedelete
operation withdelete_first
.)
- This name tries to make it clear that it uses the
This would fill in 3 holes in this table...
in-place modification | returning a new object | |
---|---|---|
Set : operation to remove all occurrences of a single element/condition |
delete , delete_if |
reject |
Set : operation to remove all occurrences of every element in object |
subtract |
- |
Array : operation to remove all occurrences of a single element/condition |
delete , delete_at , delete_if |
reject |
Array : operation to remove all occurrences of every element in object |
delete_values (Facets) |
- , difference , without (ActiveSupport) |
Array : operation to remove only first/n occurrences of a single element/condition |
delete_first (proposed) |
(none) |
Array : operation to remove only first/n occurrences of every element in object |
delete_first_each! (proposed) |
delete_first_each (proposed) |
I don't love the name delete_first_each
, but it's the clearest name I've come up with so far. Can anyone come up with a better name?
Downsides of name delete_first_each
:
- It seems strange to name a method
each_
when intuitively it acts more like an operation on 2 arrays (like "subtract"). This name is maybe too much describing how its implementation details than describing the operation at a higher level. - Lacks any connection to the mathematical subtraction operation.
This operation really should be the operation that bears the name -
/difference
since it is the truest to the mathematical subtraction operation, but since those names are already taken by existing methods in Ruby (which as Olivier pointed out would have been more appropriately named Array#set_difference maybe
or even Array#subtract_all
), we have no choice but too pick a less-ideal name for this one. (Unless Matz is willing to revert the new Array#difference
and accept https://bugs.ruby-lang.org/issues/11815 instead, but that is probably unlikely.)
So, is delete_first_each
better than subtract
?
I still like the name subtract
better. It's just that it's not consistent with Set#subtract
(the destructive version of Set#-
) and consistency is probably a more important concern...
How about Array#remove
? That is the name Cary Swoveland proposed in https://bugs.ruby-lang.org/issues/11815 after his original Array#difference
name became no longer available.
And thinking about that option more, it doesn't seem as objectionable to me as it once did:
- Since the goal is to ultimately get it accepted in Ruby core, it doesn't really matter if it matches the semantics of anything in Facets, such as
String#remove
. (We might even want to renameString#remove
toremove_all
.) It only matters whether it is consistent with or conflicts with what is in Ruby core. -
remove
may sound like it removes the elements destructively likedelete
does — but not necessarily. In fact, the fact that the name doesn't contain the word "delete" (which always means it modifies the receiver in place inArray
,Hash
, andSet
) may be enough to clarify that it doesn't have the same destructive semantics thatdelete
has. This allows us to have bothremove
andremove!
variants.