RoaringBitmap icon indicating copy to clipboard operation
RoaringBitmap copied to clipboard

Remove Iterable<Integer>

Open Maithem opened this issue 4 years ago • 23 comments

Implementing Iterable<T> allows applications to use enhanced for loops, syntactic sugar, at the cost of performance (caused by auto-boxing). Since RoaringBitmap already implements forEach with IntConsumer, this patch removes the Iterable interface implementation from RoaringBitmap. This should encourage applications to use more efficient APIs and existing applications that use the enhanced for loop can easily convert the code to loop the iterator directly.

Maithem avatar Feb 11 '21 05:02 Maithem

Addresses #459

Maithem avatar Feb 11 '21 05:02 Maithem

I agree, just deleting the code is probably the least surprising thing to do when someone upgrades, and keeping the method on the base class is still binary incompatible anyway. I don't know if users should be directed towards forEach though - BatchIterator is the fastest API available, just a little awkward to use in modern Java.

One thing that concerns me about all of these changes is - despite making total sense now - is recalling the mess I encountered in Hadoop projects 5 or 6 years ago because of Guava's historical deletion-over-deprecation policy. This library is used in a lot of places and in the kind of projects prone to transitive dependency conflicts. Along with these rationalisations we should probably rename the package before releasing.

richardstartin avatar Feb 11 '21 11:02 richardstartin

I love the Hadoop+Guava argument ;). I observed the same kind of issue with heterogenous Spark clusters, in which RoaringBitmap is definitely used. Keeping the interface, but marking it as deprecated is very fine with me.

People with strong performance considerations shall be burdened by calling deprecated methods.

blacelle avatar Feb 11 '21 11:02 blacelle

So we do not delete but deprecate, correct?

lemire avatar Feb 11 '21 16:02 lemire

@lemire

So we do not delete but deprecate, correct?

If you change the package name you can do whatever you want. Deprecating is hard to do (impossible?) with the LongConsumer/IntConsumer change because there's a clash between a method override/hide which needs to be resolved with type information. So either we stay with the structurally identical/nominally distinct LongConsumer/IntConsumer types and make none of these changes, or we make these changes and change the package name.

richardstartin avatar Feb 11 '21 17:02 richardstartin

Do you have a specific proposal regarding how we should rename the package?

lemire avatar Feb 11 '21 17:02 lemire

org.roaringbitmap1 should do it. I think every library should do this sort of thing - the tendency is to break (Guava) or accrue backward compatible garbage. Again, this rationalisation can't be done backward compatibly, so the tendency would be to just not do it at all even though it's a big improvement...

richardstartin avatar Feb 11 '21 17:02 richardstartin

Ok. What do people think of org.roaringbitmap1?

@blacelle ?

lemire avatar Feb 11 '21 18:02 lemire

Not a fan of org.roaringbitmap1.

Here is a list of (known) packages with version in them:

com.sun.xml.bind.v2.model.annotation
com.google.iam.v1
zipkin2.v1.V1Span
org.apache.commons.lang3

My preference would be: org.roaringbitmap -> org.roaringbitmap.v2 to maintain the existing parent package, and enable future versions. It may be a reason to go from 0.X to 2.X

blacelle avatar Feb 11 '21 18:02 blacelle

@richardstartin What do you think of org.roaringbitmap.v2?

lemire avatar Feb 11 '21 18:02 lemire

Sounds good, same principle

richardstartin avatar Feb 11 '21 18:02 richardstartin

I have changed the base for this PR to https://github.com/RoaringBitmap/RoaringBitmap/tree/1.0-SNAPSHOT

I suggest that once @blacelle approves it, we merge it to 1.0-SNAPSHOT then proceed, separately, with a package renaming.

lemire avatar Feb 11 '21 19:02 lemire

I would approve once iterator implementation is also removed.

blacelle avatar Feb 11 '21 19:02 blacelle

@blacelle the problem with removing the Iterator is it becomes harder to translate for-loops to forEach because that variables from outside the lambda need to be either final/array, or some object. For example, this would produce a compilation error:


int x = 0;
rb.forEach(value -> {
   x += value;
});

Maithem avatar Feb 11 '21 19:02 Maithem

I agree. Still, this is the point of this evolution. This is why I proposed introducing an IntStream. One may decide the sum through an AtomicInteger. For a small unit-test, one may go with .toArray()

blacelle avatar Feb 11 '21 19:02 blacelle

(Just in case, IntStream is a candidate for a future PR, not for this one).

blacelle avatar Feb 11 '21 19:02 blacelle

One of the oldest tricks in the book for reductions within closures is

int[] x = {0};
rb.forEach(value -> {
   x[0] += value;
});

You're always better off doing this than boxing, I don't even find it particularly dirty. Any serious use case of this library would avoid the Iterable<Integer> access method at all costs, and some users eschew convenience entirely - e.g. Druid picked up BatchIterator for its performance in their vectorised query engine, despite its extremely awkward API.

In my opinion, if your top priority is being able to write

int x = 0;
for (Integer i : set) {
  x += value;
}

then HashSet is probably what you're looking for...

richardstartin avatar Feb 11 '21 19:02 richardstartin

@blacelle you can already stream the contents of the bitmap https://github.com/RoaringBitmap/RoaringBitmap/commit/b4f75f6304afeb4d5240e17e4299a95e76ac4d08

richardstartin avatar Feb 11 '21 20:02 richardstartin

@blacelle you can already stream the contents of the bitmap b4f75f6

BANG @richardstartin , I forgot this.

@Maithem Go for rb.stream().sum();

blacelle avatar Feb 11 '21 20:02 blacelle

@blacelle @richardstartin please take a look.

Maithem avatar Feb 12 '21 02:02 Maithem

@Maithem Would you mind also purging 64bits implementations? Thanks

blacelle avatar Feb 12 '21 09:02 blacelle

Ping @Maithem : This PR still has requested changes... Can you check?

lemire avatar Mar 02 '21 22:03 lemire

@lemire been busy, will take a look tonight.

Maithem avatar Mar 03 '21 18:03 Maithem