Remove Iterable<Integer>
Implementing Iterable<T> allows applications to use enhanced for loops, syntactic sugar, at the cost of performance (caused by auto-boxing). Since RoaringBitmap already implements forEach with IntConsumer, this patch removes the Iterable interface implementation from RoaringBitmap. This should encourage applications to use more efficient APIs and existing applications that use the enhanced for loop can easily convert the code to loop the iterator directly.
Addresses #459
I agree, just deleting the code is probably the least surprising thing to do when someone upgrades, and keeping the method on the base class is still binary incompatible anyway. I don't know if users should be directed towards forEach though - BatchIterator is the fastest API available, just a little awkward to use in modern Java.
One thing that concerns me about all of these changes is - despite making total sense now - is recalling the mess I encountered in Hadoop projects 5 or 6 years ago because of Guava's historical deletion-over-deprecation policy. This library is used in a lot of places and in the kind of projects prone to transitive dependency conflicts. Along with these rationalisations we should probably rename the package before releasing.
I love the Hadoop+Guava argument ;). I observed the same kind of issue with heterogenous Spark clusters, in which RoaringBitmap is definitely used. Keeping the interface, but marking it as deprecated is very fine with me.
People with strong performance considerations shall be burdened by calling deprecated methods.
So we do not delete but deprecate, correct?
@lemire
So we do not delete but deprecate, correct?
If you change the package name you can do whatever you want. Deprecating is hard to do (impossible?) with the LongConsumer/IntConsumer change because there's a clash between a method override/hide which needs to be resolved with type information. So either we stay with the structurally identical/nominally distinct LongConsumer/IntConsumer types and make none of these changes, or we make these changes and change the package name.
Do you have a specific proposal regarding how we should rename the package?
org.roaringbitmap1 should do it. I think every library should do this sort of thing - the tendency is to break (Guava) or accrue backward compatible garbage. Again, this rationalisation can't be done backward compatibly, so the tendency would be to just not do it at all even though it's a big improvement...
Ok. What do people think of org.roaringbitmap1?
@blacelle ?
Not a fan of org.roaringbitmap1.
Here is a list of (known) packages with version in them:
com.sun.xml.bind.v2.model.annotation
com.google.iam.v1
zipkin2.v1.V1Span
org.apache.commons.lang3
My preference would be: org.roaringbitmap -> org.roaringbitmap.v2 to maintain the existing parent package, and enable future versions. It may be a reason to go from 0.X to 2.X
@richardstartin What do you think of org.roaringbitmap.v2?
Sounds good, same principle
I have changed the base for this PR to https://github.com/RoaringBitmap/RoaringBitmap/tree/1.0-SNAPSHOT
I suggest that once @blacelle approves it, we merge it to 1.0-SNAPSHOT then proceed, separately, with a package renaming.
I would approve once iterator implementation is also removed.
@blacelle the problem with removing the Iterator is it becomes harder to translate for-loops to forEach because that variables from outside the lambda need to be either final/array, or some object. For example, this would produce a compilation error:
int x = 0;
rb.forEach(value -> {
x += value;
});
I agree. Still, this is the point of this evolution. This is why I proposed introducing an IntStream. One may decide the sum through an AtomicInteger. For a small unit-test, one may go with .toArray()
(Just in case, IntStream is a candidate for a future PR, not for this one).
One of the oldest tricks in the book for reductions within closures is
int[] x = {0};
rb.forEach(value -> {
x[0] += value;
});
You're always better off doing this than boxing, I don't even find it particularly dirty. Any serious use case of this library would avoid the Iterable<Integer> access method at all costs, and some users eschew convenience entirely - e.g. Druid picked up BatchIterator for its performance in their vectorised query engine, despite its extremely awkward API.
In my opinion, if your top priority is being able to write
int x = 0;
for (Integer i : set) {
x += value;
}
then HashSet is probably what you're looking for...
@blacelle you can already stream the contents of the bitmap https://github.com/RoaringBitmap/RoaringBitmap/commit/b4f75f6304afeb4d5240e17e4299a95e76ac4d08
@blacelle you can already stream the contents of the bitmap b4f75f6
BANG @richardstartin , I forgot this.
@Maithem Go for rb.stream().sum();
@blacelle @richardstartin please take a look.
@Maithem Would you mind also purging 64bits implementations? Thanks
Ping @Maithem : This PR still has requested changes... Can you check?
@lemire been busy, will take a look tonight.