fastutil
fastutil copied to clipboard
Fast in-place compute/merge methods in OpenHashMap classes
Some things I'd like to highlight:
nullhandling is preserved for default Map methods, so nodefaultValuesupport (checked in tests)0is treated asnullelement inDouble/Float2xxxmaps which seems rather suspiciousxxxMap#computeIfAbsentmethods withxxxUnaryOperatorcan be namedcomputeIfAbsentFastto avoid clashing
Friendly bump. Were my changes of any use?
One note: I think using ...AsInt as suffix is a bit more in line with JDK convention than ...Fast. For computeIfAbsentFast I'd rather say computeAsIntIfAbsent or computeIfAbsentAsInt (replacing Int with the respective type, of course). But that's a matter of taste. Leading to the next point.
Also, please replace the CONCAT with concrete names. Something like COMPUTE_IF_ABSENT macros etc. (which can be defiend using CONCAT of course). This way, other parts of the implementation can call these methods safely (for example, wiriting this.COMPUTE_IF_ABSENT(foo, bar)). And these names can be easily changed at a single place.
@incaseoftrouble thanks for response
I've extracted CONCAT'd names, and tried to replace Fast suffix with something you've proposed.
- For maps with primitive keys (e.g.
Int2IntOpenHashMap) everything goes smooth.fastutilalready uses names likemergeIntin maps with reference keys, so I switched to justmergeInt,computeIntIfPresent(notmergeAsInt, etc.). - For maps with reference keys (e.g.
Object2IntOpenHashMap) we have a problem. They already usecomputeIntandmergeIntnames for methods withjava.util.function.BiFunctionargument. 2.1 We can't useIntsuffix due to lambda-unfriendly overloading.mergeInt(0, 1, (a, b) -> a)won't compile without explicit cast. 2.2 We can't useAsIntsuffix becausemergeIntandmergeAsIntormergeIntAsIntmethods in the same class look indistinguishable.
Also "fast" methods in maps with primitive and non-primitive keys should have similar names, shouldn't they? I'm not sure how to deal with all that without breaking the current API.
Ugh. What a headache.
What if instead of providing a new method we check whether the given remapping function (in COMPUTE_IF_PRESENT) is instanceof JDK_PRIMITIVE_BI_FUNCTION? Unfortunately, these methods aren't bulk operations, so the instanceof is called quite often. But I guess this is better than providing yet another specialized method. The difference is that this does not cause boxing / unboxing, so the only overhead is a cast, no objects are being allocated. Also, the instanceof check can actually be delayed until the function needs to be called.
On the second thought, mergeInt and computeInt overloads look not that bad, consider:
mergeInt(1, 2, (int a, int b) -> a) or m.mergeInt(2, 2, (IntBinaryOperator)(a, b) -> a).
Source-compatibility would be broken, but binary compatibility would stay intact.
Existing code would need explicit types to be compiled, e.g. mergeInt(1, 2, (Integer a, Integer b) -> a).
What do you think?
Well, source compatibility is important. I'd rather have different names than adding that cast (pragmatically, that's less characters). Just compare: map.mergeIntInt(1, 2, Integer::sum) to map.mergeInt(1, 2, (IntBinaryOperator) Integer::sum). I personally prefer the former.
But its maybe for @vigna to decide :-)
Welcome to naming headaches in fastutil—one of my favorite pastimes in the last 22 years 😂.
So, I examined the code and it's great work. Sorry for the delay in answering. Let's try to work out the naming problems and merge this in.
A good question: why not creating type-specific BiFunctions? From what I understand in that case here the advantage is only in the return type...
A good question: why not creating type-specific BiFunctions? From what I understand in that case here the advantage is only in the return type...
Well, if you want all then you'd cubic amount of functions (x, y -> z), maybe we can generate one for a) the "primary" types (int, long, double; for computations it doesn't matter anyway) and b) only where Java doesn't have one already.
After having see @techsy730 clever idea for forEachRemaining(), I think we can try to do the same here. The idea is to have a type-specific BinaryOperator class. Then we implement the version based on the JDK primitive thing, if it exists, and we add an identical method accepting a type-specific BinaryOperator. The trick is that the latter is more specific than both the other options (JDK primitive and not), and so it is automatically chosen by the lambda.
The only difference is a little code duplication—in the case of Iterator, @techsy730 added a stub delegating the method. But in this case is a single delegation for a batch of work. In this case I would simply duplicate the (short) code for the two versions for int, long and double. In the other cases there's no duplication.
@vigna Do I understand correctly that you are referring to the following solution?
import java.util.function.*;
class Scratch {
public static void func(ObjectIntMap<Integer> map) {
map.mergeInt(10, 20, (a, b) -> a + b); // ok, IntBinaryOp inferred
map.mergeInt(10, 20, Integer::sum); // ok, IntBinaryOp inferred
// We have preserved source (almost) and binary compatibility but:
// 1. IntBinaryOp interface and the related method are here forever for binary compatibility
// 2. the following code will become an error
map.mergeInt(10, 20, (a, b) -> null);
// 3. the following code is now also possible
map.mergeInt(10, 20, new IntBinaryOp() {
@Override public int applyAsInt(int left, int right) { return left + right; } // <--- that one is used
@Override public Integer apply(Integer left, Integer right) { return left - right; } // ignored
});
}
interface ObjectIntMap<K> {
int mergeInt(K key, int v, BiFunction<? super Integer, ? super Integer, ? extends Integer> func); // JDK boxed
int mergeInt(K k, int v, IntBinaryOperator func); // JDK primitive
// more specific version with our helper interface
int mergeInt(K k, int v, IntBinaryOp func); // duplicate or delegate to JDK primitive implementation
}
@FunctionalInterface
interface IntBinaryOp extends IntBinaryOperator, BiFunction<Integer, Integer, Integer> {
@Override
default Integer apply(Integer integer, Integer integer2) {
return this.applyAsInt(integer.intValue(), integer2.intValue());
}
}
}
Almost. IntBinaryOp has in my mind has an apply(int, int) method, which is the one delegating to applyAsInt() if available. As usual, the method inherited from BinaryOperator delegates to the type-specific one and is deprecated. The ignored method would give a warning as you'd be overriding a deprecated method. Another advantage is that we would have composition for all types.
It is also possible in the future that we do something like @techsy730 work on streams, in which we also adapt JDK primitive operators (e.g., you can use an IntBinaryOperator with character values).
More precisely, a type-specific BinaryOperator interface would have a default implementation of applyAsInt() delegating to the type-specific apply().
So, I slightly hijacked this PR in the sense that I implemented type-specific binary operators. The modifications to gencsources.sh are no longer necessary. Could you try to bring this PR in the form we discussed above?
I've pushed changes for Int/Long/Double maps as we discussed above (had to fix BinaryOperator.drv a bit). We can use new type-specific operators for all other primitive maps.
OK, I understand a bit better now the situation. The problem is that this has to be fixed first at the interface level, or open hash maps will have type-specific method no other implementation has.
If I understand correctly, we would add compute() methods only if the type of keys and values is the same (as otherwise bifunctions would be too many) and merge for all types because in that case the bifunction it is just a binary operator. Presently, moreover, the only code is for the JDK special types int/long/double, for which there are functions. We would get some ambiguity when there is an object-based operator and a JDK operator.
OK, I understand a bit better now the situation. The problem is that this has to be fixed first at the interface level, or open hash maps will have type-specific method no other implementation has.
Sorry, I was referring to Byte/Char/Short/.. open hash maps, as those types now have their binary operators missing in JDK thanks to BinaryOperator.drv. Or we can use widened operators there for JDK methods. I don't care about them much, so it is for you to decide. I.e.:
interface Object2CharOpenHashMap<K> {
int mergeChar(K k, char v, BiFunction<? super Character, ? super Character, ? extends Character> func); // existing
// int mergeChar(K k, char v, IntBinaryOperator func); // optional JDK primitive for widened type
int mergeChar(K k, char v, CharacterBinaryOp func); // type-specific operator
}
If I understand correctly, we would add compute() ...
Yes, compute and computeIfPresent have same signature (K, (K, V)-> V) -> V, so type specific methods with BinaryOperator for (K, V)->V argument is generated only when K and V have equal wide type.
The logic I've added is kind of dormant now, as I skip those methods for non-Int/Long/Double maps.
Hey, we're working for a better world—in the sense of a more readable #ifdef'ing of the code. We'll come back to this PR as soon as things are stabilized (hopefully soon).
So, there has been a lot of internal reworking inside fastutil, so we will need to bring this up to date with those reworking. I'm also thinking about including functions K⨉K->V and K⨉V->V, which are relatively limited in number, so we can have type-specific implementation of these functions (compute, merge, etc.) for all types. We're very close to a release, so I would wait for the next one (soon) to include this stuff and possibly solve #101 , which is strictly related.
Something wrong with 4be250e92d8ce6eb0efcaa4545046067340aaa9b. MERGE def is deleted and a new method with MERGE name is added in Map.drv. I can't compile my PR because it uses MERGE=merge and I get duplicate merge methods error in Int2ObjectMap and other *Map classes.
Urgh. I made so many changes—some mistakes slipped through. Thank you for reporting this, I'm looking into it...
That should be fixed with 6bacc8990. There were several cases in which the method was compiled in, but it wasn't necessary.
I think I didn't left out any case (or there would be pre-existing methods that do not exist anymore).
Compiles OK now. Are there any questions left for this PR?
Almost there.
- To bring this in line with the overall restyling, we need to avoid #ifdef. There are several things you can use, KEYS_INT_LONG_DOUBLE, etc., but we have tried to eliminate anything that is not readable immediately.
#ifdef SOME_METHODmakes incredibly difficult for the reader to understand when the code will be compiled in. - Did you try whether no ambiguity arises? You should be able to write lambdas for all these methods without any type indications.
- I would like to add functions XY->Y and XX->Y to complete the picture.
So the plan is to release 8.5.0 (which certainly will have bug reports), and while it stabilizes complete this PR. I would like to have those functions in because there might be ambiguity problems we do not realize immediately. Then we can have default methods in Map.drv, and these optimized methods for hash maps.
Adding the functions is a bit tricky, because to preserve your idea, which I think is very good (always supporting the JDK binary operator in some way) we need some conditional compilation similar to that for functions. Like, CharChar2ByteFunction should extend IntBinaryOperator. In this way we can play that "three method game" that we played with all other methods (one method has a type that extends both the other two).
Does that sound sensible?
To bring this in line with the overall restyling, we need to avoid #ifdef.
Do you have some specific code lines in mind?
Did you try whether no ambiguity arises?
Sure. Object2IntOpenHashMapTest.java has new tests for that.
I would like to add functions XY->Y and XX->Y to complete the picture.
We can add that later, while already testing the new API...
Do you have some specific code lines in mind?
Everything that is not explicit. #ifdef COMPUTE, for example. In general, no #ifdef (stuff like KEYS_INT_LONG_DOUBLE has an actual value 0 or 1, so you can #if it).
Sure.
Object2IntOpenHashMapTest.javahas new tests for that.I would like to add functions XY->Y and XX->Y to complete the picture.
We can add that later, while already testing the new API...
Mmmh. My only fear is that adding methods we might discover that some change of signature is necessary, and then we cannot rollback. Let me think, I might add the functions now, it's not that much work.
To bring this in line with the overall restyling, we need to avoid #ifdef.
Done. Anything else?
If I understand correcty, we have here two type of methods: more efficient implementations of existing interface methods, and some completely new methods (*_PRIMITIVE) that are not present in the type-specific Map interface. Am I right?
If I understand correcty, we have here two type of methods: more efficient implementations of existing interface methods, and some completely new methods (*_PRIMITIVE) that are not present in the type-specific Map interface. Am I right?
You are right.
I think it should be easy to lift those method to default interface implementations no?
The rationale here is that probably the methods are fine and there is no need to include hundreds of functions to make them align with the strategy used elsewhere, but it is not a good idea to have methods that appear just in an implementation as people might not get access to them. From what I see, the four _PRIMITIVE methods's code can be very easily lifted up to Map.drv with small changes, and then we would have an backward-compatible interface extension and an optimized implementation for hash-based maps.
Did I miss something?