Incorrect (or overly conservative) points-to analysis result with two instances (doesn't happen with one)
I am experiencing an incorrect (or overly conservative) result from a points-to analysis that only seems to happen when I am dealing with ore than one object instance. Consider the following code snippet:
Collection<Widget> unorderedWidgets = new HashSet<>();
List<Widget> sortedWidgets = unorderedWidgets.stream().sorted(Comparator.comparing(Widget::getWeight)).collect(Collectors.toList());
I am interested in determining the possible types of the return value of the call to collect() above. The relevant IR looks as follows:
10 v16 = invokeinterface < Application, Ljava/util/stream/Stream, collect(Ljava/util/stream/Collector;)Ljava/lang/Object; > v12,v14 @30 exception:v15(line 25)
For pointer key v16, the concrete types of the instance key of the associated points-to set is correct. Specifically, I am seeing <Primordial,Ljava/util/ArrayList>. So, for one stream instance, it works.
Now, if I add a second stream instance, everything seems to go haywire. For example, suppose we add the code as follows:
Collection<Widget> orderedWidgets = new ArrayList<>();
Set<Double> heavyWidgetWeightSet = orderedWidgets.parallelStream().map(Widget::getWeight).filter(w -> w > 43.2).collect(Collectors.toSet());
And corresponding relevant IR:
24 v33 = invokeinterface < Application, Ljava/util/stream/Stream, collect(Ljava/util/stream/Collector;)Ljava/lang/Object; > v29,v31 @76 exception:v32(line 34)
Now, for v16, I am still seeing <Primordial,Ljava/util/ArrayList>, but I am now also seeing <Primordial,Ljava/util/HashSet>, which actually should be in the points-to set for v33 and not v16.
Is there a bug in the pointer analysis or is this an overapproximation that I can control by varying the context sensitivity? To be clear, for stream instances, I am using kCFA with k=2.
Having a look at the source of the two methods (namely, toSet() and toList()):
in Collectors.java:
/**
* Returns a {@code Collector} that accumulates the input elements into a
* new {@code List}. There are no guarantees on the type, mutability,
* serializability, or thread-safety of the {@code List} returned; if more
* control over the returned {@code List} is required, use {@link #toCollection(Supplier)}.
*
* @param <T> the type of the input elements
* @return a {@code Collector} which collects all the input elements into a
* {@code List}, in encounter order
*/
public static <T>
Collector<T, ?, List<T>> toList() {
return new CollectorImpl<>((Supplier<List<T>>) ArrayList::new, List::add,
(left, right) -> { left.addAll(right); return left; },
CH_ID);
}
/**
* Returns a {@code Collector} that accumulates the input elements into a
* new {@code Set}. There are no guarantees on the type, mutability,
* serializability, or thread-safety of the {@code Set} returned; if more
* control over the returned {@code Set} is required, use
* {@link #toCollection(Supplier)}.
*
* <p>This is an {@link Collector.Characteristics#UNORDERED unordered}
* Collector.
*
* @param <T> the type of the input elements
* @return a {@code Collector} which collects all the input elements into a
* {@code Set}
*/
public static <T>
Collector<T, ?, Set<T>> toSet() {
return new CollectorImpl<>((Supplier<Set<T>>) HashSet::new, Set::add,
(left, right) -> { left.addAll(right); return left; },
CH_UNORDERED_ID);
}
Looks like both toList() and toSet() call the same ctor but with different parameters. I wonder if this is a sensitivity issue.
Yup, I haven't dug into this particular issue yet, but in my past experience this kind of thing was almost always an issue with having too little sensitivity around some key call site.
Hm, perhaps setting k=2 for collection instances may help (but of course decrease performance).
No good, it didn't make a difference.
@msridhar Do you have any ideas on the kind of sensitivity that should be adjusted (and where) to avoid such a problem?
Looks like there's a nice in-depth discussion of this in https://github.com/wala/WALA/wiki/Pointer-Analysis.