Missed sink with taint propagated in lists and --aliasalgo NONE/LAZY
Please consider the following code:
List myList = new ArrayList<String>();
myList.add(source());
sink(myList); // Reported by FlowDroid
try{System.out.println("");}catch(Exception ex){} // If removed, FlowDroid correctly reports the sink in the next statement
sink(myList); // Not reported by FlowDroid
List unused = new ArrayList<Boolean>(); // If removed, FlowDroid correctly reports the sink in the previous statement
As annotated, FlowDroid doesn't report the second sink. This seems to be related to the try-catch block before it and the variable declaration after it: if we remove one of them, the leak is correctly reported.
I'm using a rather new version of FlowDroid (02dba8a).
This only happens using --aliasalgo NONE or --aliasalgo LAZY, with --aliasalgo FLOWSENSITIVE both sinks are found.
I call FlowDroid via the command line
java -jar ./soot-infoflow-cmd-02dba8a-jar-with-dependencies.jar \
-a {path-to-apk} \
-s ./SourcesAndSinks.xml \
-o ./out.xml \
-p {path-to-android-platforms-folder} \
--aliasalgo NONE \
--mergedexfiles
SourcesAndSinks.xml
<sinkSources>
<category id="NO_CATEGORY" description="no_category">
<method signature="com.example.testapp.MainActivity: java.lang.String source()">
<return type="java.lang.String">
<accessPath isSource="true" isSink="false">
</accessPath>
</return>
</method>
<method signature="com.example.testapp.MainActivity: void sink(java.util.List)">
<param index="0" type="java.util.List">
<accessPath isSource="false" isSink="true"/>
</param>
</method>
</category>
</sinkSources>
That is sorta expected, just because the Java code has no aliases doesn't imply the three-address code has no aliases.
IIRC the issue stems from the typing the three-address code. Dalvik reuses registers for different types (as does the JVM). If you simply assign one TAC-local per register/stack location and then try to reassign types to them, most of the locals would be java.lang.Object and instance method calls would be invalid. Soot therefore has a local splitter pass that tries to separate locals such that precise types can be assigned.
Now, the type assigner can try to infer all the types. Often, this just finds a valid typing and everything is good. However, there is a special case with the specialization of types.
Consider following example:
List lst;
if ... {
lst = new ArrayList();
} else {
lst = new LinkedList();
}
lst.add(...);
For a valid type assignment, we do need a local with the concrete type to call the <init> method on it as well as a local with the supertype for the add call after the branches. In such cases, the IR will look like:
r2: List
r1 = new ArrayList;
r2 = r1
r1.<init>()
goto ...
r3 = new LinkedList;
r2 = r3
r3.<init>
r2.add(...)
And as you can see, you now have aliases in the TAC without aliases in the Java code.
If you look at the TAC for your example:
r6 = new java.util.ArrayList;
r3 = r6;
specialinvoke r6.<java.util.ArrayList: void <init>()>();
$r2 = virtualinvoke r0.<com.example.myapplication.MainActivity: java.lang.String source()>();
interfaceinvoke r6.<java.util.List: boolean add(java.lang.Object)>($r2);
virtualinvoke r0.<com.example.myapplication.MainActivity: void sink(java.util.List)>(r6);
// skipping some irrelevant lines
label4:
virtualinvoke r0.<com.example.myapplication.MainActivity: void sink(java.util.List)>(r3);
you can see the exact same pattern with an alias.
Section 3.3 of the paper Efficient Local Type Inference also talks about this case. As far as I know, Soot still uses the algorithm described in the paper.
@t1mlange This still doesn't explain why the leak is missed. The alias analysis should derive a taint on $r3 from the taint on $r6 at the line r3 = r6. It would make sense to debug into this case.
The fact that the reported leaks depend on whether seemingly irrelevant lines are present or not reminds me of the overly broad check in the strong propagation rule that we fixed a while ago (c8c789d05ec45cf556cac802759b0063f8b11213). The version used by @draftyfrog is newer, though.
@StevenArzt the 2 leaks are found with flow-sensitive aliasing strategy but aren't with --aliasalgo NONE. Haven't looked into the lazy strategy.
For whatever reason, the last line causes the local splitter/type assigner to produce the alias. Without the last line, there is no alias and therefore, the leaks are found regardless of the alias analysis.
I haven't seen the --aliasalgo NONE option. Not finding aliases when there is no aliasing strategy enabled makes perfect sense.