Tai-e
Tai-e copied to clipboard
Fix for Mahjong's FieldPointsToGraph
Inconsistency
The current implementation of Mahjong is inconsistent with its paper. According to [1], if the field f of an object oi may point to another object oj, then there is an edge from oi to oj with label f added to the FieldPointsToGraph. However, currently Mahjong builds edges only for those fields that are loaded in the analyzed program.
MockObjs
ConcurrentModificationException may be thrown when building the FieldPointsToGraph. In #140 an optimization targeting zero length arrays is introduced by allocating non-functional MockObjs for 0 sized arrays, whose array indices should never point to any objects. Currently FieldPointsToGraph does not check for this situation.
[1] T. Tan, Y. Li, and J. Xue, “Efficient and precise points-to analysis: modeling the heap by merging equivalent automata,” in Proceedings of the 38th ACM SIGPLAN Conference on Programming Language Design and Implementation, in PLDI 2017. New York, NY, USA: Association for Computing Machinery, 2017, pp. 278–291. doi: 10.1145/3062341.3062360.
:warning: Please install the to ensure uploads and comments are reliably processed by Codecov.
Codecov Report
Attention: Patch coverage is 94.73684% with 1 line in your changes missing coverage. Please review.
Project coverage is 75.18%. Comparing base (
21118b5) to head (2a9f8e0).
| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...l/taie/analysis/pta/PointerAnalysisResultImpl.java | 75.00% | 1 Missing :warning: |
:exclamation: Your organization needs to install the Codecov GitHub app to enable full functionality.
Additional details and impacted files
@@ Coverage Diff @@
## master #162 +/- ##
============================================
+ Coverage 72.85% 75.18% +2.32%
- Complexity 4435 4580 +145
============================================
Files 480 480
Lines 15927 15927
Branches 2185 2179 -6
============================================
+ Hits 11603 11974 +371
+ Misses 3467 3080 -387
- Partials 857 873 +16
:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.
:rocket: New features to boost your workflow:
- :snowflake: Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
The perceived 'inconsistency' is actually a minor optimization. If a field has never been loaded, there's no need to consider it during object merging; hence, we omitted it from the FPG. Regarding zero-length arrays, as they are non-functional, their indices should not reference any objects. Is it necessary for Mahjong to perform a check on this?
Just force-updated a test in 5d2fa244d4aad6e7557d6e6c67fe34d42bbe423c in this PR to reproduce the bug regarding zero-length arrays, a corresponding stacktrace for this bug can be found here. Note that this bug is non-deterministic, reproducing it might need multiple runs (in the given test we set the number of runs to 100, and caught 1 failure).
Yes, zero-length arrays are non-functional objects and their indices do not point to any objects, so the DefaultSolver will not create pointers for them via CSManager.
https://github.com/pascal-lab/Tai-e/blob/21118b5d37ebc2e9cf6bffb6cc6c98b74b8ea268/src/main/java/pascal/taie/analysis/pta/core/solver/DefaultSolver.java#L452-L455
When FieldPointsToGraph iterates through pointers concurrently, without checking whether the base object is functional, PointerAnalysisResultImpl will invoke csManager.getArrayIndex() to create pointers for zero-length arrays:
https://github.com/pascal-lab/Tai-e/blob/21118b5d37ebc2e9cf6bffb6cc6c98b74b8ea268/src/main/java/pascal/taie/analysis/pta/PointerAnalysisResultImpl.java#L273-L278
This is explains the ConcurrentModificationException thrown by mahjong: the ptrManager of csManager is updated concurrently for using computeIfAbsent to create new array indices for zero-length arrays, which is a concurrent modification operation.
Incorrect PFG edge
For an array load statement x = y[*], current implementation adds PFG edges from all $$o_i \in pts(y)$$ to all $$o_j \in pts(y[*])$$:
https://github.com/pascal-lab/Tai-e/blob/21118b5d37ebc2e9cf6bffb6cc6c98b74b8ea268/src/main/java/pascal/taie/analysis/pta/toolkit/mahjong/FieldPointsToGraph.java#L57-L65
This is in fact incorrect, since $pts(y[*])=\{ o_j | o_i \in pts(y) \bigwedge o_j \in pts(o_i[*]) \}$, there is no guarantee that all the objects in $pts(y[*])$ will be pointed to by all the objects in $pts(y)$.
This will actually add more edges than needed to the PFG. Consider the following example:
Object[] x = new Object[1]; // o1
Object[] y = new Object[2]; // o2
x[0] = new Object[]; // o3
y[0] = new Object[]; // o4
Object[] z; z = x; z = y;
return z[0]; // ??
For array access z[0], current implementation adds edges from all the objects in pts(z) to all the objects in pts(z[0]). This results in 4 edges (o1 -> o3, o1 -> o4, o2 -> o3, o2 -> o4).
However, o1[*] never points to o4, and no access paths starting from o1 can reach o4. The correct implementation should contain only 2 edges (o1 -> o3, o2 -> o4) in the PFG.
Optimization
As for the optimization, a field (or array index) that is never loaded can still affect the final analysis result through reflection or native APIs (consider the handling of argument passing for Method.invoke(Object[]), where the arguments for method call is implicitly loaded from the argument array, not via explicit array load statements).
Here I compare 2-object sensitive pointer analysis guided by the current mahjong (denoted as Mahjong) and my fix (denoted as Mahjong-Fix) using some of the benchmark programs from https://github.com/pascal-lab/java-benchmarks, and the four precision metrics from mahjong's paper. The result is summarized using the following table. Mahjong-fix is as precise as Mahjong, and has better precision on most of the tested benchmarks for most metrics, while Mahjong seems to have no significant performance advantage over Mahjong-fix (Note that Mahjong's pre-analysis is even slower than Mahjong-fix).
| Program | Analysis | Pre-Time (s) | Time (s) | fail cast | poly call | reach method | call edge |
|---|---|---|---|---|---|---|---|
| fop | Mahjong | 1 | 2 | 322 | 824 | 8088 | 36221 |
| Mahjong-fix | 1 | 3 | 322 | 824 | 8088 | 36214 | |
| luindex | Mahjong | 1 | 3 | 331 | 931 | 7509 | 35514 |
| Mahjong-fix | 1 | 3 | 331 | 931 | 7509 | 35507 | |
| lusearch | Mahjong | 1 | 2 | 344 | 1127 | 8148 | 38287 |
| Mahjong-fix | 1 | 3 | 344 | 1127 | 8148 | 38280 | |
| antlr | Mahjong | 1 | 3 | 446 | 1641 | 8285 | 53097 |
| Mahjong-fix | 1 | 3 | 446 | 1641 | 8285 | 53090 | |
| pmd | Mahjong | 3 | 9 | 1380 | 2545 | 12809 | 64403 |
| Mahjong-fix | 2 | 10 | 1374 | 2543 | 12784 | 64234 | |
| bloat | Mahjong | 2 | 149 | 1221 | 1572 | 9496 | 58433 |
| Mahjong-fix | 1 | 140 | 1221 | 1572 | 9496 | 58426 | |
| chart | Mahjong | 9 | 35 | 1194 | 2005 | 15672 | 74164 |
| Mahjong-fix | 5 | 34 | 1179 | 2001 | 15649 | 74032 | |
| xalan | Mahjong | 4 | 150 | 955 | 2840 | 12506 | 63866 |
| Mahjong-fix | 2 | 149 | 941 | 2835 | 12470 | 63633 | |
| findbugs | Mahjong | 6 | 29 | 1969 | 3580 | 16809 | 89090 |
| Mahjong-fix | 3 | 30 | 1964 | 3578 | 16803 | 88936 |
Great and detailed explanation! Your fix on PFG edge building is well-executed. Regarding the ConcurrentModificationException, I suggest modifying PointerAnalysisResultImpl to reduce the need for clients to be aware of functional/non-functional objects. I've highlighted the methods below that should be adjusted to return empty points-to sets for non-functional base objects:"
https://github.com/pascal-lab/Tai-e/blob/21118b5d37ebc2e9cf6bffb6cc6c98b74b8ea268/src/main/java/pascal/taie/analysis/pta/PointerAnalysisResultImpl.java#L200-L215
https://github.com/pascal-lab/Tai-e/blob/21118b5d37ebc2e9cf6bffb6cc6c98b74b8ea268/src/main/java/pascal/taie/analysis/pta/PointerAnalysisResultImpl.java#L218-L231
https://github.com/pascal-lab/Tai-e/blob/21118b5d37ebc2e9cf6bffb6cc6c98b74b8ea268/src/main/java/pascal/taie/analysis/pta/PointerAnalysisResultImpl.java#L264-L279
https://github.com/pascal-lab/Tai-e/blob/21118b5d37ebc2e9cf6bffb6cc6c98b74b8ea268/src/main/java/pascal/taie/analysis/pta/PointerAnalysisResultImpl.java#L282-L295
I have fixed PointerAnalysisResultImpl according to your instructions, getPointsToSet() should now ignore non functional objects.
Could you review my changes, by any chance? @silverbullettt