Tai-e icon indicating copy to clipboard operation
Tai-e copied to clipboard

Fix for Mahjong's FieldPointsToGraph

Open jjppp opened this issue 5 months ago • 1 comments
trafficstars

Inconsistency

The current implementation of Mahjong is inconsistent with its paper. According to [1], if the field f of an object oi may point to another object oj, then there is an edge from oi to oj with label f added to the FieldPointsToGraph. However, currently Mahjong builds edges only for those fields that are loaded in the analyzed program.

MockObjs

ConcurrentModificationException may be thrown when building the FieldPointsToGraph. In #140 an optimization targeting zero length arrays is introduced by allocating non-functional MockObjs for 0 sized arrays, whose array indices should never point to any objects. Currently FieldPointsToGraph does not check for this situation.

[1] T. Tan, Y. Li, and J. Xue, “Efficient and precise points-to analysis: modeling the heap by merging equivalent automata,” in Proceedings of the 38th ACM SIGPLAN Conference on Programming Language Design and Implementation, in PLDI 2017. New York, NY, USA: Association for Computing Machinery, 2017, pp. 278–291. doi: 10.1145/3062341.3062360.

jjppp avatar Jun 12 '25 09:06 jjppp

:warning: Please install the 'codecov app svg image' to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

Attention: Patch coverage is 94.73684% with 1 line in your changes missing coverage. Please review.

Project coverage is 75.18%. Comparing base (21118b5) to head (2a9f8e0).

Files with missing lines Patch % Lines
...l/taie/analysis/pta/PointerAnalysisResultImpl.java 75.00% 1 Missing :warning:

:exclamation: Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files
@@             Coverage Diff              @@
##             master     #162      +/-   ##
============================================
+ Coverage     72.85%   75.18%   +2.32%     
- Complexity     4435     4580     +145     
============================================
  Files           480      480              
  Lines         15927    15927              
  Branches       2185     2179       -6     
============================================
+ Hits          11603    11974     +371     
+ Misses         3467     3080     -387     
- Partials        857      873      +16     

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

:rocket: New features to boost your workflow:
  • :snowflake: Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

codecov-commenter avatar Jun 12 '25 09:06 codecov-commenter

The perceived 'inconsistency' is actually a minor optimization. If a field has never been loaded, there's no need to consider it during object merging; hence, we omitted it from the FPG. Regarding zero-length arrays, as they are non-functional, their indices should not reference any objects. Is it necessary for Mahjong to perform a check on this?

silverbullettt avatar Jul 10 '25 08:07 silverbullettt

Just force-updated a test in 5d2fa244d4aad6e7557d6e6c67fe34d42bbe423c in this PR to reproduce the bug regarding zero-length arrays, a corresponding stacktrace for this bug can be found here. Note that this bug is non-deterministic, reproducing it might need multiple runs (in the given test we set the number of runs to 100, and caught 1 failure).

Yes, zero-length arrays are non-functional objects and their indices do not point to any objects, so the DefaultSolver will not create pointers for them via CSManager. https://github.com/pascal-lab/Tai-e/blob/21118b5d37ebc2e9cf6bffb6cc6c98b74b8ea268/src/main/java/pascal/taie/analysis/pta/core/solver/DefaultSolver.java#L452-L455 When FieldPointsToGraph iterates through pointers concurrently, without checking whether the base object is functional, PointerAnalysisResultImpl will invoke csManager.getArrayIndex() to create pointers for zero-length arrays: https://github.com/pascal-lab/Tai-e/blob/21118b5d37ebc2e9cf6bffb6cc6c98b74b8ea268/src/main/java/pascal/taie/analysis/pta/PointerAnalysisResultImpl.java#L273-L278

This is explains the ConcurrentModificationException thrown by mahjong: the ptrManager of csManager is updated concurrently for using computeIfAbsent to create new array indices for zero-length arrays, which is a concurrent modification operation.

jjppp avatar Jul 16 '25 01:07 jjppp

Incorrect PFG edge

For an array load statement x = y[*], current implementation adds PFG edges from all $$o_i \in pts(y)$$ to all $$o_j \in pts(y[*])$$: https://github.com/pascal-lab/Tai-e/blob/21118b5d37ebc2e9cf6bffb6cc6c98b74b8ea268/src/main/java/pascal/taie/analysis/pta/toolkit/mahjong/FieldPointsToGraph.java#L57-L65

This is in fact incorrect, since $pts(y[*])=\{ o_j | o_i \in pts(y) \bigwedge o_j \in pts(o_i[*]) \}$, there is no guarantee that all the objects in $pts(y[*])$ will be pointed to by all the objects in $pts(y)$.

This will actually add more edges than needed to the PFG. Consider the following example:

Object[] x = new Object[1]; // o1
Object[] y = new Object[2]; // o2
x[0] = new Object[]; // o3
y[0] = new Object[]; // o4
Object[] z; z = x; z = y;
return z[0]; // ??

For array access z[0], current implementation adds edges from all the objects in pts(z) to all the objects in pts(z[0]). This results in 4 edges (o1 -> o3, o1 -> o4, o2 -> o3, o2 -> o4). However, o1[*] never points to o4, and no access paths starting from o1 can reach o4. The correct implementation should contain only 2 edges (o1 -> o3, o2 -> o4) in the PFG.

Optimization

As for the optimization, a field (or array index) that is never loaded can still affect the final analysis result through reflection or native APIs (consider the handling of argument passing for Method.invoke(Object[]), where the arguments for method call is implicitly loaded from the argument array, not via explicit array load statements).

Here I compare 2-object sensitive pointer analysis guided by the current mahjong (denoted as Mahjong) and my fix (denoted as Mahjong-Fix) using some of the benchmark programs from https://github.com/pascal-lab/java-benchmarks, and the four precision metrics from mahjong's paper. The result is summarized using the following table. Mahjong-fix is as precise as Mahjong, and has better precision on most of the tested benchmarks for most metrics, while Mahjong seems to have no significant performance advantage over Mahjong-fix (Note that Mahjong's pre-analysis is even slower than Mahjong-fix).

Program Analysis Pre-Time (s) Time (s) fail cast poly call reach method call edge
fop Mahjong 1 2 322 824 8088 36221
Mahjong-fix 1 3 322 824 8088 36214
luindex Mahjong 1 3 331 931 7509 35514
Mahjong-fix 1 3 331 931 7509 35507
lusearch Mahjong 1 2 344 1127 8148 38287
Mahjong-fix 1 3 344 1127 8148 38280
antlr Mahjong 1 3 446 1641 8285 53097
Mahjong-fix 1 3 446 1641 8285 53090
pmd Mahjong 3 9 1380 2545 12809 64403
Mahjong-fix 2 10 1374 2543 12784 64234
bloat Mahjong 2 149 1221 1572 9496 58433
Mahjong-fix 1 140 1221 1572 9496 58426
chart Mahjong 9 35 1194 2005 15672 74164
Mahjong-fix 5 34 1179 2001 15649 74032
xalan Mahjong 4 150 955 2840 12506 63866
Mahjong-fix 2 149 941 2835 12470 63633
findbugs Mahjong 6 29 1969 3580 16809 89090
Mahjong-fix 3 30 1964 3578 16803 88936

jjppp avatar Jul 16 '25 02:07 jjppp

Great and detailed explanation! Your fix on PFG edge building is well-executed. Regarding the ConcurrentModificationException, I suggest modifying PointerAnalysisResultImpl to reduce the need for clients to be aware of functional/non-functional objects. I've highlighted the methods below that should be adjusted to return empty points-to sets for non-functional base objects:"

https://github.com/pascal-lab/Tai-e/blob/21118b5d37ebc2e9cf6bffb6cc6c98b74b8ea268/src/main/java/pascal/taie/analysis/pta/PointerAnalysisResultImpl.java#L200-L215

https://github.com/pascal-lab/Tai-e/blob/21118b5d37ebc2e9cf6bffb6cc6c98b74b8ea268/src/main/java/pascal/taie/analysis/pta/PointerAnalysisResultImpl.java#L218-L231

https://github.com/pascal-lab/Tai-e/blob/21118b5d37ebc2e9cf6bffb6cc6c98b74b8ea268/src/main/java/pascal/taie/analysis/pta/PointerAnalysisResultImpl.java#L264-L279

https://github.com/pascal-lab/Tai-e/blob/21118b5d37ebc2e9cf6bffb6cc6c98b74b8ea268/src/main/java/pascal/taie/analysis/pta/PointerAnalysisResultImpl.java#L282-L295

silverbullettt avatar Jul 16 '25 07:07 silverbullettt

I have fixed PointerAnalysisResultImpl according to your instructions, getPointsToSet() should now ignore non functional objects. Could you review my changes, by any chance? @silverbullettt

jjppp avatar Jul 23 '25 05:07 jjppp