Use of points-to analysis (PTA) to get indirect edges
First, really cool work! :)
I was experimenting with your prototype and found that computing the graphs in the server component (SVF) takes a lot of time and memory for some large real-world programs, regardless of the --get-indirect flag.
Digging into the code I see that regardless of the --get-indirect flag it's computing a full pointer analysis: https://github.com/HexHive/SieveFuzz/blob/1751673ed6c56b7dc69b71ef07ace49867e3cfa4/patches/svf/svf-ex.cpp#L101-L113
For reference, see SVF's code:
createAndersenWaveDiff calls analyze: https://github.com/SVF-tools/SVF/blob/a99ee34ed34a67ce72f028ca9dbd8005b5463d05/include/WPA/Andersen.h#L427-L436
Andersen::analyze: https://github.com/SVF-tools/SVF/blob/a99ee34ed34a67ce72f028ca9dbd8005b5463d05/lib/WPA/Andersen.cpp#L106-L128
Hence, regardless of the --get-indirect flag, SieveFuzz is using a call graph augmented with the indirect edges found by PTA.
In case the --get-indirect flag is given, it will also add the indirect edges from the PTA to the ICFG.
Given that the paper does not discuss the use of PTA, I was wondering if the intended use of SieveFuzz (i.e. what is evaluated in the paper) is with or without PTA and the --get-indirect flag.
Thanks a lot for your interest and sorry for the late reply! As you correctly pointed out, we do use a PTA callgraph and then add those edges to the ICFG. This was intended as an optimization and ease-of-implementation tactic: instead of creating new edges from scratch in the ICFG when they are observed dynamically, we follow indirect edges overlaid on top of this ICFG during our reachability analysis only if we have seen that indirect edge dynamically before. For the purposes of the evaluation, we had the PTA callgraph along with the --get-indirect flag turned on for all targets. The only exception was mJS, where we turned the --get-indirect flag off because the version of SVF we used at the time would segfault trying to overlay the indirect call edges onto the ICFG.
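To make the idea concrete, here is a minimal Python sketch of a reachability pass that always follows direct edges but follows an overlaid indirect edge only if it has been observed dynamically. This is only an illustration and not the actual SieveFuzz/SVF code: the function name `reachable`, the dict-based graph layout, and the `observed` set are all assumptions made for this example.

```python
from collections import deque


def reachable(direct, indirect, observed, start):
    """Return the set of nodes reachable from `start`.

    direct:   dict node -> iterable of successors (always followed)
    indirect: dict node -> iterable of candidate indirect-call targets,
              e.g. as suggested by a static points-to analysis
    observed: set of (caller, callee) pairs seen at run time
    All three structures are hypothetical stand-ins for the real graphs.
    """
    seen = {start}
    work = deque([start])
    while work:
        node = work.popleft()
        # Direct edges are unconditional.
        succs = list(direct.get(node, ()))
        # Indirect edges are gated on having been observed dynamically.
        succs += [t for t in indirect.get(node, ()) if (node, t) in observed]
        for succ in succs:
            if succ not in seen:
                seen.add(succ)
                work.append(succ)
    return seen
```

With this layout, a newly observed indirect call only requires adding a pair to `observed`; the graph itself is left untouched.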
Let me know if you have any further questions.
@prashast Thanks for the explanation.
Unfortunately, I've not been able to run PTA on some targets from the MAGMA dataset. PHP goes out-of-memory on a machine with 128GB of RAM, and others (e.g., openssl, sqlite, etc.) consume tens of GBs, making it impossible to run a decent number of instances in parallel (for evaluation purposes).
Would it be possible to patch the prototype and not use PTA but add the indirect edges to the graphs as they're discovered?
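Roughly, what I have in mind is something like the Python sketch below: start from the direct call edges only, add an indirect edge the first time it is reported at run time, and recompute which functions can reach the target. This is a hypothetical illustration and not a patch against the prototype: the class `DynamicCallGraph` and its methods are names I made up, and how the fuzzer would report observed edges to the server is left open.

```python
from collections import defaultdict, deque


class DynamicCallGraph:
    """Call graph seeded with direct edges; indirect edges are added lazily."""

    def __init__(self, direct_edges):
        self.edges = set()
        # Reverse adjacency: callee -> set of callers.
        self.callers = defaultdict(set)
        for caller, callee in direct_edges:
            self._add(caller, callee)

    def _add(self, caller, callee):
        if (caller, callee) in self.edges:
            return False
        self.edges.add((caller, callee))
        self.callers[callee].add(caller)
        return True

    def add_indirect_edge(self, caller, callee):
        """Record a dynamically observed indirect call.

        Returns True if the edge is new, which signals that any cached
        reachability result should be recomputed.
        """
        return self._add(caller, callee)

    def functions_reaching(self, target):
        """Backward BFS: every function with a call path to `target`."""
        seen = {target}
        work = deque([target])
        while work:
            fn = work.popleft()
            for caller in self.callers.get(fn, ()):
                if caller not in seen:
                    seen.add(caller)
                    work.append(caller)
        return seen
```

The trade-off is that reachability starts out under-approximated and only grows as edges are discovered, but no whole-program pointer analysis is needed up front.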
Also, the version of SVF used in this prototype is quite old. Newer versions haven't changed the API much but have lots of improvements and bug fixes.