SieveFuzz

Use of points-to analysis (PTA) to get indirect edges

Open acidghost opened this issue 2 years ago • 3 comments

First, really cool work! :)

I was experimenting with your prototype and found that computing the graphs in the server component (SVF) takes a very long time and a lot of memory for some large real-world programs, regardless of the --get-indirect flag.

Digging into the code, I see that it computes a full pointer analysis whether or not the --get-indirect flag is given: https://github.com/HexHive/SieveFuzz/blob/1751673ed6c56b7dc69b71ef07ace49867e3cfa4/patches/svf/svf-ex.cpp#L101-L113

For reference, see SVF's code:

  • createAndersenWaveDiff calls analyze: https://github.com/SVF-tools/SVF/blob/a99ee34ed34a67ce72f028ca9dbd8005b5463d05/include/WPA/Andersen.h#L427-L436
  • Andersen::analyze: https://github.com/SVF-tools/SVF/blob/a99ee34ed34a67ce72f028ca9dbd8005b5463d05/lib/WPA/Andersen.cpp#L106-L128

Hence, regardless of the --get-indirect flag, SieveFuzz is using a call graph augmented with the indirect edges found by PTA.

In case the --get-indirect flag is given, it will also add the indirect edges from the PTA to the ICFG.

Given that the paper does not discuss the use of PTA, I was wondering if the intended use of SieveFuzz (i.e. what is evaluated in the paper) is with or without PTA and the --get-indirect flag.

acidghost, Aug 16 '23 16:08

Thanks a lot for your interest and sorry for the late reply! As you correctly pointed out, we do use a PTA callgraph and then add those edges to the ICFG. This was intended as an optimization/ease-of-implementation tactic: instead of creating new edges from scratch in the ICFG when they are observed dynamically, our reachability analysis follows an indirect edge overlaid on top of the ICFG only if we have seen that edge dynamically before. For the purposes of the evaluation, we had the PTA callgraph along with the --get-indirect flag turned on for all targets. The only exception was mJS, where we turned the --get-indirect flag off because the version of SVF we used at the time would segfault when trying to overlay the indirect call edges onto the ICFG.
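As an illustration of the gating described above, here is a minimal, self-contained sketch. It is not SieveFuzz's or SVF's code: `OverlayGraph`, its members, and the integer node IDs are all hypothetical stand-ins. It assumes the candidate indirect edges come from some static analysis (such as PTA) and that a separate set records which of them have been observed at run time.

```cpp
#include <map>
#include <queue>
#include <set>
#include <utility>
#include <vector>

// Hypothetical stand-in for a call graph with an indirect-edge overlay.
// This is an illustrative sketch, not SVF's or SieveFuzz's data structure.
struct OverlayGraph {
    using Node = int;
    using Edge = std::pair<Node, Node>;

    std::map<Node, std::vector<Node>> direct;    // statically known direct edges
    std::map<Node, std::vector<Node>> indirect;  // candidate indirect edges (e.g. from PTA)
    std::set<Edge> observed;                     // indirect edges seen at run time

    void addDirect(Node from, Node to) { direct[from].push_back(to); }
    void addIndirectCandidate(Node from, Node to) { indirect[from].push_back(to); }

    // Record that an execution exercised the indirect edge from -> to.
    void markObserved(Node from, Node to) { observed.insert(Edge{from, to}); }

    // Breadth-first search from `source`. Direct edges are always followed;
    // a candidate indirect edge is followed only if it has been observed.
    bool reaches(Node source, Node target) const {
        std::set<Node> seen{source};
        std::queue<Node> work;
        work.push(source);
        while (!work.empty()) {
            Node cur = work.front();
            work.pop();
            if (cur == target) return true;

            auto d = direct.find(cur);
            if (d != direct.end())
                for (Node next : d->second)
                    if (seen.insert(next).second) work.push(next);

            auto i = indirect.find(cur);
            if (i != indirect.end())
                for (Node next : i->second)
                    if (observed.count(Edge{cur, next}) && seen.insert(next).second)
                        work.push(next);
        }
        return false;
    }
};
```

With edges 0 -> 1 (direct), 1 -> 2 (indirect candidate), and 2 -> 3 (direct), `reaches(0, 3)` is false until `markObserved(1, 2)` is called, after which it becomes true. The graph itself never changes; only the set of observed edges grows.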

Let me know if you have any further questions.

prashast, Aug 24 '23 09:08

@prashast Thanks for the explanation.

Unfortunately, I've not been able to run PTA on some targets from the MAGMA dataset. PHP runs out of memory on a machine with 128GB of RAM, and others (e.g., openssl, sqlite, etc.) consume tens of GBs, making it impossible to run a decent number of instances in parallel (for evaluation purposes).

Would it be possible to patch the prototype and not use PTA but add the indirect edges to the graphs as they're discovered?
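To make the question concrete, here is a rough sketch of what "add edges as they're discovered" could look like. It is purely illustrative: `LazyCallGraph` and its methods are hypothetical names, not part of SieveFuzz or SVF. It assumes the graph starts with direct edges only and that the fuzzer reports each indirect call it observes.

```cpp
#include <map>
#include <set>
#include <vector>

// Hypothetical sketch: start from direct edges only and grow the graph as
// indirect calls are observed. Not SieveFuzz's or SVF's implementation.
class LazyCallGraph {
public:
    using Node = int;

    void addDirect(Node from, Node to) { succ_[from].insert(to); }

    // Called when an execution reveals an indirect call from -> to.
    // Returns true if the edge is new, meaning earlier reachability
    // results may be stale and should be recomputed.
    bool addDiscovered(Node from, Node to) { return succ_[from].insert(to).second; }

    // Iterative depth-first search over all edges known so far.
    std::set<Node> reachableFrom(Node source) const {
        std::set<Node> seen{source};
        std::vector<Node> stack{source};
        while (!stack.empty()) {
            Node cur = stack.back();
            stack.pop_back();
            auto it = succ_.find(cur);
            if (it == succ_.end()) continue;
            for (Node next : it->second)
                if (seen.insert(next).second) stack.push_back(next);
        }
        return seen;
    }

private:
    std::map<Node, std::set<Node>> succ_;  // adjacency sets, direct and discovered
};
```

The trade-off in this sketch is that, with no static set of candidate indirect targets, reachability is under-approximated until the relevant indirect edges have actually been observed, and it has to be recomputed whenever `addDiscovered` returns true.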

acidghost, Sep 12 '23 16:09

Also, the version of SVF used in this prototype is quite old. Newer versions haven't changed the API much but have lots of improvements and bug fixes.

acidghost, Sep 12 '23 17:09