python-uncompyle6

Add proper control-flow analysis

Open · rocky opened this issue 8 years ago · 2 comments

The way uncompyle6 determines control flow is way too hacky. Right now, this is the crap done in the "scanner" phase, which adds pseudo control-flow opcodes, namely COME_FROM...

We need to do real control flow with basic blocks, interval analysis and control-flow dominators. The result could still be pseudo-ops, but they would be a lot more precise and simplify grammar rules.
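The basic-block part of this can be sketched as follows: find the "leaders" (first instruction, every jump target, every instruction after a jump or return), cut the instruction stream at them, and connect blocks by fall-through and jump edges. This is a minimal sketch under stated assumptions: the `Instr` model, the opname sets, and the generic `JUMP` opname are hypothetical simplifications, not the uncompyle6, xdis, or `dis` representation, and real bytecode adds exception tables and version-specific jump rules.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Hypothetical, simplified instruction model. Real bytecode (from dis or xdis)
# has many more opcodes, exception tables, and version-specific jump rules.
@dataclass(frozen=True)
class Instr:
    offset: int
    opname: str
    target: Optional[int] = None  # jump target offset, None for non-jumps

CONDITIONAL = {"POP_JUMP_IF_FALSE", "POP_JUMP_IF_TRUE"}
UNCONDITIONAL = {"JUMP"}  # hypothetical generic unconditional jump
TERMINATORS = {"RETURN_VALUE", "RAISE_VARARGS"}
BLOCK_ENDERS = CONDITIONAL | UNCONDITIONAL | TERMINATORS

@dataclass
class BasicBlock:
    start: int
    instrs: List[Instr] = field(default_factory=list)
    succs: List[int] = field(default_factory=list)  # successor block offsets

def build_cfg(code: List[Instr]) -> Dict[int, BasicBlock]:
    """Partition straight-line bytecode into basic blocks and link them."""
    # 1. Leaders: the first instruction, every jump target, and every
    #    instruction that follows a jump or a terminator.
    leaders = {code[0].offset}
    for i, ins in enumerate(code):
        if ins.target is not None:
            leaders.add(ins.target)
        if ins.opname in BLOCK_ENDERS and i + 1 < len(code):
            leaders.add(code[i + 1].offset)

    # 2. A block runs from one leader up to (not including) the next leader.
    blocks: Dict[int, BasicBlock] = {}
    current = None
    for ins in code:
        if ins.offset in leaders:
            current = blocks[ins.offset] = BasicBlock(ins.offset)
        current.instrs.append(ins)

    # 3. Edges are decided by each block's last instruction.
    starts = sorted(blocks)
    for idx, start in enumerate(starts):
        block = blocks[start]
        last = block.instrs[-1]
        if last.opname in TERMINATORS:
            continue  # returns/raises leave the code object: no successors
        nxt = starts[idx + 1] if idx + 1 < len(starts) else None
        if last.opname not in UNCONDITIONAL and nxt is not None:
            block.succs.append(nxt)  # fall-through edge
        if last.target is not None:
            block.succs.append(last.target)  # explicit jump edge
    return blocks

# Roughly "if x: y = 1" followed by a return (offsets are illustrative).
SAMPLE = [
    Instr(0, "LOAD_NAME"),
    Instr(2, "POP_JUMP_IF_FALSE", 8),
    Instr(4, "LOAD_CONST"),
    Instr(6, "STORE_NAME"),
    Instr(8, "LOAD_CONST"),
    Instr(10, "RETURN_VALUE"),
]
```

Running `build_cfg(SAMPLE)` yields blocks starting at offsets 0, 4 and 8; block 0 branches to both 4 and 8, and block 4 falls through to 8. Dominators and interval analysis would then operate on this block graph rather than on raw instructions.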

rocky · May 07 '17 14:05

For a little context, I assume you mean things like:

> We need to do real control flow with basic blocks, interval analysis and control-flow dominators. The result could still be pseudo-ops, but they would be a lot more precise and simplify grammar rules.

I think it would be cool to use control flow graphs (CFGs) for static program analysis beyond the reverse engineering done in a decompiler, e.g., software metrics and basis path testing.
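As one example of a software metric that falls straight out of a CFG: McCabe's cyclomatic complexity is M = E - N + 2P (edges, nodes, connected components), and M is also the number of linearly independent paths that basis path testing aims to cover. A minimal sketch, assuming the CFG is a plain adjacency dict (a hypothetical representation, not one taken from uncompyle6 or xdis):

```python
from typing import Dict, Hashable, List

def cyclomatic_complexity(cfg: Dict[Hashable, List[Hashable]], components: int = 1) -> int:
    """McCabe's M = E - N + 2P for a CFG given as node -> successor list."""
    nodes = set(cfg)
    for succs in cfg.values():
        nodes.update(succs)  # also count nodes that only appear as successors
    edges = sum(len(succs) for succs in cfg.values())
    return edges - len(nodes) + 2 * components

# Two small example CFGs: an if/else diamond and a while loop.
IF_ELSE = {"entry": ["then", "else"], "then": ["exit"], "else": ["exit"], "exit": []}
WHILE_LOOP = {"entry": ["header"], "header": ["body", "exit"], "body": ["header"], "exit": []}
```

Both example graphs have 4 edges and 4 nodes, so each has complexity 2: two basis paths to test.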

I also like the idea of switching from parsing to build abstract syntax trees (ASTs) to using symbolic execution to build control flow graphs. ASTs are great for front-end compiling from source but less so for front-end decompiling from bytecode.

Uzume · Jul 23 '17 12:07

The quickest and most effective way to make sure this happens is for you to work on it or to get involved in the various projects.

But here's some background on where things stand as of mid-July 2017.

Somewhat by accident, the non-trivial groundwork to make this happen has been done. It is basically in the xdis project, which handles Python bytecode instructions, opcode classification, and loading and writing Python bytecode files, and does so across Python versions.
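To make "loading and writing Python bytecode files" concrete, here is a minimal sketch that uses only the standard library (`marshal` and `importlib.util.MAGIC_NUMBER`) rather than xdis. Unlike xdis, it only understands the bytecode of the interpreter running it, and it assumes the 16-byte `.pyc` header of CPython 3.7+ (PEP 552): magic number, flags, then source mtime and size. The function names `write_pyc`, `read_pyc` and `roundtrip` are hypothetical.

```python
import importlib.util
import marshal
import os
import struct
import tempfile
import time
from types import CodeType

# Stdlib-only sketch: handles only the running interpreter's bytecode version.
# Assumes the CPython 3.7+ (PEP 552) header: magic, flags, mtime, size.

def write_pyc(path: str, code: CodeType, source_mtime: int, source_size: int) -> None:
    header = importlib.util.MAGIC_NUMBER  # 4-byte version-specific magic
    header += struct.pack("<I", 0)        # flags = 0: timestamp-based pyc
    header += struct.pack("<II", source_mtime & 0xFFFFFFFF, source_size & 0xFFFFFFFF)
    with open(path, "wb") as f:
        f.write(header + marshal.dumps(code))

def read_pyc(path: str) -> CodeType:
    with open(path, "rb") as f:
        data = f.read()
    magic = data[:4]
    if magic != importlib.util.MAGIC_NUMBER:
        raise ValueError(f"magic {magic!r} belongs to a different Python version")
    # Skip the remaining 12 header bytes (flags plus mtime/size or a hash).
    return marshal.loads(data[16:])

def roundtrip(source: str, mode: str = "exec") -> CodeType:
    """Compile source, write it to a temporary .pyc, and load it back."""
    code = compile(source, "<demo>", mode)
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.pyc")
        write_pyc(path, code, int(time.time()), len(source.encode()))
        return read_pyc(path)
```

For example, `eval(roundtrip("6 * 7", mode="eval"))` gives 42. A `.pyc` produced by a different Python version fails the magic check here, which is exactly the gap a cross-version library like xdis fills.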

The code I have right now to handle control flow is in the python control flow GitHub project. Right now it builds a control-flow graph and produces graphical output for it using dot/Graphviz. In doing that I noticed a lot of dead code, absolute jumps to jumps, and so on.
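The dead-code and jump-to-jump checks, plus dot output, can be sketched over an already-built CFG. This assumes a hypothetical representation (block start offset mapped to a list of `(opname, jump_target)` pairs, with successors passed separately) and a made-up generic `JUMP` opname; it is not the python control flow project's code.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical simplified CFG: block start offset -> [(opname, jump_target)],
# plus block start offset -> successor block offsets.
Blocks = Dict[int, List[Tuple[str, Optional[int]]]]
Succs = Dict[int, List[int]]

def reachable(succs: Succs, entry: int) -> Set[int]:
    """Depth-first search from the entry block."""
    seen: Set[int] = set()
    stack = [entry]
    while stack:
        b = stack.pop()
        if b not in seen:
            seen.add(b)
            stack.extend(succs.get(b, []))
    return seen

def dead_blocks(succs: Succs, entry: int) -> Set[int]:
    """Blocks no path from the entry can reach: dead code."""
    return set(succs) - reachable(succs, entry)

def jumps_to_jumps(blocks: Blocks) -> List[Tuple[int, int]]:
    """(block, target) pairs where an unconditional JUMP lands on another JUMP."""
    found = []
    for start, instrs in blocks.items():
        opname, target = instrs[-1]
        if opname == "JUMP" and target in blocks and blocks[target][0][0] == "JUMP":
            found.append((start, target))
    return found

def to_dot(succs: Succs, entry: int) -> str:
    """Graphviz dot text; unreachable blocks are drawn dashed and gray."""
    live = reachable(succs, entry)
    lines = ["digraph cfg {"]
    for b in sorted(succs):
        style = "" if b in live else " [style=dashed, color=gray]"
        lines.append(f"  b{b}{style};")
        lines.extend(f"  b{b} -> b{s};" for s in succs[b])
    lines.append("}")
    return "\n".join(lines)

BLOCKS: Blocks = {
    0: [("LOAD_NAME", None), ("POP_JUMP_IF_FALSE", 10)],
    4: [("JUMP", 8)],
    6: [("RETURN_VALUE", None)],  # unreachable: nothing jumps or falls here
    8: [("JUMP", 14)],            # a jump whose only job is to jump again
    10: [("LOAD_CONST", None), ("RETURN_VALUE", None)],
    14: [("LOAD_CONST", None), ("RETURN_VALUE", None)],
}
SUCCS: Succs = {0: [4, 10], 4: [8], 6: [], 8: [14], 10: [], 14: []}
```

On this sample, block 6 is reported as dead and the jump at block 4 is reported as landing on another jump; piping `to_dot(SUCCS, 0)` into `dot -Tpng` would render the graph.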

The basic blocks have additional flags that describe characteristics of the block, e.g., the start of a loop, an exception block, and so on. The next step would be to augment the bytecode with pseudo-opcode markers that make the structure easier for a grammar to recognize.
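Flags such as "start of a loop" can be derived from dominators: block `d` dominates block `n` if every path from the entry to `n` passes through `d`, and an edge `u -> h` where `h` dominates `u` is a back edge, which makes `h` a loop header. A minimal sketch using the straightforward iterative data-flow formulation (not necessarily what python control flow does), with the CFG as an adjacency dict and hypothetical flag names `loop_start` and `loop_end`:

```python
from typing import Dict, List, Set

# CFG as block -> successor blocks (hypothetical representation).
# Assumes every block is reachable from the entry.
def dominators(succs: Dict[int, List[int]], entry: int) -> Dict[int, Set[int]]:
    """Iterative data-flow: Dom(n) = {n} | intersection of Dom(p) over preds p."""
    nodes = set(succs)
    preds: Dict[int, Set[int]] = {n: set() for n in nodes}
    for n, ss in succs.items():
        for s in ss:
            preds[s].add(n)
    dom = {n: set(nodes) for n in nodes}  # start from "everything dominates"
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for n in sorted(nodes - {entry}):
            pred_doms = [dom[p] for p in preds[n]]
            new = {n} | (set.intersection(*pred_doms) if pred_doms else set())
            if new != dom[n]:
                dom[n] = new
                changed = True
    return dom

def block_flags(succs: Dict[int, List[int]], entry: int) -> Dict[int, Set[str]]:
    """Tag loop headers and latches via back edges (u -> h where h dominates u)."""
    dom = dominators(succs, entry)
    flags: Dict[int, Set[str]] = {n: set() for n in succs}
    for u, ss in succs.items():
        for h in ss:
            if h in dom[u]:
                flags[h].add("loop_start")  # hypothetical flag names
                flags[u].add("loop_end")
    return flags

# A while loop: 0 = setup, 2 = loop test, 4 = body (jumps back), 8 = exit.
WHILE_CFG = {0: [2], 2: [4, 8], 4: [2], 8: []}
```

Here block 2 is flagged as a loop start because the back edge 4 -> 2 targets a block that dominates its source; exception-block flags would need the exception table, which this sketch ignores.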

It was at that point that I realized I also had the basics to write a Python bytecode assembler very easily, which now makes it possible to remove the junk I saw. Here's the funny thing: the decompiler currently relies on that junk being there for detection.

However, that is no reason not to allow removing it; the decompiler should just get better. That will happen once the flow-control code adds the hints for the decompiler.
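One way to picture such hints is the existing COME_FROM idea made precise: once jump targets are known, a pseudo-op can be placed in front of each target for every jump that reaches it, so a grammar rule can anchor on that token instead of guessing where branches rejoin. A minimal sketch over a hypothetical instruction model (the `Instr` class and the generic `JUMP` opname are illustrative; uncompyle6's actual pseudo-op variants and encoding may differ):

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional

# Hypothetical instruction model; for COME_FROM pseudo-ops, `target`
# holds the offset of the jump that arrives here.
@dataclass(frozen=True)
class Instr:
    offset: int
    opname: str
    target: Optional[int] = None

def add_come_from(code: List[Instr]) -> List[Instr]:
    """Insert one COME_FROM pseudo-op before a jump target per incoming jump."""
    incoming: Dict[int, List[int]] = defaultdict(list)
    for ins in code:
        if ins.target is not None:
            incoming[ins.target].append(ins.offset)
    out: List[Instr] = []
    for ins in code:
        for src in sorted(incoming.get(ins.offset, [])):
            out.append(Instr(ins.offset, "COME_FROM", src))
        out.append(ins)
    return out

# Roughly "if x: a else: b" followed by a return (offsets are illustrative).
SAMPLE = [
    Instr(0, "LOAD_NAME"),
    Instr(2, "POP_JUMP_IF_FALSE", 8),
    Instr(4, "LOAD_CONST"),
    Instr(6, "JUMP", 10),
    Instr(8, "LOAD_CONST"),
    Instr(10, "RETURN_VALUE"),
]
```

On this sample two pseudo-ops are added: one at offset 8 recording the conditional jump from 2 (start of the else branch) and one at offset 10 recording the jump from 6 (where the branches rejoin). Control-flow analysis could refine this further, for example emitting a different marker when the incoming edge is a loop back edge.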

rocky · Jul 24 '17 01:07