Incorrect decompilation of nested if/else with "and" conditions
Description
The following code decompiles incorrectly with Python 3.8:
def toto(a, b, c):
    if c:
        if a and b:
            print('a and b')
        print('c, whatever a/b')
        return
    else:
        raise ValueError('not a or not b')
Decompilation result:
def toto(a, b, c):
    if c:
        if a:
            if b:
                print('a and b')
            print('c, whatever a/b')
            return
    raise ValueError('not a or not b')
As you can see, the second print and the return are still inside a condition (if a:) instead of being one indentation level up, directly under if c:.
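The difference is behavioral, not only cosmetic. The following is a minimal sketch, assuming the original nests the second print and the return directly under `if c:` while the decompiled output nests them under `if a:`. The names `toto_original`, `toto_decompiled`, and `outcome` are introduced here for illustration only.

```python
import contextlib
import io


def toto_original(a, b, c):
    # Assumed layout of the original source.
    if c:
        if a and b:
            print('a and b')
        print('c, whatever a/b')
        return
    else:
        raise ValueError('not a or not b')


def toto_decompiled(a, b, c):
    # Assumed layout of the decompiler output.
    if c:
        if a:
            if b:
                print('a and b')
            print('c, whatever a/b')
            return
    raise ValueError('not a or not b')


def outcome(func, a, b, c):
    """Return 'returned' or 'raised' for one call, discarding printed output."""
    with contextlib.redirect_stdout(io.StringIO()):
        try:
            func(a, b, c)
        except ValueError:
            return 'raised'
    return 'returned'


# Try every combination of truth values and collect the disagreements.
mismatches = [
    (a, b, c)
    for a in (False, True)
    for b in (False, True)
    for c in (False, True)
    if outcome(toto_original, a, b, c) != outcome(toto_decompiled, a, b, c)
]
print(mismatches)
```

Under these assumptions the two versions disagree whenever `c` is true and `a` is false: the original returns normally, while the decompiled version falls through to the `raise`.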
How to Reproduce
I have attached the example, although it is easy to reproduce.
(.env) C:\work\logh_home\decomp-tests>python -m py_compile toto.py
(.env) C:\work\logh_home\decomp-tests>python ..\python-decompile3\decompyle3\bin\decompile.py __pycache__\toto.cpython-38.pyc
# decompyle3 version 3.9.1.dev0
# Python bytecode version base 3.8.0 (3413)
# Decompiled from: Python 3.8.10 (tags/v3.8.10:3d8993a, May 3 2021, 11:34:34) [MSC v.1928 32 bit (Intel)]
# Embedded file name: toto.py
# Compiled at: 2023-11-16 12:28:39
# Size of source mod 2**32: 153 bytes
def toto(a, b, c):
    if c:
        if a:
            if b:
                print('a and b')
            print('c, whatever a/b')
            return
    raise ValueError('not a or not b')
# okay decompiling __pycache__\toto.cpython-38.pyc
(.env) C:\work\logh_home\decomp-tests>
Environment
(.env) C:\work\logh_home\python-decompile3>git log -1
commit ed375df6b21ecbddd21cf6aaa37c7673da876751 (HEAD -> master, origin/master, origin/HEAD)
Author: rocky <[email protected]>
Date: Mon Nov 13 08:09:43 2023 -0500
Forgot to add these to the last commit
(.env) C:\work\logh_home>pip list
Package      Version    Location
------------ ---------- -----------------------------------
click        8.1.7
colorama     0.4.6
decompyle3   3.9.1.dev0 c:\work\logh_home\python-decompile3
pip          21.1.2
setuptools   57.0.0
six          1.16.0
spark-parser 1.8.9
wheel        0.36.2
xdis         6.1.0.dev0
WARNING: You are using pip version 21.1.2; however, version 23.3.1 is available.
You should consider upgrading via the 'C:\work\logh_home\decomp-tests\.env\Scripts\python.exe -m pip install --upgrade pip' command.
(.env) C:\work\logh_home>python --version
Python 3.8.10
Windows 10 22H2
Priority
Up to you.
Additional Context
I lost the source code for the binary release of one of my programs.
So, I am in the interesting position where I know very well the original source code, and I am still very interested in recovering it fully, and I can spot mistakes.
In this case, I have a state machine for parsing a file, with deep if/else flow.
And of course, I like open source.
Personally, I don't have any innate interest in this bug. Bugs eventually get fixed, but the time frame may be months or years.
But this code is open source, and you are a programmer, so you or others who are interested in addressing this particular bug, among the many that exist, have the source code at your disposal as well as the git history showing how other bugs were fixed. (Based on past history, though, volunteers are few and far between, especially when it comes to fixing problems that do not directly benefit the volunteer.)
If you make a donation to the project, I'll look at this particular problem when I have a chance, and at others depending on the size of the donation. Or feel free to open a bug bounty to pay a programmer to fix this. That works too.
In https://github.com/rocky/python-uncompyle6/discussions/412#discussioncomment-4373597 I describe how this kind of problem is better addressed in not-yet-public work I am doing.
> So, I am in the interesting position where I know very well the original source code, and I am still very interested in recovering it fully, and I can spot mistakes.
I have an automated mechanism for finding bugs, and those bugs come with the exact source code to compare against.
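For readers who want a rough check of their own, here is a minimal sketch of one way to compare a decompiled candidate against known source. It is not the mechanism mentioned above. It assumes both texts are available as strings, that the indentation of the two versions is as reconstructed here, and that the check runs on the same Python version that produced the bytecode. The helper names `function_code` and `same_bytecode` are hypothetical.

```python
ORIGINAL = """
def toto(a, b, c):
    if c:
        if a and b:
            print('a and b')
        print('c, whatever a/b')
        return
    else:
        raise ValueError('not a or not b')
"""

DECOMPILED = """
def toto(a, b, c):
    if c:
        if a:
            if b:
                print('a and b')
            print('c, whatever a/b')
            return
    raise ValueError('not a or not b')
"""


def function_code(source, name):
    """Compile `source` and return the code object of the function `name`."""
    namespace = {}
    exec(compile(source, '<candidate>', 'exec'), namespace)
    return namespace[name].__code__


def same_bytecode(source_a, source_b, name):
    """True if both sources compile `name` to the same bytecode, constants, and names."""
    code_a = function_code(source_a, name)
    code_b = function_code(source_b, name)
    # Line-number tables are deliberately ignored; only executable content is compared.
    return (code_a.co_code, code_a.co_consts, code_a.co_names) == (
        code_b.co_code, code_b.co_consts, code_b.co_names)


print(same_bytecode(ORIGINAL, ORIGINAL, 'toto'))
print(same_bytecode(ORIGINAL, DECOMPILED, 'toto'))
```

A mismatch only flags a candidate for closer inspection: two sources that are written differently but behave the same can also compile to different bytecode, so this check is stricter than semantic equivalence.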
> And of course, I like open source.
Lots of people do. Lots of people love the fact that you get programs that do things you want for free, that it comes with the source code, and you can ask the author or maintainer for free help in addressing something that has a lot of benefit for you and perhaps some residual benefit for others; and that from the author and maintainer you'll get mostly personal help for free.
I'll have a look, but I believe this is well above my skill level. I don't know Python bytecode, and I have never written or used grammars or parsers.
I suspect this kind of control flow will be handled much more easily in an experimental decompiler I have been working on, which classifies basic blocks and understands dominator regions. I will be talking briefly about this at the BlackHat Asia 2024 conference.
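To illustrate what dominator information looks like for this function, here is a minimal sketch using the textbook iterative dominator computation. It is not the experimental decompiler's code. The control-flow graph is drawn by hand from the original source, and the block labels (`test_c`, `test_a`, and so on) are hypothetical names chosen for this example.

```python
# Hand-drawn control-flow graph of the original toto(): block -> successors.
CFG = {
    'test_c': ['test_a', 'raise'],            # if c: ... else: raise
    'test_a': ['test_b', 'print_c_return'],   # first half of "a and b"
    'test_b': ['print_ab', 'print_c_return'], # second half of "a and b"
    'print_ab': ['print_c_return'],           # print('a and b')
    'print_c_return': ['exit'],               # print('c, whatever a/b'); return
    'raise': ['exit'],
    'exit': [],
}


def dominators(cfg, entry):
    """Iteratively compute the set of dominators of every block.

    Assumes every block other than `entry` has at least one predecessor.
    """
    # Build the predecessor map from the successor lists.
    preds = {node: set() for node in cfg}
    for node, successors in cfg.items():
        for succ in successors:
            preds[succ].add(node)

    # Start from "everything dominates everything" and shrink to a fixed point.
    dom = {node: set(cfg) for node in cfg}
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for node in cfg:
            if node == entry:
                continue
            new = set.intersection(*(dom[p] for p in preds[node])) | {node}
            if new != dom[node]:
                dom[node] = new
                changed = True
    return dom


dom = dominators(CFG, 'test_c')
print(sorted(dom['print_c_return']))
```

In this hand-drawn graph, the block holding the second print and the return is dominated by `test_c` and `test_a` but not by `test_b`, and it is reached from the false edge of `test_a` as well as from `test_b` and `print_ab`. That makes it a join point after the whole `a and b` test, which is the structure the decompiled output misses when it places the block inside `if a:`.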
I see that pycdc gets confused in the same way.