python-uncompyle6
python-uncompyle6 copied to clipboard
Python 3.7 decompiling full 3.7.3 library without error
Description
$ pyenv local 3.7.3
$ cd test
$ ./test_pyenvlib.py --3.7.3 --max 2200 --verify
Remove 36 or so parser errors.
@x0ret as with the others, if this is not of interest or too much work, feel free to remove yourself.
Background
Tests
File "/Users/x0ret/Desktop/dev/python-uncompyle6/uncompyle6/semantics/pysource.py", line 1448, in print_super_classes3
if i == start:
UnboundLocalError: local variable 'i' referenced before assignment
start = n-2
for i in range(start, 0, -1):
if not node[i].kind in ['expr', 'call', 'LOAD_CLASSNAME']:
break
pass
if i == start:
return
else:
- start = n-2
+ i = start = n-2
for i in range(start, 0, -1):
if not node[i].kind in ['expr', 'call', 'LOAD_CLASSNAME']:
Need more investigation. I'm working on it.
@rocky i was investigating python3.7 datetime issue and noticed the issue is another chained compare, this is the the test case:
def f():
if (month is None and
isinstance(year, (bytes, str)) and len(year) == 4 and
1 <= ord(year[2:3]) <= 12):
return 5
finally this is the diff:
diff --git a/uncompyle6/parsers/parse37.py b/uncompyle6/parsers/parse37.py
index 1d2baa7e..25c6fd28 100644
--- a/uncompyle6/parsers/parse37.py
+++ b/uncompyle6/parsers/parse37.py
@@ -99,6 +99,8 @@ class Python37Parser(Python36Parser):
compare_chained1_false_37 ::= expr DUP_TOP ROT_THREE COMPARE_OP POP_JUMP_IF_FALSE
compare_chained2c_37 POP_TOP JUMP_FORWARD COME_FROM
+ compare_chained1_false_37 ::= expr DUP_TOP ROT_THREE COMPARE_OP POP_JUMP_IF_FALSE
+ compare_chained2d_37 POP_TOP JUMP_FORWARD COME_FROM
compare_chained2_false_37 ::= expr DUP_TOP ROT_THREE COMPARE_OP POP_JUMP_IF_FALSE
compare_chained2a_false_37 ELSE POP_TOP JUMP_BACK COME_FROM
@@ -110,6 +112,8 @@ class Python37Parser(Python36Parser):
compare_chained2c_37 ::= expr DUP_TOP ROT_THREE COMPARE_OP come_from_opt POP_JUMP_IF_FALSE
compare_chained2a_false_37 ELSE
+ compare_chained2d_37 ::= expr COMPARE_OP _come_froms POP_JUMP_IF_FALSE JUMP_FORWARD ELSE
+
jf_cfs ::= JUMP_FORWARD come_froms
ifelsestmt ::= testexpr c_stmts_opt jf_cfs else_suite opt_come_from_except
diff --git a/uncompyle6/semantics/customize37.py b/uncompyle6/semantics/customize37.py
index 71d84dc4..2d9dd26f 100644
--- a/uncompyle6/semantics/customize37.py
+++ b/uncompyle6/semantics/customize37.py
@@ -62,6 +62,9 @@ def customize_for_version37(self, version):
(0, 19 ) ),
'compare_chained2c_37': (
'%[3]{pattr.replace("-", " ")} %p %p', (0, 19), (6, 19) ),
+ 'compare_chained2d_37': (
+ '%[1]{pattr.replace("-", " ")} %p',
+ (0, 19) ),
'if_exp_37a': ( '%p if %p else %p', (1, 'expr', 27), (0, 27), (4, 'expr', 27) ),
'if_exp_37b': ( '%p if %p else %p', (2, 'expr', 27), (0, 'expr', 27), (5, 'expr', 27) ),
I'm not sure this diff is your favorite, anyway as you can see compare_chained2d_37 is just compare_chained2b_37 with accept multiple COME_FROM. maybe we can fix just 2b.
Please review this and guide me to edit it to fit in your style.
Thank you
Thanks for the information. It looks to me like uncompyle6 is putting in COME_FROM to the wrong locations. Here is the offending bytecode reported:
def f--- This code section failed: ---
2 0 LOAD_GLOBAL 'month'
2 LOAD_CONST None
4 COMPARE_OP 'is'
6 POP_JUMP_IF_FALSE 74 'to 74'
3 8 LOAD_GLOBAL 'isinstance'
10 LOAD_GLOBAL 'year'
12 LOAD_GLOBAL 'bytes'
14 LOAD_GLOBAL 'str'
16 BUILD_TUPLE_2 2 ''
18 CALL_FUNCTION_2 2 ''
20 POP_JUMP_IF_FALSE 74 'to 74'
22 LOAD_GLOBAL 'len'
24 LOAD_GLOBAL 'year'
26 CALL_FUNCTION_1 1 ''
28 LOAD_CONST 4 4
30 COMPARE_OP '=='
32 POP_JUMP_IF_FALSE 74 'to 74'
4 34 LOAD_CONST 1 1
36 LOAD_GLOBAL 'ord'
38 LOAD_GLOBAL 'year'
40 LOAD_CONST 2 2
42 LOAD_CONST 3 3
44 BUILD_SLICE_2 2 ''
46 BINARY_SUBSCR
48 CALL_FUNCTION_1 1 ''
50 DUP_TOP
52 ROT_THREE
54 COMPARE_OP '<='
56 POP_JUMP_IF_FALSE 66 'to 66'
58 LOAD_CONST 12 12
60 COMPARE_OP '<='
62_0 COME_FROM 32 '32'
62_1 COME_FROM 20 '20'
62_2 COME_FROM 6 '6'
62 POP_JUMP_IF_FALSE 74 'to 74'
64 JUMP_FORWARD 70 'to 70'
66 ELSE '70'
66 POP_TOP
68 JUMP_FORWARD 74 'to 74'
70_0 COME_FROM 64 '64'
5 70 LOAD_CONST 5 5
72 RETURN_VALUE
74_0 COME_FROM 68 '68'
74_1 COME_FROM 62 '62'
The jumps from offsets 32, 20, and 6 should be to offset 74, not offset 62. If the COME_FROMs were fixed, I wonder if this would work.
Wouldn't this be a better fix? Your thoughts?
You are right, I didn't check for that. Honestly, I have no clue where this issue comes from. Could you please guide me (scope) to go for it?
This is in scanners/scanner3.py, and specifically function find_jump_targets(). It returns a dictionary called targets which function ingest() uses to insert the COME_FROMs.
Although the detect_control_flow() is a big pile of crap and should be removed, making sure that that getting the targets of jumps point to the right offset shouldn't be so bad to fix.
Ok, the issue is this line:
@@ -811,7 +814,8 @@ class Scanner3(Scanner):
self.fixed_jumps[offset] = fix or match[-1]
return
else:
- self.fixed_jumps[offset] = match[-1]
+ self.fixed_jumps[offset] = target
return
# op == POP_JUMP_IF_TRUE
According to the condition:
# If we still have any offsets in set, start working on it
if match:
is_jump_forward = self.is_jump_forward(pre_rtarget)
if (is_jump_forward and pre_rtarget not in self.stmts and
self.restrict_to_parent(self.get_target(pre_rtarget), parent) == rtarget):
is_jump_forward is False so else is executing. and then we have:
else:
self.fixed_jumps[offset] = match[-1]
however match is [6, 20, 32, 62] and always we have 62. instead based on fix we have target=74 and disasm will be like:
64 JUMP_FORWARD 70 'to 70'
66 ELSE '70'
66 POP_TOP
68 JUMP_FORWARD 74 'to 74'
70_0 COME_FROM 64 '64'
5 70 LOAD_CONST 5 5
72 RETURN_VALUE
74_0 COME_FROM 68 '68'
74_1 COME_FROM 62 '62'
74_2 COME_FROM 32 '32'
74_3 COME_FROM 20 '20'
74_4 COME_FROM 6 '6'
74 LOAD_CONST None
76 RETURN_VALUE
but next we have another issue which i'm working on it:
def f():
if month is None:
if isinstance(year, (bytes, str)):
if len(year) == 4:
pass
if not 1 <= ord(year[2:3]) <= 12:
return 5
EDIT: This diff breaks test cases
I am sorry the code is such a mess. (Compare with https://github.com/Mysterie/uncompyle2/blob/master/uncompyle2/scanner27.py#L326-L580 which is where this is derived from and corresponding uncompyle2's and you'll see it was always a mess)
Yes, I saw it breaks the tests.
But we should change the rules so that they don't have to be followed by a (sometimes bogus) "COME_FROM" (because the jump in 3.7 can sometimes go directly back to a loop iterator).
I have come to learn from painful mistakes that we have to understand what's going on and represent things the way they are. If the code for "and" sometimes does not have a COME_FROM at the end, uncompyle6 shouldn't be trying to add some sort of phony "COME_FROM" as may have been done in the past.
Instead all of this code needs an overhaul and replaced by something which is on a more sound ground: control flow analysis which understands the boundaries of loops and of blocks and nested blocks.
By the way, if desired, there could be a reduce check added when the "and" rule doesn't have a "come from" because the jump is backwards.
It occurs to me that if fixing up all the past sins is to arduous, for now you can just have the change you proposed apply for 3.7+ (self.version >= 3.7 or whatever) and over the weekend I can look at going back and fixing old old 3.x grammar rules that would be broken as a result of doing the right thing.
@rocky, Please check this issue, i couldn't fix it up.
the issue is in recognizing or since the current grammer does not handle the python 3.7 case. precisely, the case is .pyenv/versions/3.7.3/lib/python3.7/html/__pycache__/__init__.cpython-37.opt-1.pyc or for better integration we have following test case:
def f():
if 0xD800 <= num <= 0xDFFF or num > 0x10FFFF:
return 5
return 4
disasm:
2 0 LOAD_CONST 55296 55296
2 LOAD_GLOBAL 'num'
4 DUP_TOP
6 ROT_THREE
8 COMPARE_OP '<='
10 POP_JUMP_IF_FALSE 20 'to 20'
12 LOAD_CONST 57343 57343
14 COMPARE_OP '<='
16 POP_JUMP_IF_TRUE 30 'to 30'
18 JUMP_FORWARD 22 'to 22'
20 ELSE '22'
20 POP_TOP
22_0 COME_FROM 18 '18'
22 LOAD_GLOBAL 'num'
24 LOAD_CONST 1114111 1114111
26 COMPARE_OP '>'
28_0 COME_FROM 16 '16'
28 POP_JUMP_IF_FALSE 34 'to 34'
3 30 LOAD_CONST 5 5
32 RETURN_END_IF
34_0 COME_FROM 28 '28'
4 34 LOAD_CONST 4 4
36 RETURN_VALUE
-1 RETURN_LAST
As of previous grammer for or there are POP_JUMP_IF_TRUE:
or ::= expr jmp_true expr COME_FROM (18)
but for 3.7 we don't. I test multiple thing but couldn't fix. Could you please check?
In last attempt, just 22 cases of all 2055 failed in python 3.7 library.
In last attempt, just 22 cases of all 2055 failed in python 3.7 library.
You are awesome!
The problem here is I bet another one of those dishonest COME_FROM's
16 POP_JUMP_IF_TRUE 30 'to 30'
...
26 COMPARE_OP '>'
28_0 COME_FROM 16 '16'
So if you can find the place in that horrible scanner3 routine that is mucking that up and remove it for 3.7 and later (or maybe 3.6 as well) then I bet this would go away or we might have to adjust some rules, but that's okay.
I am sorry I got into this mess. At the time this stuff crept in, I didn't have
- the better grammar tracing and error display we now have
- reduction checks
- the better tree display tools,
- the grammars for the various versions were not as isolated, and
- I didn't even have the ability to add comments to the grammar.
Nor did I understand what was going on as well.
Now that all of this has changed, there is this mess to clean up.
I really like the idea of setting a totally new decompiler project just for a single modern version like 3.7 (and later 3.7 and up) which has a stripped down grammar and semantic actions. And uses python-flow-control. I can drop the restriction of writing in a Python2-compatible way (because uncompyle6 will always be around). This can also be in its own github organization if that helps.
So if you can find the place in that horrible scanner3 routine that is mucking that up and remove it for 3.7 and later (or maybe 3.6 as well) then I bet this would go away or we might have to adjust some rules, but that's okay.
You were right, i failed twice! : ).
@@ -753,8 +754,10 @@ class Scanner3(Scanner):
# FIXME: this is not accurate The commented out below
# is what it should be. However grammar rules right now
# assume the incorrect offsets.
- # self.fixed_jumps[offset] = target
- self.fixed_jumps[offset] = pretarget.offset
+ if self.version < 3.6:
+ self.fixed_jumps[offset] = pretarget.offset
+ else:
+ self.fixed_jumps[offset] = target
self.structs.append({'type': 'and/or',
'start': start,
'end': pretarget.offset})
This needs a little rule adjustment, i'm working on it.
I am sorry I got into this mess. At the time this stuff crept in, I didn't have
You shouldn't be. This project is awesome, believe me.
I really like the idea of setting a totally new decompiler project just for a single modern version like 3.7 (and later 3.7 and up) which has a stripped down grammar and semantic actions. And uses python-flow-control. I can drop the restriction of writing in a Python2-compatible way (because uncompyle6 will always be around). This can also be in its own github organization if that helps.
So we should go for it, let's do this for a while and see the result. Concentrating on one version and ability to use modern language features. Good idea.
Thanks.