python-uncompyle6
Merge decompyle3's 3.7 and 3.8 goodness into uncompyle6
At this point, 3.7 and 3.8 decompilation in decompyle3, with its refactored parsing and reduction-rule checking, is probably better than the code in uncompyle6.
Although there is still much to do in decompyle3, there is benefit in backporting it here.
Any volunteers?
Hi, after a fairly long time I'm glad I can focus on this project again. Anyway, I managed to backport decompyle3's parsing and reduction-rule checking here. However, it seems I should review all the lessons learned in the past : ).
But there seems to be an issue that needs to be verified. I tried to run make check-3.7, but some 3.0 tests seem to fail. Has anything changed? FYI, I tried both the master branch and the pip version.
bytecode_3.0/00_import.pyc --
se_lambda.pyc -- decompiled 5 files: 5 okay, 0 failed
Instruction context:
19 30 LOAD_CONST 0
33 STORE_FAST 'more'
36 POP_EXCEPT
37 JUMP_BACK 3 'to 3'
-> 40_0 COME_FROM 23 '23'
40 POP_TOP
41 END_FINALLY
42 JUMP_BACK 3 'to 3'
Thank you.
Great to see you back!
A lot has changed. It is hard to even know where to begin.
In decompyle3 I have been working exclusively in the "and-or-refactor" branch, which I am getting ready to merge back into master.
With that,
- lambda expressions are now their own subgrammar of the full Python grammar
- directories for each Python version have been added - p37 and p38 for Python 3.7 and 3.8
- the reduction checks have been greatly expanded. But see https://github.com/rocky/python-uncompyle6/wiki/The-Control-Flow-Mess
- grammar patterns for looping constructs are distinct from non-looping constructs
- as the title of that branch suggests, and/or handling, and more generally the treatment of boolean conditions, has been rethought
- a lot of my testing focuses not just on parsing without errors but on running the test suite for the Python version to find semantic errors. That is a lot tougher to do and a lot more stringent.
As a result of this, I think I now have a very good handle on fixing parsing bugs, on keeping the grammar comprehensible, and on understanding when something needs to be fixed as a reduction rule, a grammar rule, or something else.
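To make the reduction-rule idea concrete for anyone following along, here is a minimal sketch of the technique; this is not uncompyle6's or decompyle3's actual interface, and the token type, field names, and check function below are simplified assumptions. The point is just that a callback inspects the tokens spanned by a candidate reduction and vetoes the reduction when an invariant doesn't hold:

```python
from collections import namedtuple

# Hypothetical token type; real decompiler tokens carry more fields.
Tok = namedtuple("Tok", "kind attr offset")

def invalid_and_reduction(tokens, first, last):
    """Veto reducing tokens[first:last] to an "and" nonterminal unless the
    short-circuit jump lands just past the span (illustrative rule only)."""
    jump = next((t for t in tokens[first:last]
                 if t.kind == "POP_JUMP_IF_FALSE"), None)
    if jump is None:
        return False                          # nothing to check: allow it
    return jump.attr != tokens[last].offset   # True means: reject this parse

# Tiny usage example: a made-up token stream for something like "a and b".
toks = [
    Tok("LOAD_NAME", "a", 0),
    Tok("POP_JUMP_IF_FALSE", 6, 2),
    Tok("LOAD_NAME", "b", 4),
    Tok("RETURN_VALUE", None, 6),
]
assert not invalid_and_reduction(toks, 0, 3)  # jump targets offset 6: valid
```

A parser consults checks like this after matching a rule and throws the parse away when the check fails; that is what keeps a permissive grammar from accepting bytecode shapes it shouldn't.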
And all of this, I think, will be a prelude to yet another fork that really tries to reduce the reduction-rule complexity by adding more higher-level constructs that can be obtained from basic blocks and dominator/reverse-dominator information.
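For readers who haven't met the terms: a block D dominates a block B if every path from the entry to B passes through D, and reverse (post-) dominators are the same idea computed on the reversed graph. A generic sketch of the classic iterative computation over a basic-block graph, not code from either project:

```python
def dominators(cfg, entry):
    """Compute dominator sets for a CFG given as {block: set(successors)}."""
    preds = {b: set() for b in cfg}
    for b, succs in cfg.items():
        for s in succs:
            preds[s].add(b)
    dom = {b: set(cfg) for b in cfg}  # start pessimistically: everything dominates b
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for b in cfg:
            if b == entry:
                continue
            # b is dominated by itself plus whatever dominates all its predecessors
            common = set.intersection(*(dom[p] for p in preds[b])) if preds[b] else set()
            new = {b} | common
            if new != dom[b]:
                dom[b], changed = new, True
    return dom

# Example: a diamond  entry -> (then | orelse) -> join
cfg = {"entry": {"then", "orelse"}, "then": {"join"}, "orelse": {"join"}, "join": set()}
print(dominators(cfg, "entry")["join"])  # {'entry', 'join'}: neither branch arm dominates the join
```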
It has been, and continues to be, a long tough slog, but at the end I think I have a much better handle on decompilation for higher-level VMs. If I could get paid for this, or there were a market for it, I might do this for some other pair of high-level language and assembly.
Oh, and as for that specific error. Travis CI is not showing it and in my own locally-checked out master I am not seeing that error.
The filename on that is weird, but I assume this is supposed to be bytecode_3.0/02_try_except_except.pyc.
Here is the tree for that:
whileTruestmt (4): ('%|while True:\n%+%c%-\n\n', 1)
0. L. 15 0 SETUP_LOOP 45 'to 45'
1. l_stmts
try_except (5): ('%|try:\n%+%c%-%c\n\n', 1, 3)
0. L. 16 3 SETUP_EXCEPT 16 'to 16'
1. suite_stmts_opt
suite_stmts
assign (2): ('%|%c = %p\n', -1, (0, 200))
0. expr
L. 17 6 LOAD_CONST 1
1. store
9 STORE_FAST 'more'
2. 12 POP_BLOCK
3. except_handler (4)
0. jmp_abs
13 JUMP_BACK 3 'to 3'
1. 16_0 COME_FROM_EXCEPT 3 '3'
2. except_stmts
except_stmt (2)
0. except_cond1 (7): ('%|except %c:\n', 1)
0. L. 18 16 DUP_TOP
1. expr
17 LOAD_GLOBAL KeyboardInterrupt
2. 20 COMPARE_OP exception-match
3. jmp_false (3)
0. 23 JUMP_IF_FALSE 40 'to 40'
1. _come_froms
2. 26 POP_TOP
4. 27 POP_TOP
5. 28 POP_TOP
6. 29 POP_TOP
1. except_suite (3): ('%+%c%-%C', 0, (1, 9223372036854775807, ''))
0. c_stmts
assign (2): ('%|%c = %p\n', -1, (0, 200))
0. expr
L. 19 30 LOAD_CONST 0
1. store
33 STORE_FAST 'more'
1. 36 POP_EXCEPT
2. jump_except (3)
0. _jump
37 JUMP_BACK 3 'to 3'
1. 40_0 COME_FROM 23 '23'
2. 40 POP_TOP
3. 41 END_FINALLY
4. opt_come_from_except
_come_froms
2. jb_or_c
42 JUMP_BACK 3 'to 3'
3. 45_0 COME_FROM_LOOP 0 '0'
Oh, and as for that specific error. Travis CI is not showing it and in my own locally-checked out master I am not seeing that error.
Okay, let me try again. I'll let you know about the result.
In decompyle3 I have been working exclusively in the "and-or-refactor" branch, which I am getting ready to merge back into master.
So I think I should wait for it to be merged. Correct?
Anyway, given this progress, what are your priorities among tasks, issues, and parts of the code? I think I can adapt to the new changes soon. (Maybe this is not a good place to discuss.)
directories for each Python version have been added - p37 and p38 for Python 3.7 and 3.8
And about merging changes, do you plan to separate each version into directories like decompyle3 does?
Seems to be a lot of questions. Sorry about that!
So I think I should wait for it to be merged. Correct?
Yes. I don't think it will take too long. In the meantime you can get up to speed on what's up in that branch.
Since this is a largish change, the way I'd handle it is to go over the commit history and pick out specific parts to do.
For example, withstmt has been renamed to with to match Python's AST name.
There have been specific bug fixes in that branch that aren't strictly related to the grammar change and that could be applied to uncompyle6. One example I recall: as a result of adding the docstring into the parse tree, the position where to look for an annotation changed by one.
That sort of weird thing was caught via running one of the stdlib tests for 3.7.
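For context, the kind of source that exercises that docstring-plus-annotation interaction would look roughly like the following; this is my own illustrative snippet, not the actual stdlib test that caught it:

```python
def scale(x: float, factor: float = 2.0) -> float:
    """Scale x; having both a docstring and annotations is what matters here."""
    return x * factor
```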
@x0ret
The and-or-refactor merge in decompyle3 may take a while. It looks like more rot was discovered, and so yet another detour to fix that up is going on. However...
I've just squashed some of the commits in that branch and gone over the things that should be ported here that are not part of the overall big refactor:
It looks like commit sha's have changed. I was able to match things up by commit message though.
~https://github.com/rocky/python-decompile3/b592ebadce93ced49c72feaf37db8440531e9034 Fix bug in finding annotation in fn with docstring~ Done in af8add9d
~https://github.com/rocky/python-decompile3/d6f4017587388952d1460e5eedfdfeab24c2918e Better Async detection~ Done in e2d349f7. But note more is needed for nested async for's; see the commented-out section in test/simple_source/bug37/02_async_for_generator.py. This works in decompyle3.
~https://github.com/rocky/python-decompile3/6097fc320a50a889b9cbdf6376a8673c28ed1257 https://github.com/rocky/python-decompile3/a91f17797cc126b37613758c57bc817a67b73e77 With grammar simplification and withstmt -> with (to match AST With)~
~https://github.com/rocky/python-decompile3/2fd65521feb8d9dc0d629da5501370875385b1f9 Set "await expr" precedence~
~https://github.com/rocky/python-decompile3/4d335816b366373b45d0968e70c16343c00914de Handle remnants of unreachable with~
~https://github.com/rocky/python-decompile3/4c83809a8d6e9f17fd433c4287310455d5702b34 Handle nested async for~
listcomp -> list_comp (to match AST ListComp)
~Remove unneeded Makefiles in light of new "remake -c" option which does the same thing and is more reliable.~
Thanks, you made my work easy!
Do you mind if I use a separate commit for each?
I'll give it a try and let you know about the result.
Thanks, you made my work easy!
You may want to hold off on passing judgement until afterwards. I wouldn't say "easy", but hopefully "easier".
Do you mind if I use a separate commit for each?
Please do. And just work in the way that makes the most sense and is the most comfortable for you.
I'll give it a try and let you know about the result.
Many thanks!
About the diffs:
Take changes that are strictly to 3.7-excludes.sh or 3.8-excludes, especially if they are only comments, with a grain of salt. There were regressions as well as progressions. Best is to just run those particular tests to see what is reported. I am sure you know this, but as a reminder:
$ cd test/stdlib
$ ./runtest.sh test_[whatever].sh
What is helpful in the comments in the 3.x-exclude files is a rough categorization:
- The test doesn't work independently of uncompyle6; subcategories: it might fail, or it might take a long time to run.
- The test takes a long time just to decompile. This probably won't happen in uncompyle6, but it happens too often in decompyle3
- There is a parse error in decompilation
- The test takes too long to run the decompiled program. (Probably a control-flow bug)
- The test errors out - for example, a variable is read that isn't set first (often a sign of a control-flow bug)
- The test fails a test assertion.
Also, I try to keep track of regressions by placing a newly-excluded test at the top. If a test worked sometime in the past, that's great, because I can compare against the previously working decompiled source, which is of course closer to the original source than a faulty decompiled source is. So I generally put regressions at the top of the exclude list, since those are easiest to fix. And in the case of decompyle3, once I get all the regressions fixed, I'll merge into master.
I pushed some commits to the 3.8-goodness branch and removed the corresponding fixed tests from the excludes. Please review those if you have time.
Also, I saw your changes in the commits for the list_comp case, but you didn't remove that from the list. Any reason?
Changes in that branch:
- backporting the withstmt parser
- backporting stmt reduction and a fix in docstring transformation
I'll continue working on this and then send a PR.
Thanks.
I pushed some commits to the 3.8-goodness branch and removed the corresponding fixed tests from the excludes. Please review those if you have time.
Yes, I noticed. It looks fine as far as I could tell from a first glance. Nothing fancy. Thanks for doing this.
Also, I saw your changes in the commits for the list_comp case, but you didn't remove that from the list. Any reason?
I don't think this is complete. If you grep for n_listcomp you'll see that it is still in there, and I think there are some grammar rules with listcomp as well. I got to the stage where I would have to reconcile the two functions n_listcomp() and n_list_comp() and didn't have the energy to do it. If you want to pick that up and run with it, great! If not, it can wait.
I don't think this is complete. If you grep for n_listcomp you'll see that it is still in there, and I think there are some grammar rules with listcomp as well. I got to the stage where I would have to reconcile the two functions n_listcomp() and n_list_comp() and didn't have the energy to do it. If you want to pick that up and run with it, great! If not, it can wait.
I spent some time figuring out the listcomp case and I think I'm ready to change all the cases. However, I ran into several issues, and it seems the parser needs to be adjusted to cover cases such as this:
async def x():
    q = [i for x in p async for i in f]
    return 1
In the meantime I reviewed PEP 530, and we should support more cases. 3.7, 3.8, and decompyle3 are also affected.
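For reference, PEP 530 allows async for (and await) in all four comprehension forms, so there are more shapes to cover than the list comprehension above. These are illustrative examples, not existing test cases:

```python
async def pep530_forms(aiter, coro):
    lst = [x async for x in aiter]              # list comprehension
    st = {x async for x in aiter}               # set comprehension
    dct = {x: x * 2 async for x in aiter}       # dict comprehension
    gen = (await coro(x) async for x in aiter)  # async generator expression
    return lst, st, dct, gen
```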
I'm going to continue this work and fix all the issues.
Anyway, about these lines, do you remember the test case for this? I assumed listcomp_async is here from previous attempts, but the list_iter_index = 5 case still does not occur in the test cases I have checked so far.
Thanks
Thanks for undertaking the cleanup of listcomp and adding PEP 530 support. It reminds me of the time you did the same for the function signatures, which were a big mess that I doubt I would have gotten around to fixing properly.
As for list_iter_index in n_listcomp, I think what happened is that I redid the grammar to remove the massive duplication and to be able to cover more cases. At this point the assignment has no effect, and the code in comprehension_walk_newer() now has to pick out the right node, since the node it needs is down in a child. In the situation where we are dealing with async lists, that parameter is no longer used, although since it is a parameter of the function, something has to go there. Perhaps set it to None? Or leave whatever value happens to be there, without the test.
This does bring up the broader point that synchronizing grammar rules with indexes in the semantics directory and the reducecheck directory is becoming a pain, and it would be nice to have and use a more automated system to keep these in sync - or at least more automated checks.
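As a rough illustration of what such an automated check might look like - the table shapes and helper below are hypothetical, and nothing like this exists in either project today - the idea is to walk a table mapping nonterminals to the child indexes their semantic actions reach into, and verify every index fits within the arity of some grammar rule for that nonterminal:

```python
GRAMMAR = {
    # nonterminal -> list of right-hand sides (hypothetical, trimmed-down rules)
    "assign": [["expr", "store"]],
    "try_except": [["SETUP_EXCEPT", "suite_stmts_opt", "POP_BLOCK",
                    "except_handler", "opt_come_from_except"]],
}

SEMANTIC_INDEXES = {
    # nonterminal -> child indexes its semantic action or template reaches into
    "assign": [0, 1],
    "try_except": [1, 3],
}

def check_indexes(grammar, indexes):
    """Report (nonterminal, index, max_arity) triples where an index is out of range."""
    problems = []
    for nonterminal, used in indexes.items():
        max_arity = max(len(rhs) for rhs in grammar.get(nonterminal, [[]]))
        for i in used:
            if i >= max_arity:
                problems.append((nonterminal, i, max_arity))
    return problems

assert check_indexes(GRAMMAR, SEMANTIC_INDEXES) == []  # all indexes are in range
```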
Also, I see that in decompyle3 I have two n_list_comp() routines, so I'll be deleting the first (and unused) one, which seems to have code more specific to earlier Pythons.