python-decompile3

Python 3.9 .. 3.12...

Open rocky opened this issue 3 years ago • 22 comments

3.9 and 3.10, 3.11, 3.12 are out; 3.13 will be out at some point.

Personally, if I get interested in this it would be using a new project that redoes control flow analysis in a more basic and reliable way. This would require a more full-time dedication to this effort rather than an unpaid hobby, as all of my work has been so far. If I see more than $5K[^1] (originally $2K) in sponsorship (see https://github.com/rocky/python-uncompyle6/issues/331 for the exact details), I'll start working on this in another project. $5K, while it may seem large, is less than 3% of what my paid job pays per year, and it is less than $2.00 per uncompyle6 "like".

[^1]: This changes for each duplicate issue by $25, so see the bottom of the thread for the exact amount.

That said, this issue is in case someone else wants to adapt this code for 3.9 .. 3.12.

Sadly, for me and these projects, open-source and free software seems to have become others expressing what they want, possibly with pleas and urgency, without contributing anything beyond that description.

Edit in 2022: Happy New Year.

The base price has gone up to reflect the amount of work needed and the lack of serious public involvement by anyone other than myself in any of the projects. Don't get me wrong - I don't mind working on this on my own and at my own whim. It is just that if I set a figure on committing to get this done by me, it needs to be more commensurate with the amount of time and effort I'd need to spend.

The good news is that it does look like an evolution of these projects along the way described will work. I haven't looked at Python 3.9 in detail from the side of decompilation, but I have for 3.10. As best as I can tell its code generation has gotten better. It now seems to move/hoist code and can leave dead code. Nothing insurmountable, just more work.
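
For anyone who wants to look at what a given interpreter's code generator emits, the standard-library `dis` module is enough to get started. This is only a minimal sketch: the function `sample` is made up for illustration, it does not by itself demonstrate hoisting or dead code, and the opcode listing differs between Python versions.

```python
import dis


def sample(x):
    # A small function with a loop and a branch, so the compiler has
    # some control flow to lay out.
    total = 0
    for i in range(x):
        if i % 2:
            continue
        total += i
    return total


def opcode_names(func):
    """Return the opcode names of func in bytecode order."""
    return [ins.opname for ins in dis.get_instructions(func)]


if __name__ == "__main__":
    # Run this under two interpreter versions and diff the output.
    for name in opcode_names(sample):
        print(name)
```

Running the same file under, say, 3.8 and 3.10 and comparing the listings is a quick way to see how much the instruction layout has changed.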

It also looks like the simplest update path is first to revise 3.8, which I have started and which is going well. Then 3.9, then 3.10, and so on. Since for me this is all additional work per version, that $5K is for 3.9, and then something like the same for 3.10, and the same for 3.11. (I realize this may all be kind of moot since it's likely there will be no takers for any of this. But I do need to cover myself in the unlikely event that there is someone interested.)

The other good news is that public-facing code and docs will be getting better over time in case others would like to do this publicly.

Valentine's Day 2022 Edit:

As I've said, one thing that has been demoralizing has been the ratio of beggars[^2] versus providers of help. So for every issue raised here with a "volunteer wanted to fix" tag and a different person helping and solving the other person's problem, I'll lower the barrier for me to start providing 3.9 or 3.10 decompilation by $50 per solved problem.

Late 2023 Edit:

While I now have some Python 3.8 .. 3.10 bytecode decompiling for code fragments like comprehensions, lambdas, and simple statements, work still progresses a bit slowly. Public funding has virtually dropped off and never was that significant. This code will probably stay private for a while longer. If you need something urgently and are willing to pay a significant amount for hand decompilation (on the order of $500-1K), you can contact me.

[^2]: A beggar is someone who posts first with a problem without making an effort to solve their problem or read instructions beforehand, if ever. As with many beggars you might get a hard luck story about how programming is so difficult, or that begging is okay because, hey, they are a newcomer and that they might help if only someone would teach them how to program or read instructions, inform them about what can be googled or is already known, or spoon-feed them what's been described already. Occasionally, you'll get the attitude that just the act of communicating the beggar's desire is such a great service to everyone that this should be sufficient public service, and that the beggar's time is so valuable that it is not worth their time and effort to do more in the way of helping.

rocky avatar Oct 11 '20 02:10 rocky

When will there be one for 3.9 pls ?

LPWCq avatar Aug 09 '21 10:08 LPWCq

@LPWCq This is something that you have control over.

You can contribute to fund the project. On the other hand, the more people like you whine about this here without doing anything else, the higher the minimum funding level becomes. The minimum limit is now $2,025.

rocky avatar Aug 09 '21 12:08 rocky

Some status, since I am not sure where else to note this. (I will probably clean this up later and put it in a wiki).

I have looked more at 3.10 bytecode generation and had bits and pieces of that working. It uses better control-flow information up front. It seems to work, but I stopped working on that because the code generation gap between 3.8 and 3.10 is too great.

So I have been redoing 3.8 with an eye towards really simplifying, clarifying and cleaning up the grammar and using more precise control flow information. In doing this 3.8 code decompilation accuracy, which was weak before, has gotten much better.

Oddly, though, I find that I don't really need the control flow and dominator information that much (which is why this can work reasonably well without it).

But having the control flow and dominator information around in the instructions really helps in understanding what should be where when things go wrong, and it helps in coming up with the grammar. And there are a few places where having this information around speeds up checking. So it is really nice to have, and essential on some rare occasions.
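
For readers unfamiliar with dominators: block A dominates block B if every path from the entry to B goes through A. Below is a minimal sketch of the classic iterative computation on a hand-written control-flow graph. The graph, the block names, and the function are all hypothetical illustrations, not code from this project or from python-control-flow.

```python
def dominators(cfg, entry):
    """Compute the dominator set of every block with the classic
    iterative data-flow algorithm.

    cfg maps each basic-block name to the list of its successors;
    every successor must itself be a key of cfg.
    """
    nodes = set(cfg)
    preds = {n: set() for n in nodes}
    for src, succs in cfg.items():
        for dst in succs:
            preds[dst].add(src)

    # Start from "everything dominates everything" and shrink.
    dom = {n: set(nodes) for n in nodes}
    dom[entry] = {entry}

    changed = True
    while changed:
        changed = False
        for n in nodes - {entry}:
            incoming = [dom[p] for p in preds[n]]
            new = set.intersection(*incoming) if incoming else set()
            new = new | {n}
            if new != dom[n]:
                dom[n] = new
                changed = True
    return dom


# Hypothetical CFG for an if/else that rejoins afterwards.
CFG = {
    "entry": ["then", "else"],
    "then": ["join"],
    "else": ["join"],
    "join": [],
}

DOM = dominators(CFG, "entry")
```

Here `join` is dominated only by `entry` and itself, since neither branch is on every path to it. That is the kind of fact that tells you where an `if`/`else` ends.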

In refactoring, another goal is adding even more modularity. And this includes grammar modularity, which is important. Separating the sub-grammar for what can appear in a lambda helps. Note that lambdas do pull in all kinds of comprehensions, function calls, and string format interpolation. So just that is pretty beefy. But they do not include assignment statements, compound statements (which include try blocks and with blocks), function definitions other than lambdas, or loop constructs.

It will be possible to decompile any code object whether that is a complete module/function or not. In particular, lambdas and comprehensions fall into this kind of complete code unit that generally doesn't stand by itself.
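
To make "code object" concrete: CPython compiles lambdas and generator expressions into their own code objects, stored among the constants of the enclosing code object. A minimal sketch using only the standard library follows; the helper name `iter_code_objects` and the sample source are made up for illustration. Note that from Python 3.12 on, list, set and dict comprehensions are inlined and no longer get a code object of their own, which is why the sample uses a generator expression.

```python
import types


def iter_code_objects(code):
    """Yield code and every code object nested inside it, depth first."""
    yield code
    for const in code.co_consts:
        if isinstance(const, types.CodeType):
            yield from iter_code_objects(const)


SOURCE = """
square = lambda n: n * n
total = sum(i for i in range(10))
"""

module_code = compile(SOURCE, "<example>", "exec")
names = [c.co_name for c in iter_code_objects(module_code)]
print(names)
```

Each of the nested objects found this way is a unit that could, in principle, be handed to a decompiler on its own.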

Really down the line one might imagine being able to specify just a particular dominator region and a particular kind of expected object it should be, e.g. simple statement, compound-statement, etc.

3.10 has definitely gotten better and more sophisticated. But the good news, again, is that there is a straightforward path to get there that doesn't require originality other than cleaning things up and what has been described above.

I am now of the opinion though that current Python code generation is at a way different level than it was back in 2.x and even mid 3.x. I would be surprised if the kind of thinking in say pycdc, unpyc3, unpyc37 or even this code as it was in the 2.7 and 3.3 days could be extended short of the kind of overhaul here.

Not much has been written about how the other decompilers (or early versions of this one) worked so it is hard to determine the bigger picture and ideas. But my understanding of unpyc and pycdc is that they are instruction-centric and based on an opcode you go prowling around looking for what it refers to to see what the opcode can be a part of. It has the feel of the kinds of things a symbolic executor might do. Although I do think that symbolic interpretation can help decompilation, these aren't truly organized as symbolic interpreters.

I have written at great length about how this decompiler works here and I think it is applicable to decompilers for other dynamic high-level programming languages as well.

rocky avatar Jan 16 '22 16:01 rocky

hi rocky, I am willing to pay, how can I contact you

Zunea avatar Feb 17 '22 13:02 Zunea

And with #89 we are at $5025 now.

rocky avatar Mar 02 '22 19:03 rocky

And with https://github.com/rocky/python-uncompyle6/issues/400 we are at $5025 - $80 (in sponsors for two months) + $25 = $4970 now.

rocky avatar May 07 '22 19:05 rocky

$4970 + 2 x $25 = $5020 (May 22 this will decrease by about $120.) $5020 - $120 = $4900

rocky avatar May 21 '22 16:05 rocky

$4970 + 2 x $25 = $5020 (May 22 this will decrease by about $120.) $5020 - $120 = $4900

Does this price still stand? What is included in this price? What Python versions will this include? Would it include updates to python-control-flow, or would it rather be a "hacky" fix?

Svenskithesource avatar Feb 05 '23 15:02 Svenskithesource

Sponsorship has fallen off. I haven't done a detailed calculation. $5K still stands.

Detailed status of work done yesterday is here.

Would it include updates to python-control-flow, or would it rather be a "hacky" fix?

python-control-flow

What Python versions will this include?

Initially 3.9. (Well actually also 3.8 since that gives me a bridge from decompyle3 that I can compare from).

I did a little bit of experimenting with 3.10 and although that changes things in details a little, right now I am not aware of substantive changes that can't be solved with the control-flow approach.
For 3.11, I haven't looked at it in much detail, but I imagine that one is going to have many changes from the 3.10 grammar. There are more opcodes, and call protocols change yet again. (Or was that in 3.10, or maybe 3.10 and 3.11 - I just can't keep track of how many times this happens.)

This stuff is annoying, but straightforward, if sometimes tedious.

rocky avatar Feb 05 '23 18:02 rocky

Awesome, are there any time estimates that you can give for the multiple milestones?

Svenskithesource avatar Feb 05 '23 19:02 Svenskithesource

No time estimates, because I don't have any. I have a full-time job that pays way more than any of this stuff.

When I see evidence of serious intent of payment, then things may change.

rocky avatar Feb 05 '23 19:02 rocky

Understandable, the preferred way of payment is through GitHub sponsoring?

Svenskithesource avatar Feb 06 '23 07:02 Svenskithesource

the preferred way of payment is through GitHub sponsoring?

Yes.

I will also say that there definitely is an order in which things are done and that is simple enough to explain. These would be the "milestones".

In the newer versions of the decompilers, I have the ability to decode smaller code units independent of the entire program. The natural code units are lambdas and comprehensions of various kinds. Run decompyle3-code --help for a breakout.

The grammar for lambdas is a subset of the full Python grammar. But in lambdas, most of the control flow problems exist. In fact the grammar for the lambdas has to include the entire set for comprehensions.

In real Python programs though lambdas are pretty small. So these make an ideal subset to start from and that is what I have started from. The next step is to come up with small simple examples that cover everything. I have always had this to some extent. See for example https://github.com/rocky/python-decompile3/tree/master/test/simple_source/comprehension

Over time, I have come to realize how to make these smaller and better so that at the next change of bytecode I can better go through a staged set of examples from simple to complex. For example in decompyle3 comprehensions are broken out, but not lambdas. That is corrected in the next iteration. This test organization may eventually get backported to decompyle3 (and then uncompyle6).

Like all things, there is a mountain of work that could be spent on these projects, but no one seems to want to volunteer to do it. So I do it from time to time as the mood hits me.

Once I have basic decompilation of lambdas working for the simplest stuff, I switch to comprehensions that contain these. Although comprehensions are pretty much the same, I generally start with sets and generators.

The reason for this partly has to do with the fact that in real programs these tend to be rare. And why does this matter? When I think I am complete with, say, set generators, I can basically scan for these in the entire collection of Python programs I have on my disk. Just the one package scipy and all of the packages it pulls in will give me thousands of test programs that are small, testable and isolated. From these I can probably get 100 or so generators from real code.
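
As a rough illustration of that kind of scan, here is a sketch that counts lambdas and comprehensions in a piece of source text. It works on source via the standard-library `ast` module, which is an assumption made to keep the example self-contained; the scan described above would be over compiled code. The names `UNIT_TYPES`, `count_units` and `EXAMPLE` are hypothetical.

```python
import ast
from collections import Counter

# Node types treated as "small code units" in this sketch.
UNIT_TYPES = (
    ast.Lambda,
    ast.GeneratorExp,
    ast.SetComp,
    ast.ListComp,
    ast.DictComp,
)


def count_units(source):
    """Count lambdas and comprehensions appearing in a source string."""
    counts = Counter()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, UNIT_TYPES):
            counts[type(node).__name__] += 1
    return counts


EXAMPLE = """
evens = {n for n in range(10) if n % 2 == 0}
total = sum(n * n for n in evens)
key = lambda pair: pair[1]
"""

print(count_units(EXAMPLE))
```

Applied to every `.py` file of a large package, a counter like this shows which kinds of unit are rare enough to work through exhaustively.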

At any rate, when generators are done, then set comprehensions, ... lambdas. Then the big switch to the full Python grammar happens and I try code subroutines, and then full Python programs.

So those are the milestones. In the past, I have had no time estimates because I never had the need for them. A time estimate is not something that is in and of itself useful to me the way, say, breaking the tasks down as above is.

rocky avatar Feb 06 '23 13:02 rocky

There is one other thing that has been in the back of my mind. Recently I came across this project to do faster Python execution that is asking for funding. The figure given there is $500K. That is two orders of magnitude greater than the $5K mentioned here. And while $500K is large, $5K is in fact a bit low when it comes to my actual labor cost - that is, what I get at my day job. Using consulting rates of about $100/hr this is only 50 hours or a single week. In reality it would take me a month to get something basic and another month to get it more complete. There is a long tail on this.

See https://news.ycombinator.com/item?id=24837309 for discussion on funding the Python compiler project.

And I note that there are several projects that provide faster Python implementations, PyPy and Pyston, but there are others out there as well. In stark contrast, there is no serious decompiler for 3.9, 3.10, and 3.11. The other projects that "aim" to support decompilation by taking an older incomplete decompiler and pointing it at 3.9-3.11 bytecode I consider more whimsical than serious.

rocky avatar Feb 06 '23 13:02 rocky

I have been thinking about this recently. This is tagged as volunteer needed, and you have expressed willingness to accept PRs multiple times. Are you still open to this idea?

I would love to contribute (and I've started working on a 3.9 PoC one weekend), but adding support for a completely new Python version is a big undertaking. I also have trouble wrapping my head around some idioms found in the decompiler. Can I (or other people) send you PRs "incrementally" (starting from a broken decompiler basically copied from 3.8, and improving/fixing it on the go)? Or should I try to get a complete 3.9 decompiler in my fork, and send it in one PR (I doubt I'll make it - if that's one month for you, that's much more for me)? Or are there any other ways for external contributors to start working on this?

I ask, because I like contributing to OS projects that I use, and would gladly help. On the other hand, it's shocking that a tool that I've used professionally in 3 out of my previous 4 jobs can't even get $5k in funding. OS sponsorship truly is broken.

msm-code avatar Feb 06 '23 15:02 msm-code

I am looking for contributions and contributors on all of the open source projects, including this one.

The problem is usually a mismatch between what is offered versus what is needed. In any of the open source projects that I manage, if someone can take off a piece and do it reasonably well and put in a PR - awesome! Please do it.

Here is an isolated self-contained problem mentioned for 3.8, but it is equally valid for 3.9: rocky/python-decompile3#105 So if you want to improve the 3.9 decompiler do it by improving the 3.8 decompiler. (And I suspect none of the other decompilers that purport to support Python versions 3.8 and later handle this.)

With regards to understanding idioms, I am not sure exactly what you mean, but I've had to do the same thing.

For myself, either you get used to it or find a way to do it differently. There is an aspect here that isn't going to change, in that you need to be able to map sequences of instructions and pseudo instructions into grammar rules. That kind of idiom isn't going to change.
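
To give a feel for what "mapping sequences of instructions into grammar rules" means, here is a deliberately tiny sketch. The token names are hand-written in the style of 3.8 opcodes rather than taken from a real disassembly, the four rules are invented for this example, and the greedy shift-reduce loop is a toy, not the parser this project uses.

```python
# Toy grammar: left-hand side, then the sequence it is built from.
RULES = [
    ("expr", ("LOAD_NAME",)),
    ("expr", ("LOAD_CONST",)),
    ("expr", ("expr", "expr", "BINARY_ADD")),
    ("assign", ("expr", "STORE_NAME")),
]


def parse(tokens):
    """Greedy shift-reduce over a list of instruction names.

    Returns the final stack; each entry is (symbol, children).
    """
    stack = []
    for tok in tokens:
        stack.append((tok, []))  # shift
        reduced = True
        while reduced:
            reduced = False
            for lhs, rhs in RULES:
                n = len(rhs)
                top = tuple(sym for sym, _ in stack[-n:])
                if len(stack) >= n and top == rhs:
                    children = stack[-n:]
                    del stack[-n:]
                    stack.append((lhs, children))  # reduce
                    reduced = True
                    break
    return stack


# Instruction names one might see for:  y = x + 1
TOKENS = ["LOAD_NAME", "LOAD_CONST", "BINARY_ADD", "STORE_NAME"]
TREE = parse(TOKENS)
```

The whole instruction sequence reduces to a single `assign` node whose first child is an `expr`; a semantic action attached to each rule would then emit the source text.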

So this is largely a case of get used to it. Here is a borderline case: if this were done in a better compiler framework some of the tedium between matching rules and semantic actions would be reduced and the mistakes eliminated. But since I am essentially the only one working on the project it doesn't seem like a good use of my time. Or find another compiler framework and rewrite code to match that. I'll live with what I have. The thing works.

As for the 3.9 PoC, it is your choice and your right to work on it and make it available and invite others to work on it. I know however that this kind of thing where you take a decompiler for version x then tell it to accept version y has been tried numerous times. There is a version of unpyc37 for 3.10. And pycdc sort of takes this approach. When I started with uncompyle6 no doubt I did the same things. But I put in the hard work to really get it to work. This meant, for example, writing the cross-version disassembler. I find few people willing to do the really hard work of understanding what's up and then adapting, and even doing major refactoring.

So finally as for helping out with the publicly unpublished version. The problem here is that there is a bit of knowledge and experience that is needed and experimentation. As I said, for myself, I've had to revise ideas 3 or 4 times.

Right now I don't think anyone else can help here unless that person has a lot of experience with such things. And if such a person exists, that person is probably getting paid lavishly for this kind of rare, detailed, specialized, and tedious work - and doesn't have the time or desire to volunteer to do more of this kind of thing.

For me, I've been in that camp of let's just try to hack around the existing code too long. And I think that there is a better way. That is why I've been doing the rewrites. And it has taken me 3 or 4 iterations on this until I have something that seems reasonable. Each prior iteration had some good parts to it, but had flaws. Right now I haven't come across the flaws in what I am doing, but if that happens I expect it will happen a little ways down the line.

rocky avatar Feb 07 '23 23:02 rocky

Thanks for your detailed and thoughtful response.

Here is a problem mentioned for 3.8, but it is equally valid for 3.9: https://github.com/rocky/python-decompile3/issues/105 So if you want to improve the 3.9 decompiler do it by improving the 3.8 decompiler.

To be clear, my main hope is to help with 3.9+ decompilers. For my purposes I don't even need the decompiled code to work, just be more readable than the bytecode. But I have no plans of (as you've said it) forking this code and hacking around to make it work. I don't think this will work long-term, so I'd much prefer to have my changes merged upstream.

Nevertheless, thanks for pointing me to that issue - it looks like a great place to start, and will be useful immediately.

With regards to understanding idioms, I am not sure exactly what you mean, but I've had to do the same thing.

Things like this or BNF rules used in various places.

So this is largely a case of get used to it. Here is a borderline case: if this were done in a better compiler framework some of the tedium between matching rules and semantic actions would be reduced and the mistakes eliminated. But since I am essentially the only one working on the project it doesn't seem like a good use of my time. Or find another compiler framework and rewrite code to match that. I'll live with what I have. The thing works.

I don't have a problem with the code. It works and I don't think it's bad. It's just not immediately obvious what needs to be changed when working on the code. I roughly understand how they work, but this requires a bit of effort initially to wrap my head around. I was wondering if maybe there's a documentation of the high-level code structure (or how to add a simple feature)?

Anyway I'll manage. Just pointed out before that (in my opinion, in comparison to some other projects) the codebase has a steep initial learning curve.

Right now I don't think anyone else can help here unless that person has a lot of experience with such things. And if that were the case they probably are already doing that work as a paid job somewhere.

Let's start with #105, I'm happy to work on it (after hours of my paid job). Let's see if I can pull it off.

msm-code avatar Feb 09 '23 11:02 msm-code

To be clear, my main hope is to help with 3.9+ decompilers. For my purposes I don't even need the decompiled code to work, just be more readable than the bytecode.

Have you looked at the -F extended and -F extended-bytes options to pydisasm?

Nevertheless, thanks for pointing me to that issue - it looks like a great place to start, and will be useful immediately.

Thanks!

Things like this or BNF rules used in various places.

... I roughly understand how they work, but this requires a bit of effort initially to wrap my head around. I was wondering if maybe there's a documentation of the high-level code structure (or how to add a simple feature)?

Actually, yes. Have you looked at the How does this code work?, Table-driven semantic actions, and Fixing Issue #x wiki pages?

From my perspective, a good deal has been written about this. Especially as I can't recall any feedback on any of what I've written so far.

in comparison to some other projects) the codebase has a steep initial learning curve.

Have you ever worked on a compiler, specifically the grammar portion?

I'm happy to work on it (after hours of my paid job). Let's see if I can pull it off.

Great - thanks! (I have a paid job too).

rocky avatar Feb 09 '23 13:02 rocky

Actually, yes. Have you looked at the How does this code work?, Table-driven semantic actions, and Fixing Issue #x wiki pages?

No, I completely missed that (I cloned the repository and looked for the docs in the repo itself, and missed the link hidden in "see also" in the README). That's actually quite comprehensive and makes a lot of my comments irrelevant. Thanks.

Have you ever worked on a compiler, specifically the grammar portion?

Yes (one of my main projects is a database with a simple query grammar, I've created a toy compiler for a c-like language, and sent a few PRs with grammar fixes for another OS project). As I've said, I'll manage - especially after it turns out that the code is, in fact, documented.

msm-code avatar Feb 09 '23 19:02 msm-code

On the other hand, it's shocking that a tool that I've used professionally in 3 out of my previous 4 jobs can't even get $5k in funding. OS sponsorship truly is broken.

I used to be a musician, and this reminds me of how difficult it was to try to make a living as a musician. Hmm... maybe that prepared me for Open-source development.

Fortunately, I don't have to rely on Open-Source sponsorship. However, unavoidably I am aware of the various options and freelance programming options. In the history of Open-Source development (I have been doing this for a while), fairly recently there has been more effort to try to fund development. And I am aware of these things and will mention them below, for others who may want to follow in this increasingly tough path...

Before getting into true open-source work there is something related called "freelancing" which may or may not be open source.

Bounties

There are various services that allow bounties for certain projects. I am sure bounties have been offered for Python 3.9+ decompilers. The problem here is that I am not aware of any bounty offered that is comparable to the amount of effort needed. Some things take a lot of development and need very specialized knowledge. Imagine a chip manufacturer offering a bounty for their next level CPU design. (If someone can find such a thing, I would be very amused and curious to learn about it, and especially the bounty fee offered).

I don't know how people make a living off of doing bounty programming. I suspect you have to be a certain kind of generalist and quick thinker and coder - which I am not. I am reminded of the days when I worked in the financial markets and we had these System Administrators (is that term still used instead of "devops"?) whose job it was to fix any problem a stock trader had within 5-15 minutes.

Contract work

I recall somewhere when rms was asked how open-source developers could support themselves developing while still developing GPL software, the solution he offered was doing support and contract work for improving open-source features.

Recently, I have posted contracting rates. Honestly, I do this not so much for the money I get from it. In fact the fee I have to pay for the service that handles the calendaring and payment is about covered by the very rare consulting I do.

I do this more to deter people who feel like the first line of attack for solving their specific problem, one that is likely to be of interest to no one but themselves, is to contact the author. As soon as you suggest this is something that might be paid for (and in any other line of work other than open source would be expected), then magically, those who would seek help from the authors and maintainers suddenly find themselves more motivated to try to solve their problems on their own. And they are less pushy and demanding, and more tolerant, when they do ask for help.

Technical and Security Support Assurance

On some of my open-source projects I hook into TideLift. From my standpoint, once you have your project sponsored and do the initial work to get things up to specification, the monthly fee is nice. It is small though. But it is sort of a lottery here. Of the 100 or so open-source projects over a couple of decades only one falls into this category. And for me it is $50/month.

Hackfests, Scholarship programs, e.g. Google Summer of Code

Periodically, these things come up. And I am delighted when one of these smiles on code I have written. A small percentage of the time the code offered is not usable without work, if that. And while that may be good for the code, for the developer, if you are lucky it just saves time writing code that you otherwise might have had to write.

Or not. Sometimes the feature offered is not something you directly care about, but instead others somewhere sometime might.

Google Summer of Code is Awesome! But keep in mind that while the student selected may get $5K and the umbrella organization may get money as well, the mentor - that is, the person who is dealing with and guiding the student - gets about nothing monetarily. Well, okay, I got a couple of T-shirts promoting the program for the two years I did this.

And this is a once-a-year kind of lottery thing where your open-source project (if it is part of a large umbrella non-profit organization) may get accepted, but on average it is more likely not to be accepted.

Donations

This one has probably been the biggest monetary payout. The GitHub donations mechanism has been pretty awesome and painless. Donations are generally unreliable and small.

Someone also supports the project through Liberapay at something like an initial donation of $50 and one penny ($0.01) a week. Yes, it is small but as the donor says, it is a sign of appreciation. And honestly, I appreciate it.

Awards

Here, I haven't been that successful. When I worked at IBM Research, around Christmastime my manager awarded me the prestigious "littlest snowman" award. The task in the group was to design a snowman using some graphics package and I used some POSIX package to create a pixelated PNG. Also at IBM, in the group I was in that developed code for IBM's first RISC, the POWER architecture, we each got a "dinner for two at McDonalds" award.

And then there was the time there was a competition to write a Perl program to do something novel. I think I got second place there and again something on the order of $50.

I will close this section by saying this kind of thing is also not a sure kind of thing.

Conclusion - with respect to decompilation

I think the decompilation approach (in its revised form) is novel. Using decompilation at runtime to give precise location is also pretty novel - I am not aware of a Python debugger other than mine that can do this, and probably it is rare overall (if it exists). However if this kind of thing is ever publicly appreciated other than thank you's here and there, it will likely come after I am dead.

And I am coming to appreciate Groucho Marx's quip when someone asked him about doing something for posterity:

What has posterity ever done for me?

That said, I will probably continue to work on all of this in my spare time. However it sometimes competes with solitaire games, or other puzzle activities and leisure kinds of things. (I like to listen to music and when I do I often am completely absorbed in it, rather than it be a background kind of listening.)

rocky avatar Feb 12 '23 15:02 rocky

Version 3.9 is not supported. what should i do bro ! i have an pyc 3.9 version

engkhalid96 avatar Feb 26 '23 14:02 engkhalid96

I understand you need a volunteer to salvage the Python decompiler effort.

I am 60 years old, a retired software engineer going bonkers from boredom. I was laid off in 2005 and have not been able to recover my career. I've been blackballed due to effects of autism. I play with projects while keeping house these days.

Once upon a time I had even designed a compiler and debugger suite for a byte-code engine.

I think your project is worthy of improvement. I wanted to use Trepan on Blender for a personal project but found it blocked due to uncompyle6 not being able to work past Python 3.8. I need Python 3.10.

Since I'm not going anywhere, I'd rather fix uncompyle6 than donate money I don't have as I'm on a fixed income. I can help about 10 hours a week at most due to SSDI work restrictions.

Please contact me.

IronAutie

IronAutie avatar Oct 24 '23 16:10 IronAutie