ljd
decompilation errors
scripts in the test directory were compiled using LuaJIT 2.0.3
output: http://pastebin.com/NZ7Bpz5c
Sounds like the dump format has changed. That's strange: I'm checking a version tag, and it should change whenever the format changes in any way - either a header change or a new instruction.
I'll try to look into it over the weekend.
or a new instruction
Very unlikely. I have checked all the bc opcodes in lj_bc.h against instructions.py using a program and a script, and it looks like you have all the existing opcodes in ljd/bytecode/instructions.py.
I will test the decompiler against older versions of LuaJIT in the next few days.
Same shit happens with the latest LuaJIT git and with version 2.0.0, so it's not related to recent changes in the compiler.
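For anyone who wants to repeat that kind of check, here is a rough sketch of how it could be scripted. It is not the script used above. It assumes the opcodes are listed in lj_bc.h as `_(NAME, ...)` entries inside the `BCDEF` macro, and it works on a short hand-written excerpt in that style rather than the real header. `opcode_names`, `missing_opcodes` and the `known` set are hypothetical names made up for this example; in practice `known` would be collected from ljd/bytecode/instructions.py.

```python
import re

# Abbreviated, hand-written excerpt in the style of lj_bc.h (assumption:
# the real header lists every opcode as a `_(NAME, ...)` entry like this).
SAMPLE_BCDEF = r"""
#define BCDEF(_) \
  _(ISLT, var, ___, var, lt) \
  _(ISGE, var, ___, var, lt) \
  _(MOV,  dst, ___, var, ___)
"""


def opcode_names(header_text):
    """Return opcode names in the order they appear in the macro body."""
    # Matches `_(` followed by an identifier and a comma; the `(_)` in the
    # `#define BCDEF(_)` line itself does not match this pattern.
    return re.findall(r"\b_\(\s*(\w+)\s*,", header_text)


def missing_opcodes(header_text, known):
    """Return opcodes present in the header text but absent from `known`."""
    return [name for name in opcode_names(header_text) if name not in known]


if __name__ == "__main__":
    # Hypothetical set of opcode names the decompiler already knows about.
    known = {"ISLT", "MOV"}
    print(missing_opcodes(SAMPLE_BCDEF, known))
```

An empty result for the real header would mean the opcode table is complete, which matches what was found here.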
"""OK. I have read some code and compared to luajit source. I have the following concern in the decompiler code. Bytecode array size is read from header but treated as instruction count, the same with upvalues.""" EDIT: sorry my bad, everything is ok so far.
After a long debugging session I have found out that the decompiler works fine when the bytecode is NOT stripped.
Oh, yeah, I forgot about that. Stripped scripts are not tested, because it's kinda useless to decompile them anyway - there will be no local variable names, including upvalues. It will also be very hard to properly dissect complex expressions, and most of the "unwarping" algorithms will fail without proper local variables.
It might be possible to adapt the algorithms to work on stripped sources, but that will require some additional work on the temporary slot elimination pass. It's very hard to tell whether a slot is just a temporary register or a local variable. Unwarping (the process of reconstructing complex branching and looping statements from a set of linked AST blocks, with the links named "warps", i.e. jumps) relies heavily on temporaries having been eliminated beforehand. And the elimination process depends heavily on marking local variables as such; otherwise it will remove too many things, breaking the code flow.
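To make the local-vs-temporary problem a bit more concrete, here is a deliberately simplified sketch. It is not ljd's actual code: it assumes the debug section gives each local variable a name, a slot and a live range of instruction indices, and `LocalVar` and `classify_slot` are hypothetical names invented for this illustration.

```python
from typing import List, NamedTuple, Optional


class LocalVar(NamedTuple):
    """Hypothetical record for one local variable from the debug section."""
    name: str
    slot: int
    start_pc: int  # first instruction index where the variable is live
    end_pc: int    # first instruction index where it is no longer live


def classify_slot(slot: int, pc: int,
                  locals_info: List[LocalVar]) -> Optional[str]:
    """Return the local's name if `slot` holds a live local at `pc`.

    None means the slot is not covered by any known local at that point,
    so an elimination pass would treat it as a temporary register.
    """
    for var in locals_info:
        if var.slot == slot and var.start_pc <= pc < var.end_pc:
            return var.name
    return None


if __name__ == "__main__":
    unstripped = [LocalVar("x", slot=0, start_pc=1, end_pc=10)]
    stripped: List[LocalVar] = []  # no debug information at all

    print(classify_slot(0, 3, unstripped))  # a named local
    print(classify_slot(1, 3, unstripped))  # a temporary
    print(classify_slot(0, 3, stripped))    # looks like a temporary too
```

With an empty debug section every slot looks like a temporary, which is how the elimination pass ends up removing too much.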
It can probably be fixed and, in that case, it may work on stripped dumps, but that will also require a bit of polishing and debugging everywhere (for instance, you've just found that it's not even able to read a stripped dump, as there is no check for a header flag before reading the debug information section), AND I doubt it will be able to produce compilable code.
Do you really need to decompile stripped code? AFAIK, most games (which are like ~60% of all Lua code usage) supply either unstripped or heavily obfuscated dumps. Or even plain sources =). If anyone wants to protect code from being decompiled, it's fairly easy actually - you just need to change the opcodes. LuaJIT's license and code support that. And if you don't want to do that, stripping the code will only make your life harder, as consumers won't be able to report bugs with readable traces.
I guess I may just "fix" this issue by adding a check for a stripped flag and alerting user that stripped dumps are not supported =).
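A minimal sketch of what such a check could look like, assuming the dump header layout from LuaJIT's lj_bcdump.h: three magic bytes (`ESC`, `L`, `J`), a version byte, a ULEB128-encoded flags field in which bit `0x02` marks a stripped dump, and a length-prefixed chunk name that is present only in unstripped dumps. The function and constant names are made up for this example and are not ljd's own.

```python
import io

MAGIC = b"\x1bLJ"

# Flag bit taken from lj_bcdump.h; the Python name is this example's own.
FLAG_IS_STRIPPED = 0x02


def read_uleb128(stream):
    """Read one unsigned LEB128 integer from a binary stream."""
    result = 0
    shift = 0
    while True:
        data = stream.read(1)
        if not data:
            raise ValueError("unexpected end of dump")
        byte = data[0]
        result |= (byte & 0x7F) << shift
        if byte < 0x80:
            return result
        shift += 7


def read_header(stream):
    """Return (version, flags, stripped, chunk_name) for a bytecode dump."""
    if stream.read(3) != MAGIC:
        raise ValueError("not a LuaJIT bytecode dump")
    version_byte = stream.read(1)
    if not version_byte:
        raise ValueError("unexpected end of dump")
    flags = read_uleb128(stream)
    stripped = bool(flags & FLAG_IS_STRIPPED)
    chunk_name = None
    if not stripped:
        # The chunk name is only stored when debug info is present.
        length = read_uleb128(stream)
        chunk_name = stream.read(length).decode("utf-8", errors="replace")
    return version_byte[0], flags, stripped, chunk_name


if __name__ == "__main__":
    dump = io.BytesIO(b"\x1bLJ\x01\x02")  # version 1, stripped flag set
    version, flags, stripped, name = read_header(dump)
    if stripped:
        print("stripped dumps are not supported")
```

The same `stripped` value would also have to be passed on to the prototype reader, so that it does not try to read a debug information section that is not there.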
Oh, yeah, I forgot about that.
)-:<
IMO decompiling conditionals and loops shouldn't be too complicated due to the special opcodes. Now I see that this (decompilation) isn't a simple task. BTW I fail to see how locals help to decompile branches/loops.
Do you really need to decompile stripped code?
Well I infer that bytecode is stripped from ljd crashes :- ) There are more than 4,5 hundred compiled luajit scripts in this game so I think they compiled it to reduce load time.
I guess I may just "fix" this issue by adding a check for a stripped flag and alerting user that stripped dumps are not supported =).
No thanks. What really would help is code comments and documentation. Thanks anyway.
BTW I find this OO code hard to read without comments/docs (not implying anything) so I can't make any valuable contributions to the actual decompiler (anytime soon).
IMO decompiling conditionals and loops shouldn't be too complicated due to the special opcodes. Now I see that this (decompilation) isn't a simple task.
Yeah, I thought that too initially =). LuaJIT's bytecode is somewhat ninja-hacky in some ways, so there are LOTS of corner cases. At some moments I doubted it was even possible. And I'm not the only one - this is the first somewhat working LuaJIT decompiler capable of producing compilable code since LuaJIT's first release.
The most problematic part is the reconstruction of complex expressions. As you may see in the TODO list, there are still some problems there. And solving these problems will require a full rewrite once again. Since the beginning of that work I've rewritten the basic logic at least three times due to some "tiny issue" revealing a huge misunderstanding of how things work.
I blame the overly minimalistic design of LuaJIT's parser-compiler-optimizer, which does everything at once and hence produces some very questionable constructs that could be made much shorter in, say, a two-phase compiler.
There are more than 4,5 hundred compiled luajit scripts in this game so I think they compiled it to reduce load time.
A compiled script is not the same thing as a compiled and stripped one =). The debug information is not that big, and I doubt it significantly increases loading times (I'm speaking from experience here - we are using Python for game scripts, and it's much, much more complex and its pyc files are much bigger in size).
Stripping is usually done either for "security" reasons or due to short-sighted development and LuaJIT's default policy of stripping output files.
No thanks. What really would help is code comments and documentation. Thanks anyway.
Yeah. Some of the most complex parts are commented, though - especially unwarping. What's really missing is overall documentation on how this all works. Most of the parsing and writing code should be obvious - it's somewhat bloaty, but not that hard to understand.
BTW I find this OO code hard to read without comments/docs (not implying anything) so I can't make any valuable contributions to the actual decompiler (anytime soon).
TBH, there is almost no OO code there. The only OO code is the AST tree and its visitors and some trivial parts of the bytecode parser. Other things are using classes just as structs (as Python doesn't have structs). The most complex things you should be looking at are very simplistic, using OO only as tree visitors to collect stuff.
But the real problem on the reading side is Python itself. It's kinda hard to read due to its dynamic and "oh no, I don't know what the hell I have here exactly" nature. It sucks, yes. But well, it's very good for quickly hacking together small stuff. I never imagined the decompiler would grow that big and complex when I started it. Maybe I'll rewrite it in C/C++ someday if I have another motivation strike.
But anyway, if you have any particular questions, feel free to ask them by mail in Russian (it would be much faster to reply in Russian). I'll do my best to answer anything you want to know and will use your questions as a general guideline to maybe write overall documentation later.
I hope I'll have a chance to get deeper into the code. Python itself is ok; the visitors alarmed me a bit because I have never used them. AST? ugh I knew I should have read the red dragon book :) EDIT: pls no c++ thank you
Probably fixed by #18
The main issue was the missing flag copy (which resulted in attempts to load debug data from files that were actually stripped).
Look in my fork, this might have saved you some time. I think I made these modifications before.