luajit-lang-toolkit

Add opcode for bytecode version 2

Open ttimasdf opened this issue 7 years ago • 9 comments

To be frank, I'm not familiar with Lua, but I'll try to be as useful as I can 😅

When trying to reverse a LuaJIT dump with version 2 in its header, I noticed that luajit-x -bx gave me a different result from luajit. Inspecting the hex dump, I found that the opcodes were being decoded with the version 1 table, which is shifted by two slots relative to the version 2 opcode array. I referenced lj_bc.h and #11 and added the missing opcodes. The result was positive. My patched version is on branch v2.1. However, a build with this patch may not be compatible with LuaJIT 2.0 bytecode. Do you have any solution?

commit d3d9f70bf4c88fbbea39e4219ec150e6e6d40833
Author: ttimasdf <[email protected]>
Date:   Tue Nov 14 18:35:02 2017 +0800

    Add opcode for LuaJIT 2.1 (bytecode version 2)
    
    DO NOT COMPATIBLE WITH 2.0 dumps(bytecode version 1)
    Trying to figure out a way.

diff --git a/lang/bcread.lua b/lang/bcread.lua
index 821fa47..8487421 100644
--- a/lang/bcread.lua
+++ b/lang/bcread.lua
@@ -22,7 +22,7 @@ local BCDUMP = {
 
     -- If you perform *any* kind of private modifications to the bytecode itself
     -- or to the dump format, you *must* set BCDUMP_VERSION to 0x80 or higher.
-    VERSION = 1,
+    VERSION = 2,
 
     -- Compatibility flags.
     F_BE    = 0x01,
@@ -60,6 +60,8 @@ local BCDEF_TAB = {
     {'ISFC', 'dst', 'none', 'var', 'none'},
     {'IST', 'none', 'none', 'var', 'none'},
     {'ISF', 'none', 'none', 'var', 'none'},
+    {'ISTYPE', 'var', 'none', 'lit', 'none'},
+    {'ISNUM', 'var', 'none', 'lit', 'none'},
 
     -- Unary ops.
     {'MOV', 'dst', 'none', 'var', 'none'},
@@ -114,10 +116,12 @@ local BCDEF_TAB = {
     {'TGETV', 'dst', 'var', 'var', 'index'},
     {'TGETS', 'dst', 'var', 'str', 'index'},
     {'TGETB', 'dst', 'var', 'lit', 'index'},
+    {'TGETR', 'dst', 'var', 'var', 'index'},
     {'TSETV', 'var', 'var', 'var', 'newindex'},
     {'TSETS', 'var', 'var', 'str', 'newindex'},
     {'TSETB', 'var', 'var', 'lit', 'newindex'},
     {'TSETM', 'base', 'none', 'num', 'newindex'},
+    {'TSETR', 'var', 'var', 'var', 'newindex'},
 
     -- Calls and vararg handling. T = tail call.
     {'CALLM', 'base', 'lit', 'lit', 'call'},
diff --git a/lang/bcsave.lua b/lang/bcsave.lua
index a70795d..deb9d80 100644
--- a/lang/bcsave.lua
+++ b/lang/bcsave.lua
@@ -584,7 +584,7 @@ local function bc_magic_header(input)
     local f, err = io.open(input, "rb")
     check(f, "cannot open ", err)
     local header = f:read(4)
-    local match = (header == string.char(0x1b, 0x4c, 0x4a, 0x01))
+    local match = (header == string.char(0x1b, 0x4c, 0x4a, 0x02))
     f:close()
     return match
 end

I'm using LuaJIT 2.1.0-beta3. Sample bytecode dump before the patch:

$ luajit-x -bl main
-- BYTECODE -- main:0-0
0001    TGETV    1   0   0
0002    KSHORT   2   1
...
0020    KSHORT   2   1
0021    ITERN    1   1   2
0022    FORL     0 => -32744

-- BYTECODE -- main:0-0
0001    TDUP     0   0
0002    TGETS    0   0   1  ; "__G__TRACKBACK__"
0003    KPRI     0 500
...
0072    TGETS    0   0  19  ; "LAUNCHERPKG"
0073    TGETV    0   0  26
0074    KSHORT   1  30
0075    ITERN    0   2   2
0076    TSETV    0   0  31
0077    ITERN    0   2   1
0078    UNM      1   0
0079    TSETV    0   0  32
0080    ITERN    0   1   2
0081    FORL     0 => -32685

After the patch:

$ luajit-x -bxg main
1b 4c 4a 02             | Header LuaJIT 2.0 BC
02                      | Flags: BCDUMP_F_STRIP
                        | .. prototype ..
b6 01                   | prototype length 182
00                      | prototype flags None
01                      | parameters number 1
05                      | framesize 5
00 08 00 16             | size uv: 0 kgc: 8 kn: 0 bc: 23
                        | .. bytecode ..
36 01 00 00             | 0001    GGET     1   0      ; "print"
27 02 01 00             | 0002    KSTR     2   1      ; "---------------
                        | -------------------------"
42 01 02 01             | 0003    CALL     1   1   2
...
36 01 00 00             | 0019    GGET     1   0      ; "print"
27 02 01 00             | 0020    KSTR     2   1      ; "---------------
                        | -------------------------"
42 01 02 01             | 0021    CALL     1   1   2
4b 00 01 00             | 0022    RET0     0   1
                        | .. uv ..
                        | .. kgc ..
05                      | kgc: ""
0e 74 72 61 63 65 62 61 | kgc: "traceback"
63 6b                   | 
0a 64 65 62 75 67       | kgc: "debug"
06 0a                   | kgc: "\
...
15 5f 5f 47 5f 5f 54 52 | kgc: "__G__TRACKBACK__"
41 43 4b 42 41 43 4b 5f | 
5f                      | 
00                      | kgc: <function: main:0>
                        | .. knum ..
00                      | eof
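
The mismatch between the two dumps above can be reproduced with a small sketch. It assumes the opcode order listed below matches lj_bc.h for bytecode version 2 (only the prefix up to FORL is included), and that the version 1 table is the same list without the four opcodes added by the patch. The names `V2_NAMES`, `V1_NAMES` and `decode` are made up for this illustration and are not part of the toolkit.

```lua
-- Opcode names in bytecode version 2 order, up to FORL
-- (assumed to match lj_bc.h; the rest of the list is omitted).
local V2_NAMES = {
  "ISLT", "ISGE", "ISLE", "ISGT", "ISEQV", "ISNEV", "ISEQS", "ISNES",
  "ISEQN", "ISNEN", "ISEQP", "ISNEP", "ISTC", "ISFC", "IST", "ISF",
  "ISTYPE", "ISNUM", "MOV", "NOT", "UNM", "LEN",
  "ADDVN", "SUBVN", "MULVN", "DIVVN", "MODVN",
  "ADDNV", "SUBNV", "MULNV", "DIVNV", "MODNV",
  "ADDVV", "SUBVV", "MULVV", "DIVVV", "MODVV", "POW", "CAT",
  "KSTR", "KCDATA", "KSHORT", "KNUM", "KPRI", "KNIL",
  "UGET", "USETV", "USETS", "USETN", "USETP", "UCLO", "FNEW",
  "TNEW", "TDUP", "GGET", "GSET", "TGETV", "TGETS", "TGETB", "TGETR",
  "TSETV", "TSETS", "TSETB", "TSETM", "TSETR",
  "CALLM", "CALL", "CALLMT", "CALLT", "ITERC", "ITERN", "VARG", "ISNEXT",
  "RETM", "RET", "RET0", "RET1", "FORI", "JFORI", "FORL",
}

-- The four opcodes added by the patch in this issue.
local ADDED_IN_V2 = { ISTYPE = true, ISNUM = true, TGETR = true, TSETR = true }

-- Derive the version 1 order by dropping the opcodes that only exist in version 2.
local V1_NAMES = {}
for _, name in ipairs(V2_NAMES) do
  if not ADDED_IN_V2[name] then
    V1_NAMES[#V1_NAMES + 1] = name
  end
end

-- Opcode numbers are 0-based, Lua arrays are 1-based.
local function decode(names, opcode)
  return names[opcode + 1]
end

-- First byte of some instructions from the hex dump above.
for _, op in ipairs({ 0x36, 0x27, 0x42, 0x4b }) do
  print(string.format("0x%02x  v1: %-6s  v2: %s",
    op, decode(V1_NAMES, op), decode(V2_NAMES, op)))
end
```

With these tables, 0x36 decodes as TGETV under version 1 and GGET under version 2, and 0x4b as FORL versus RET0, which lines up with the first and last instructions of the two listings.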

ttimasdf, Nov 14 '17 10:11

That helped me a lot! I had LuaJIT 2.1 by default, and the toolkit as-is is not aware of its new header :(

gotzmann, Apr 23 '19 21:04

Hi,

I'm considering adding support for the LuaJIT 2.1 bytecode format.

I just need to figure out how to automatically detect the appropriate version so that the toolkit can work with both the 2.0 and 2.1 formats.

franko, Apr 26 '19 17:04

> I just need to figure out how to automatically detect the appropriate version so that the toolkit can work with both 2.0 and 2.1 format.

There is a macro BCDUMP_VERSION defined in lj_bcdump.h, which is 1 for LuaJIT 2.0.x and 2 for LuaJIT 2.1.x. It's stored in the dump file as the fourth byte, right after the magic bytes 0x1b, 0x4c and 0x4a. Anyway, since you generate LuaJIT bytecode, you might be interested in this tool: https://github.com/rochus-keller/LjTools; I'm currently working on a simple alternative backend to verify generated bytecode/dumps.
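
As a rough sketch of the detection described here, the version can be read from the first four bytes of a dump. The function names `bc_dump_version` and `bc_file_version` are hypothetical and not part of the toolkit; only the header layout (three magic bytes followed by the version byte) is taken from the comment above.

```lua
-- Return the bytecode dump version stored in a 4-byte header string,
-- or nil if the string does not start with the LuaJIT magic bytes.
local function bc_dump_version(header)
  if #header < 4 then return nil end
  local b1, b2, b3, version = header:byte(1, 4)
  if b1 ~= 0x1b or b2 ~= 0x4c or b3 ~= 0x4a then
    return nil
  end
  return version
end

-- Convenience wrapper (hypothetical): read the header from a file.
-- Returns nil plus an error message if the file cannot be opened.
local function bc_file_version(path)
  local f, err = io.open(path, "rb")
  if not f then return nil, err end
  local header = f:read(4) or ""
  f:close()
  return bc_dump_version(header)
end

print(bc_dump_version("\27LJ\1"))  -- header of a LuaJIT 2.0.x dump
print(bc_dump_version("\27LJ\2"))  -- header of a LuaJIT 2.1.x dump
```

A reader could then accept both values instead of comparing against a single hard-coded header, and choose the opcode table based on the returned number.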

rochus-keller, Aug 30 '19 14:08

Dear Rochus,

thank you for your suggestion. I am not working on LuaJIT at the moment, but I will add support for the 2.1 bytecode format sooner or later; it should be really simple to do, I think.

Your software LjTools seems remarkable. It actually overlaps a lot with what luajit-lang-toolkit already does, minus the nice user interface. With luajit-lang-toolkit one just needs to use the -bx option, if I remember correctly.
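
One possible shape for such support, sketched under assumptions: keep a single opcode table and tag the entries that only exist in newer bytecode versions, then build the per-version table once the dump version is known. The rows below are copied from the patch in this issue; the `since` field, the `BCDEF_ALL` name and `bcdef_for_version` are hypothetical, and the table is only an excerpt.

```lua
-- Excerpt of a merged opcode table. `since` (hypothetical) is the first
-- bytecode version that contains the opcode; it defaults to 1 when absent.
local BCDEF_ALL = {
    {'ISF',    'none', 'none', 'var', 'none'},
    {'ISTYPE', 'var',  'none', 'lit', 'none', since = 2},
    {'ISNUM',  'var',  'none', 'lit', 'none', since = 2},
    {'MOV',    'dst',  'none', 'var', 'none'},
    -- ... remaining opcodes elided in this sketch ...
    {'TGETB',  'dst',  'var',  'lit', 'index'},
    {'TGETR',  'dst',  'var',  'var', 'index', since = 2},
    {'TSETV',  'var',  'var',  'var', 'newindex'},
}

-- Build the opcode table for one bytecode version, preserving order so
-- that the array position still corresponds to the opcode number.
local function bcdef_for_version(version)
    local tab = {}
    for _, def in ipairs(BCDEF_ALL) do
        if (def.since or 1) <= version then
            tab[#tab + 1] = def
        end
    end
    return tab
end

local v1, v2 = bcdef_for_version(1), bcdef_for_version(2)
print(#v1, #v2)
```

Filtering one ordered list keeps the two versions from drifting apart, at the cost of assuming that newer versions only insert opcodes and never reorder existing ones.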

franko, Aug 30 '19 16:08

Thanks. Personally I'm happy with the LuaJIT 2.0 bytecode; maybe someone can point out in what respect 2.1 is supposed to improve on 2.0. Concerning my viewer: I'm aware of the command-line features and have already used them, but they don't include all the information I'm interested in, and with the GUI it's easier to cross-reference between bytecode and source code (e.g. double-clicking on the bytecode selects the corresponding source line). May I ask whether you or someone else has already used your tool to implement another language frontend (i.e. not Lua)?

rochus-keller, Aug 30 '19 16:08

There is one project using luajit-lang-toolkit: scilua.

I think there was another one, a Lua dialect, but I don't remember its name anymore, and it was not an ambitious project.

franko, Aug 30 '19 17:08

Thanks, I will have a look at it. I wonder why nobody so far has used LuaJIT as a backend for a language completely different from Lua, even a statically typed one, which seems to have been your original intention in developing this toolkit. Titan and Pallene also took another route. I personally share your vision, but will implement the frontend in C++ rather than in Lua. Nonetheless, your toolkit is helpful for better understanding the LuaJIT bytecode.

rochus-keller, Aug 30 '19 17:08

I did not actually have ambitions for luajit-lang-toolkit.

I made it to be able to add simple extensions to Lua's syntax while using LuaJIT 2 unmodified for execution. It was supposed to be used for gsl shell. It was also a good fit for scilua, because it needed simple syntax extensions like gsl shell did.

I made it and published it on GitHub because it was a nice piece of work to share, but without real ambitions.

I have seen a lot of Lua dialects fail to gain any success, and I knew it was bound to be like this.

In addition, people love to hack around LuaJIT, so everyone is doing their own modifications and toy projects :-)

franko, Aug 30 '19 18:08

LuaJIT is perfect as is; personally I don't see any reason to modify it. But replacing the compiler front end, i.e. reusing the backend to implement another language, looks like a good idea to me. I thought that was what you intended with your tool.

rochus-keller, Aug 30 '19 20:08