pflua
Improve generated assembly for pflang filters
The generated assembly is not ideal. For example, for a loop that reads packets from a pcap-format savefile and matches them against "tcp port 5555", pflua generates this function:
return function(P,length)
   if not (24 <= length) then return false end
   local v1 = ffi.cast("uint16_t*", P+12)[0]
   if not (v1 == 8) then goto L2 end
   do
      local v2 = P[23]
      if not (v2 == 6) then return false end
      do
         local v3 = ffi.cast("uint16_t*", P+20)[0]
         local v4 = bit.band(v3,65311)
         if not (v4 == 0) then return false end
         local v5 = P[14]
         local v6 = bit.band(v5,15)
         local v7 = bit.lshift(v6,2)
         local v8 = v7+16
         if not (v8 <= length) then return false end
         local v9 = v7+14
         local v10 = ffi.cast("uint16_t*", P+v9)[0]
         if v10 == 45845 then return true end
         do
            local v11 = v7+18
            if not (v11 <= length) then return false end
            local v12 = ffi.cast("uint16_t*", P+v8)[0]
            do return v12 == 45845 end
         end
      end
   end
   ::L2::
   do
      if not (56 <= length) then return false end
      if not (v1 == 56710) then return false end
      do
         local v13 = P[20]
         if v13 == 6 then goto L5 end
         do
            if not (v13 == 44) then return false end
            do
               local v14 = P[54]
               if not (v14 == 6) then return false end
            end
         end
      end
      ::L5::
      do
         if not (v1 == 56710) then return false end
         local v15 = P[20]
         if v15 == 6 then goto L9 end
         do
            if not (v15 == 44) then return false end
            do
               local v16 = P[54]
               if v16 == 6 then goto L9 end
               do
                  if not (v16 == 6) then return false end
               end
            end
         end
         ::L9::
         do
            local v17 = ffi.cast("uint16_t*", P+54)[0]
            if v17 == 45845 then return true end
            do
               if not (58 <= length) then return false end
               local v18 = ffi.cast("uint16_t*", P+56)[0]
               do return v18 == 45845 end
            end
         end
      end
   end
end
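Several of the magic numbers in this function are byte-swapped network-order values. The filter reads 16-bit fields with ffi.cast("uint16_t*", ...)[0], which loads them in host byte order, and the machine here is little-endian x86-64 (see the register names in the dumps). The small C check below illustrates this. The swap16 helper and the enum names are illustrative only and are not part of pflua.

```c
#include <stdint.h>

/* Swap the two bytes of a 16-bit value. On a little-endian host this maps
   a network-order field value to what a plain uint16_t load returns,
   i.e. what ffi.cast("uint16_t*", P+off)[0] yields in the generated code. */
static uint16_t swap16(uint16_t x) {
    return (uint16_t)((x << 8) | (x >> 8));
}

/* Network-order values implied by "tcp port 5555" (names are hypothetical). */
enum {
    ETHERTYPE_IPV4 = 0x0800, /* appears as 8 in the generated code     */
    ETHERTYPE_IPV6 = 0x86DD, /* appears as 56710                       */
    IPV4_FRAG_MASK = 0x1FFF, /* fragment-offset mask, appears as 65311 */
    TCP_PORT       = 5555    /* appears as 45845 (0xb315 in the asm)   */
};
```

So P[23] == 6 is the IPv4 protocol byte (TCP), and the band(..., 65311) test rejects non-first fragments.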
Here is the IR trace of the loop:
0099 ------ LOOP ------------
0100 > p32 UREFO bench.lua:110 #3
0101 > tab ULOAD 0100
0102 int FLOAD 0101 tab.hmask
0103 > int EQ 0102 +31
0104 p32 FLOAD 0101 tab.node
0105 > p32 HREFK 0104 "cast" @6
0106 > fun HLOAD 0105
0107 > fun EQ 0106 ffi.cast
0108 } cdt CNEWI +186 0095
0109 p64 ADD 0095 +16
0110 } cdt CNEWI +184 0109
0111 p64 ADD 0095 +8
0112 u32 XLOAD 0111
0113 num CONV 0112 num.u32
0114 > num GE 0113 +24
0115 p64 ADD 0095 +28
0116 u16 XLOAD 0115
0117 > int EQ 0116 +8
0118 p64 ADD 0095 +39
0119 u8 XLOAD 0118
0120 > int EQ 0119 +6
0121 p64 ADD 0095 +36
0122 u16 XLOAD 0121
0123 int BAND 0122 +65311
0124 > int EQ 0123 +0
0125 p64 ADD 0095 +30
0126 u8 XLOAD 0125
0127 int BAND 0126 +15
0128 int BSHL 0126 +2
0129 int BAND 0128 +60
0130 > int ADDOV 0129 +16
0131 num CONV 0130 num.int
0132 > num LE 0131 0113
0133 > int ADDOV 0129 +14
0134 i64 CONV 0133 i64.int sext
0135 p64 ADD 0134 0109
0136 u16 XLOAD 0135
0137 > int NE 0136 +45845
0138 > int ADDOV 0129 +18
0139 num CONV 0138 num.int
0140 > num LE 0139 0113
0141 i64 CONV 0130 i64.int sext
0142 p64 ADD 0141 0109
0143 u16 XLOAD 0142
0144 > int EQ 0143 +45845
0145 + num ADD 0092 +1
0146 + num ADD 0094 +1
0147 + p64 ADD 0112 0109
0148 }+ cdt CNEWI +184 0147
0149 > p64 ULT 0147 +140160267072216
0150 } cdt PHI 0096 0148
0151 p64 PHI 0095 0147
0152 num PHI 0092 0145
0153 num PHI 0094 0146
Anything with type "num" in the loop is suboptimal: all of these values can be proven to be integers. All the mucking about with tables, ffi.cast lookups, guards and such is also unnecessary. As a result, the body of the loop has lots of memory accesses and floating-point comparisons:
->LOOP:
0bca231f mov ebx, [0x41c452e0]
0bca2326 cmp dword [rbx+0x4], -0x0c
0bca232a jnz 0x0bca0054 ->17
0bca2330 mov ebx, [rbx]
0bca2332 cmp dword [rbx+0x1c], +0x1f
0bca2336 jnz 0x0bca0054 ->17
0bca233c mov ebx, [rbx+0x14]
0bca233f mov rdi, 0xfffffffb41c74b80
0bca2349 cmp rdi, [rbx+0x98]
0bca2350 jnz 0x0bca0054 ->17
0bca2356 cmp dword [rbx+0x94], -0x09
0bca235d jnz 0x0bca0054 ->17
0bca2363 cmp dword [rbx+0x90], 0x41c84058
0bca236d jnz 0x0bca0054 ->17
0bca2373 mov r15, rbp
0bca2376 add rbp, +0x10
0bca237a mov ebx, [r15+0x8]
0bca237e xorps xmm5, xmm5
0bca2381 cvtsi2sd xmm5, rbx
0bca2386 ucomisd xmm5, xmm1
0bca238a jb 0x0bca0058 ->18
0bca2390 movzx r14d, word [r15+0x1c]
0bca2395 cmp r14d, +0x08
0bca2399 jnz 0x0bca005c ->19
0bca239f movzx r13d, byte [r15+0x27]
0bca23a4 cmp r13d, +0x06
0bca23a8 jnz 0x0bca0060 ->20
0bca23ae movzx r12d, word [r15+0x24]
0bca23b3 mov edi, r12d
0bca23b6 and edi, 0xff1f
0bca23bc jnz 0x0bca0064 ->21
0bca23c2 movzx esi, byte [r15+0x1e]
0bca23c7 mov edx, esi
0bca23c9 and edx, +0x0f
0bca23cc mov ecx, esi
0bca23ce shl ecx, 0x02
0bca23d1 and ecx, +0x3c
0bca23d4 mov r11d, ecx
0bca23d7 add r11d, +0x10
0bca23db jo 0x0bca0068 ->22
0bca23e1 xorps xmm4, xmm4
0bca23e4 cvtsi2sd xmm4, r11d
0bca23e9 ucomisd xmm5, xmm4
0bca23ed jb 0x0bca006c ->23
0bca23f3 mov r10d, ecx
0bca23f6 add r10d, +0x0e
0bca23fa jo 0x0bca0070 ->24
0bca2400 movsxd rax, r10d
0bca2403 movzx r9d, word [rax+rbp]
0bca2408 cmp r9d, 0xb315
0bca240f jz 0x0bca0074 ->25
0bca2415 mov r8d, ecx
0bca2418 add r8d, +0x12
0bca241c jo 0x0bca0078 ->26
0bca2422 xorps xmm4, xmm4
0bca2425 cvtsi2sd xmm4, r8d
0bca242a ucomisd xmm5, xmm4
0bca242e jb 0x0bca007c ->27
0bca2434 movsxd rax, r11d
0bca2437 movzx eax, word [rax+rbp]
0bca243b mov [rsp+0x8], eax
0bca243f mov rax, 0x00007f799aee32d8
0bca2449 cmp dword [rsp+0x8], 0xb315
0bca2451 jnz 0x0bca0080 ->28
0bca2457 addsd xmm6, xmm0
0bca245b addsd xmm7, xmm0
0bca245f add rbp, rbx
0bca2462 cmp rbp, rax
0bca2465 jb 0x0bca231f ->LOOP
0bca246b jmp 0x0bca0088 ->30
---- TRACE 108 stop -> loop
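To make the integer claim concrete, here is a hypothetical C sketch of the IPv4 half of the filter. The function name ipv4_tcp_port_5555 and the load16 helper are illustrative, not pflua code. The sketch assumes a little-endian host, so it uses the same byte-swapped constants as the generated Lua. It also takes length as an integer, whereas in the benchmark packet.len arrives as a Lua number (the "num HLOAD" of "len" in the traces). The header length (P[14] & 15) << 2 is at most 60, so every offset sum is at most 78. That means no overflow guard (ADDOV) and no conversion to double is needed anywhere.

```c
#include <stdint.h>
#include <string.h>

/* Host-order 16-bit load, like ffi.cast("uint16_t*", P+off)[0].
   memcpy avoids undefined behaviour on unaligned addresses. */
static uint16_t load16(const uint8_t *p, uint32_t off) {
    uint16_t v;
    memcpy(&v, p + off, sizeof v);
    return v;
}

/* Hypothetical integer-only IPv4 branch of "tcp port 5555"
   (little-endian host assumed; returns 1 on match, 0 otherwise). */
static int ipv4_tcp_port_5555(const uint8_t *p, uint32_t length) {
    if (length < 34) return 0;                   /* Ethernet + minimal IPv4 */
    if (load16(p, 12) != 8) return 0;            /* ethertype 0x0800 */
    if (p[23] != 6) return 0;                    /* IP protocol: TCP */
    if ((load16(p, 20) & 65311) != 0) return 0;  /* non-first fragment */
    /* IHL * 4 is in [0, 60], so hl + 18 <= 78: plain unsigned adds suffice. */
    uint32_t hl = (uint32_t)(p[14] & 15) << 2;
    if (hl + 16 > length) return 0;
    if (load16(p, hl + 14) == 45845) return 1;   /* source port */
    if (hl + 18 > length) return 0;
    return load16(p, hl + 16) == 45845;          /* destination port */
}
```

Every comparison here is an integer compare against a small constant, which is roughly what the loop body above would look like without the cvtsi2sd/ucomisd pairs.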
Suboptimal. To fix this, we can either hack on LuaJIT or emit assembly ourselves. I would try the former first, since LuaJIT's output is not far off from what needs to happen.
With the new backend from #95, the code looks like:
return function(P,length)
   if length < 34 then return false end
   local var1 = cast("uint16_t*", P+12)[0]
   if var1 == 8 then
      if P[23] ~= 6 then return false end
      if band(cast("uint16_t*", P+20)[0],65311) ~= 0 then return false end
      local var7 = lshift(band(P[14],15),2)
      local var8 = (var7 + 16)
      if var8 > length then return false end
      if cast("uint16_t*", P+(var7 + 14))[0] == 45845 then return true end
      if (var7 + 18) > length then return false end
      return cast("uint16_t*", P+var8)[0] == 45845
   else
      if length < 56 then return false end
      if var1 ~= 56710 then return false end
      local var24 = P[20]
      if var24 == 6 then goto L22 end
      do
         if var24 ~= 44 then return false end
         if P[54] == 6 then goto L22 end
         return false
      end
      ::L22::
      if cast("uint16_t*", P+54)[0] == 45845 then return true end
      if length < 58 then return false end
      return cast("uint16_t*", P+56)[0] == 45845
   end
end
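The else branch's goto L22 accepts the packet if the IPv6 next header (P[20]) is TCP, or if it is a fragment header (44) whose own next header (P[54]) is TCP. Both paths then fall through to the same port checks. Below is a hypothetical C rendering of that branch. The function and helper names are illustrative, and a little-endian host is assumed, so 56710 is ethertype 0x86DD. It mirrors the generated code's offsets as-is and replaces the goto with a nested if.

```c
#include <stdint.h>
#include <string.h>

/* Host-order 16-bit load, like cast("uint16_t*", P+off)[0]. */
static uint16_t load16(const uint8_t *p, uint32_t off) {
    uint16_t v;
    memcpy(&v, p + off, sizeof v);
    return v;
}

/* Hypothetical rendering of the IPv6 branch (returns 1 on match). */
static int ipv6_tcp_port_5555(const uint8_t *p, uint32_t length) {
    if (length < 56) return 0;
    if (load16(p, 12) != 56710) return 0;  /* ethertype 0x86DD */
    uint8_t next = p[20];                  /* IPv6 next header */
    if (next != 6) {                       /* not TCP directly, so it */
        if (next != 44) return 0;          /* must be a fragment header */
        if (p[54] != 6) return 0;          /* whose next header is TCP */
    }
    /* Both paths reach the former ::L22:: label here. */
    if (load16(p, 54) == 45845) return 1;  /* source port */
    if (length < 58) return 0;
    return load16(p, 56) == 45845;         /* destination port */
}
```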
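This produces three traces: the main loop trace (49), plus two side traces (50 and 51). The side traces start at exits 24 and 11 of trace 49 and handle the destination-port check after the source port fails to match: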
---- TRACE 49 start pflua-match:23
0007 TGETV 8 0 7
0008 GSET 8 0 ; "packet"
0009 ADDVN 2 2 0 ; 1
0010 MOV 8 1
0011 GGET 9 0 ; "packet"
0012 TGETS 9 9 0 ; "packet"
0013 GGET 10 0 ; "packet"
0014 TGETS 10 10 1 ; "len"
0015 CALL 8 2 3
0000 . FUNCF 8 ; "tcp port 5555":1
0001 . KSHORT 2 34
0002 . ISGE 1 2
0003 . JMP 2 => 0006
0006 . GGET 2 0 ; "cast"
0007 . KSTR 3 1 ; "uint16_t*"
0008 . ADDVN 4 0 0 ; 12
0000 . . . FUNCC ; ffi.meta.__add
0009 . CALL 2 2 3
0000 . . FUNCC ; ffi.cast
0010 . TGETB 2 2 0
0000 . . . FUNCC ; ffi.meta.__index
0011 . ISNEN 2 1 ; 8
0012 . JMP 3 => 0069
0013 . TGETB 3 0 23
0000 . . . FUNCC ; ffi.meta.__index
0014 . ISEQN 3 2 ; 6
0015 . JMP 3 => 0018
0018 . GGET 3 2 ; "band"
0019 . GGET 4 0 ; "cast"
0020 . KSTR 5 1 ; "uint16_t*"
0021 . ADDVN 6 0 3 ; 20
0000 . . . FUNCC ; ffi.meta.__add
0022 . CALL 4 2 3
0000 . . FUNCC ; ffi.cast
0023 . TGETB 4 4 0
0000 . . . FUNCC ; ffi.meta.__index
0024 . KNUM 5 4 ; 65311
0025 . CALL 3 2 3
0000 . . FUNCC ; bit.band
0026 . ISEQN 3 5 ; 0
0027 . JMP 3 => 0030
0030 . GGET 3 3 ; "lshift"
0031 . GGET 4 2 ; "band"
0032 . TGETB 5 0 14
0000 . . . FUNCC ; ffi.meta.__index
0033 . KSHORT 6 15
0034 . CALL 4 2 3
0000 . . FUNCC ; bit.band
0035 . KSHORT 5 2
0036 . CALL 3 2 3
0000 . . FUNCC ; bit.lshift
0037 . ADDVN 4 3 6 ; 16
0038 . ISGE 1 4
0039 . JMP 5 => 0042
0042 . GGET 5 0 ; "cast"
0043 . KSTR 6 1 ; "uint16_t*"
0044 . ADDVN 7 3 7 ; 14
0045 . ADDVV 7 0 7
0000 . . . FUNCC ; ffi.meta.__add
0046 . CALL 5 2 3
0000 . . FUNCC ; ffi.cast
0047 . TGETB 5 5 0
0000 . . . FUNCC ; ffi.meta.__index
0048 . ISNEN 5 8 ; 45845
0049 . JMP 5 => 0052
0050 . KPRI 5 2
0051 . RET1 5 2
0016 ISF 8
0017 JMP 9 => 0019
0018 ADDVN 3 3 0 ; 1
0019 FORL 4 => 0007
---- TRACE 49 IR
.... SNAP #0 [ ---- ]
0001 rax > int SLOAD #6 CRI
0002 > int LE 0001 +2147483646
0003 rbp int SLOAD #5 CI
0004 rcx > tab SLOAD #1 T
0005 int FLOAD 0004 tab.asize
0006 > p32 ABC 0005 0001
0007 rdx p32 FLOAD 0004 tab.array
0008 p32 AREF 0007 0003
0009 rbx > tab ALOAD 0008
0010 rcx fun SLOAD #0 R
0011 rdi tab FLOAD 0010 func.env
0012 int FLOAD 0011 tab.hmask
0013 > int EQ 0012 +63
0014 rcx p32 FLOAD 0011 tab.node
0015 rcx > p32 HREFK 0014 "packet" @32
0016 tab FLOAD 0011 tab.meta
0017 > tab EQ 0016 [NULL]
0018 tab HSTORE 0015 0009
0019 nil TBAR 0011
.... SNAP #1 [ ---- ---- ---- ---- ---- 0003 0001 ---- 0003 0009 ]
0020 xmm6 > num SLOAD #3 T
0021 xmm6 + num ADD 0020 +1
0022 > fun SLOAD #2 T
0023 int FLOAD 0009 tab.hmask
0024 > int EQ 0023 +1
0025 rdi p32 FLOAD 0009 tab.node
0026 > p32 HREFK 0025 "packet" @1
0027 r8 > cdt HLOAD 0026
0028 > p32 HREFK 0025 "len" @0
0029 xmm2 > num HLOAD 0028
0030 > fun EQ 0022 "tcp port 5555":1
.... SNAP #2 [ ---- ---- ---- 0021 ---- 0003 0001 ---- 0003 "tcp port 5555":1|---- ---- ]
0031 > num UGE 0029 +34
.... SNAP #3 [ ---- ---- ---- 0021 ---- 0003 0001 ---- 0003 "tcp port 5555":1|0027 0029 ]
0032 rbx tab FLOAD "tcp port 5555":1 func.env
0033 int FLOAD 0032 tab.hmask
0034 > int EQ 0033 +15
0035 rdi p32 FLOAD 0032 tab.node
0036 > p32 HREFK 0035 "cast" @6
0037 > fun HLOAD 0036
0038 rbx u16 FLOAD 0027 cdata.ctypeid
0039 > int EQ 0038 +181
0040 rbx p64 FLOAD 0027 cdata.ptr
0041 p64 ADD 0040 +12
0043 > fun EQ 0037 ffi.cast
0045 r9 u16 XLOAD 0041
.... SNAP #4 [ ---- ---- ---- 0021 ---- 0003 0001 ---- 0003 "tcp port 5555":1|0027 0029 0045 ]
0046 > int EQ 0045 +8
0047 p64 ADD 0040 +23
0048 u8 XLOAD 0047
.... SNAP #5 [ ---- ---- ---- 0021 ---- 0003 0001 ---- 0003 "tcp port 5555":1|---- ---- ---- ]
0049 > int EQ 0048 +6
.... SNAP #6 [ ---- ---- ---- 0021 ---- 0003 0001 ---- 0003 "tcp port 5555":1|0027 0029 0045 ]
0050 > p32 HREFK 0035 "band" @15
0051 > fun HLOAD 0050
0052 p64 ADD 0040 +20
0055 r10 u16 XLOAD 0052
0056 > fun EQ 0051 bit.band
0057 int BAND 0055 +65311
.... SNAP #7 [ ---- ---- ---- 0021 ---- 0003 0001 ---- 0003 "tcp port 5555":1|---- ---- ---- ]
0058 > int EQ 0057 +0
.... SNAP #8 [ ---- ---- ---- 0021 ---- 0003 0001 ---- 0003 "tcp port 5555":1|0027 0029 0045 ]
0059 > p32 HREFK 0035 "lshift" @13
0060 > fun HLOAD 0059
0061 p64 ADD 0040 +14
0062 r10 u8 XLOAD 0061
0064 > fun EQ 0060 bit.lshift
0065 r10 int BSHL 0062 +2
0066 r10 int BAND 0065 +60
0067 r11 > int ADDOV 0066 +16
0068 xmm3 num CONV 0067 num.int
.... SNAP #9 [ ---- ---- ---- 0021 ---- 0003 0001 ---- 0003 "tcp port 5555":1|---- ---- ---- ---- ---- ]
0069 > num ULE 0068 0029
.... SNAP #10 [ ---- ---- ---- 0021 ---- 0003 0001 ---- 0003 "tcp port 5555":1|0027 0029 0045 0066 0067 ]
0070 rdi > int ADDOV 0066 +14
0071 rdi i64 CONV 0070 i64.int sext
0072 p64 ADD 0071 0040
0075 rbx u16 XLOAD 0072
.... SNAP #11 [ ---- ---- ---- 0021 ---- 0003 0001 ---- 0003 "tcp port 5555":1|0027 0029 0045 0066 0067 ]
0076 > int EQ 0075 +45845
.... SNAP #12 [ ---- ---- ---- 0021 ---- 0003 0001 ---- 0003 "tcp port 5555":1|---- ---- ---- ---- ---- ]
0077 xmm7 > num SLOAD #4 T
0078 xmm7 + num ADD 0077 +1
0079 rbp + int ADD 0003 +1
.... SNAP #13 [ ---- ---- ---- 0021 0078 ]
0080 > int LE 0079 0001
.... SNAP #14 [ ---- ---- ---- 0021 0078 0079 0001 ---- 0079 ]
0081 ------------ LOOP ------------
0082 p32 AREF 0007 0079
0083 r15 > tab ALOAD 0082
0084 tab HSTORE 0015 0083
.... SNAP #15 [ ---- ---- ---- 0021 0078 0079 0001 ---- 0079 0083 ]
0085 xmm6 + num ADD 0021 +1
0086 int FLOAD 0083 tab.hmask
0087 > int EQ 0086 +1
0088 r14 p32 FLOAD 0083 tab.node
0089 > p32 HREFK 0088 "packet" @1
0090 rbx > cdt HLOAD 0089
0091 > p32 HREFK 0088 "len" @0
0092 xmm5 > num HLOAD 0091
.... SNAP #16 [ ---- ---- ---- 0085 0078 0079 0001 ---- 0079 "tcp port 5555":1|---- ---- ]
0093 > num UGE 0092 +34
.... SNAP #17 [ ---- ---- ---- 0085 0078 0079 0001 ---- 0079 "tcp port 5555":1|0090 0092 ]
0094 r15 u16 FLOAD 0090 cdata.ctypeid
0095 > int EQ 0094 +181
0096 r12 p64 FLOAD 0090 cdata.ptr
0097 p64 ADD 0096 +12
0098 r15 u16 XLOAD 0097
.... SNAP #18 [ ---- ---- ---- 0085 0078 0079 0001 ---- 0079 "tcp port 5555":1|0090 0092 0098 ]
0099 > int EQ 0098 +8
0100 p64 ADD 0096 +23
0101 u8 XLOAD 0100
.... SNAP #19 [ ---- ---- ---- 0085 0078 0079 0001 ---- 0079 "tcp port 5555":1|---- ---- ---- ]
0102 > int EQ 0101 +6
0103 p64 ADD 0096 +20
0104 r14 u16 XLOAD 0103
0105 int BAND 0104 +65311
.... SNAP #20 [ ---- ---- ---- 0085 0078 0079 0001 ---- 0079 "tcp port 5555":1|---- ---- ---- ]
0106 > int EQ 0105 +0
.... SNAP #21 [ ---- ---- ---- 0085 0078 0079 0001 ---- 0079 "tcp port 5555":1|0090 0092 0098 ]
0107 p64 ADD 0096 +14
0108 r14 u8 XLOAD 0107
0109 r14 int BSHL 0108 +2
0110 r14 int BAND 0109 +60
0111 r13 > int ADDOV 0110 +16
0112 xmm4 num CONV 0111 num.int
.... SNAP #22 [ ---- ---- ---- 0085 0078 0079 0001 ---- 0079 "tcp port 5555":1|---- ---- ---- ---- ---- ]
0113 > num ULE 0112 0092
.... SNAP #23 [ ---- ---- ---- 0085 0078 0079 0001 ---- 0079 "tcp port 5555":1|0090 0092 0098 0110 0111 ]
0114 rdi > int ADDOV 0110 +14
0115 rdi i64 CONV 0114 i64.int sext
0116 p64 ADD 0115 0096
0117 r12 u16 XLOAD 0116
.... SNAP #24 [ ---- ---- ---- 0085 0078 0079 0001 ---- 0079 "tcp port 5555":1|0090 0092 0098 0110 0111 ]
0118 > int EQ 0117 +45845
0119 xmm7 + num ADD 0078 +1
0120 rbp + int ADD 0079 +1
.... SNAP #25 [ ---- ---- ---- 0085 0119 ]
0121 > int LE 0120 0001
0122 rbp int PHI 0079 0120
0123 xmm6 num PHI 0021 0085
0124 xmm7 num PHI 0078 0119
0125 xmm4 nil RENAME 0021 #15
---- TRACE 49 mcode 999
0bcaa8fd mov dword [0x416854a0], 0x31
0bcaa908 mov esi, edx
0bcaa90a movsd xmm1, [0x4159a2c0]
0bcaa913 movsd xmm0, [0x4159a288]
0bcaa91c movsd xmm7, [rsi+0x28]
0bcaa921 cvttsd2si eax, xmm7
0bcaa925 xorps xmm6, xmm6
0bcaa928 cvtsi2sd xmm6, eax
0bcaa92c ucomisd xmm7, xmm6
0bcaa930 jnz 0x0bca0010 ->0
0bcaa936 jpe 0x0bca0010 ->0
0bcaa93c cmp eax, 0x7ffffffe
0bcaa942 jg 0x0bca0010 ->0
0bcaa948 cvtsd2si ebp, [rsi+0x20]
0bcaa94d cmp dword [rsi+0x4], -0x0c
0bcaa951 jnz 0x0bca0010 ->0
0bcaa957 mov ecx, [rsi]
0bcaa959 cmp eax, [rcx+0x18]
0bcaa95c jnb 0x0bca0010 ->0
0bcaa962 mov edx, [rcx+0x8]
0bcaa965 cmp dword [rdx+rbp*8+0x4], -0x0c
0bcaa96a jnz 0x0bca0010 ->0
0bcaa970 mov ebx, [rdx+rbp*8]
0bcaa973 mov ecx, [rsi-0x8]
0bcaa976 mov edi, [rcx+0x8]
0bcaa979 cmp dword [rdi+0x1c], +0x3f
0bcaa97d jnz 0x0bca0010 ->0
0bcaa983 mov ecx, [rdi+0x14]
0bcaa986 mov r15, 0xfffffffb41691100
0bcaa990 cmp r15, [rcx+0x308]
0bcaa997 jnz 0x0bca0010 ->0
0bcaa99d add ecx, 0x300
0bcaa9a3 cmp dword [rdi+0x10], +0x00
0bcaa9a7 jnz 0x0bca0010 ->0
0bcaa9ad mov dword [rcx+0x4], 0xfffffff4
0bcaa9b4 mov [rcx], ebx
0bcaa9b6 test byte [rdi+0x4], 0x4
0bcaa9ba jz 0x0bcaa9d3
0bcaa9bc and byte [rdi+0x4], 0xfb
0bcaa9c0 mov r15d, [0x416853f4]
0bcaa9c8 mov [0x416853f4], edi
0bcaa9cf mov [rdi+0xc], r15d
0bcaa9d3 cmp dword [rsi+0x14], 0xfffeffff
0bcaa9da jnb 0x0bca0014 ->1
0bcaa9e0 movsd xmm6, [rsi+0x10]
0bcaa9e5 addsd xmm6, xmm0
0bcaa9e9 cmp dword [rsi+0xc], -0x09
0bcaa9ed jnz 0x0bca0014 ->1
0bcaa9f3 cmp dword [rbx+0x1c], +0x01
0bcaa9f7 jnz 0x0bca0014 ->1
0bcaa9fd mov edi, [rbx+0x14]
0bcaaa00 mov r15, 0xfffffffb41691100
0bcaaa0a cmp r15, [rdi+0x20]
0bcaaa0e jnz 0x0bca0014 ->1
0bcaaa14 cmp dword [rdi+0x1c], -0x0b
0bcaaa18 jnz 0x0bca0014 ->1
0bcaaa1e mov r8d, [rdi+0x18]
0bcaaa22 mov r15, 0xfffffffb4168a640
0bcaaa2c cmp r15, [rdi+0x8]
0bcaaa30 jnz 0x0bca0014 ->1
0bcaaa36 cmp dword [rdi+0x4], 0xfffeffff
0bcaaa3d jnb 0x0bca0014 ->1
0bcaaa43 movsd xmm2, [rdi]
0bcaaa47 cmp dword [rsi+0x8], 0x4a6f6a60
0bcaaa4e jnz 0x0bca0014 ->1
0bcaaa54 ucomisd xmm1, xmm2
0bcaaa58 ja 0x0bca0018 ->2
0bcaaa5e mov ebx, [0x4a6f6a68]
0bcaaa65 cmp dword [rbx+0x1c], +0x0f
0bcaaa69 jnz 0x0bca001c ->3
0bcaaa6f mov edi, [rbx+0x14]
0bcaaa72 mov rbx, 0xfffffffb41598a80
0bcaaa7c cmp rbx, [rdi+0x98]
0bcaaa83 jnz 0x0bca001c ->3
0bcaaa89 cmp dword [rdi+0x94], -0x09
0bcaaa90 jnz 0x0bca001c ->3
0bcaaa96 movzx ebx, word [r8+0x6]
0bcaaa9b cmp ebx, 0xb5
0bcaaaa1 jnz 0x0bca001c ->3
0bcaaaa7 mov rbx, [r8+0x8]
0bcaaaab cmp dword [rdi+0x90], 0x41598a58
0bcaaab5 jnz 0x0bca001c ->3
0bcaaabb movzx r9d, word [rbx+0xc]
0bcaaac0 cmp r9d, +0x08
0bcaaac4 jnz 0x0bca0020 ->4
0bcaaaca cmp byte [rbx+0x17], 0x6
0bcaaace jnz 0x0bca0024 ->5
0bcaaad4 mov r15, 0xfffffffb4168c128
0bcaaade cmp r15, [rdi+0x170]
0bcaaae5 jnz 0x0bca0028 ->6
0bcaaaeb cmp dword [rdi+0x16c], -0x09
0bcaaaf2 jnz 0x0bca0028 ->6
0bcaaaf8 movzx r10d, word [rbx+0x14]
0bcaaafd cmp dword [rdi+0x168], 0x4168c100
0bcaab07 jnz 0x0bca0028 ->6
0bcaab0d test r10d, 0xff1f
0bcaab14 jnz 0x0bca002c ->7
0bcaab1a mov r15, 0xfffffffb4168bfc0
0bcaab24 cmp r15, [rdi+0x140]
0bcaab2b jnz 0x0bca0030 ->8
0bcaab31 cmp dword [rdi+0x13c], -0x09
0bcaab38 jnz 0x0bca0030 ->8
0bcaab3e movzx r10d, byte [rbx+0xe]
0bcaab43 cmp dword [rdi+0x138], 0x4168bf98
0bcaab4d jnz 0x0bca0030 ->8
0bcaab53 shl r10d, 0x02
0bcaab57 and r10d, +0x3c
0bcaab5b mov r11d, r10d
0bcaab5e add r11d, +0x10
0bcaab62 jo 0x0bca0030 ->8
0bcaab68 xorps xmm3, xmm3
0bcaab6b cvtsi2sd xmm3, r11d
0bcaab70 ucomisd xmm3, xmm2
0bcaab74 ja 0x0bca0034 ->9
0bcaab7a mov edi, r10d
0bcaab7d add edi, +0x0e
0bcaab80 jo 0x0bca0038 ->10
0bcaab86 movsxd rdi, edi
0bcaab89 movzx ebx, word [rdi+rbx]
0bcaab8d cmp ebx, 0xb315
0bcaab93 jnz 0x0bca003c ->11
0bcaab99 cmp dword [rsi+0x1c], 0xfffeffff
0bcaaba0 jnb 0x0bca0040 ->12
0bcaaba6 movsd xmm7, [rsi+0x18]
0bcaabab addsd xmm7, xmm0
0bcaabaf add ebp, +0x01
0bcaabb2 cmp ebp, eax
0bcaabb4 jg 0x0bca0044 ->13
->LOOP:
0bcaabba cmp dword [rdx+rbp*8+0x4], -0x0c
0bcaabbf jnz 0x0bca0048 ->14
0bcaabc5 mov r15d, [rdx+rbp*8]
0bcaabc9 mov dword [rcx+0x4], 0xfffffff4
0bcaabd0 mov [rcx], r15d
0bcaabd3 movaps xmm4, xmm6
0bcaabd6 addsd xmm6, xmm0
0bcaabda cmp dword [r15+0x1c], +0x01
0bcaabdf jnz 0x0bca004c ->15
0bcaabe5 mov r14d, [r15+0x14]
0bcaabe9 mov rdi, 0xfffffffb41691100
0bcaabf3 cmp rdi, [r14+0x20]
0bcaabf7 jnz 0x0bca004c ->15
0bcaabfd cmp dword [r14+0x1c], -0x0b
0bcaac02 jnz 0x0bca004c ->15
0bcaac08 mov ebx, [r14+0x18]
0bcaac0c mov rdi, 0xfffffffb4168a640
0bcaac16 cmp rdi, [r14+0x8]
0bcaac1a jnz 0x0bca004c ->15
0bcaac20 cmp dword [r14+0x4], 0xfffeffff
0bcaac28 jnb 0x0bca004c ->15
0bcaac2e movsd xmm5, [r14]
0bcaac33 ucomisd xmm1, xmm5
0bcaac37 ja 0x0bca0050 ->16
0bcaac3d movzx r15d, word [rbx+0x6]
0bcaac42 cmp r15d, 0xb5
0bcaac49 jnz 0x0bca0054 ->17
0bcaac4f mov r12, [rbx+0x8]
0bcaac53 movzx r15d, word [r12+0xc]
0bcaac59 cmp r15d, +0x08
0bcaac5d jnz 0x0bca0058 ->18
0bcaac63 cmp byte [r12+0x17], 0x6
0bcaac69 jnz 0x0bca005c ->19
0bcaac6f movzx r14d, word [r12+0x14]
0bcaac75 test r14d, 0xff1f
0bcaac7c jnz 0x0bca0060 ->20
0bcaac82 movzx r14d, byte [r12+0xe]
0bcaac88 shl r14d, 0x02
0bcaac8c and r14d, +0x3c
0bcaac90 mov r13d, r14d
0bcaac93 add r13d, +0x10
0bcaac97 jo 0x0bca0064 ->21
0bcaac9d xorps xmm4, xmm4
0bcaaca0 cvtsi2sd xmm4, r13d
0bcaaca5 ucomisd xmm4, xmm5
0bcaaca9 ja 0x0bca0068 ->22
0bcaacaf mov edi, r14d
0bcaacb2 add edi, +0x0e
0bcaacb5 jo 0x0bca006c ->23
0bcaacbb movsxd rdi, edi
0bcaacbe movzx r12d, word [rdi+r12]
0bcaacc3 cmp r12d, 0xb315
0bcaacca jnz 0x0bca0070 ->24
0bcaacd0 addsd xmm7, xmm0
0bcaacd4 add ebp, +0x01
0bcaacd7 cmp ebp, eax
0bcaacd9 jle 0x0bcaabba ->LOOP
0bcaacdf jmp 0x0bca0074 ->25
---- TRACE 49 stop -> loop
---- TRACE 50 start 49/24 "tcp port 5555":11
0052 . ADDVN 5 3 9 ; 18
0053 . ISGE 1 5
0054 . JMP 5 => 0057
0057 . GGET 5 0 ; "cast"
0058 . KSTR 6 1 ; "uint16_t*"
0059 . ADDVV 7 0 4
0000 . . . FUNCC ; ffi.meta.__add
0060 . CALL 5 2 3
0000 . . FUNCC ; ffi.cast
0061 . TGETB 5 5 0
0000 . . . FUNCC ; ffi.meta.__index
0062 . ISEQN 5 8 ; 45845
0063 . JMP 5 => 0066
0066 . KPRI 5 2
0067 . RET1 5 2
0016 ISF 8
0017 JMP 9 => 0019
0018 ADDVN 3 3 0 ; 1
0019 JFORL 4 49
---- TRACE 50 IR
0001 xmm6 num SLOAD #3 PI
0002 xmm7 num SLOAD #4 PI
0003 rbp int SLOAD #5 PI
0004 rax int SLOAD #6 PRI
0005 rbx cdt SLOAD #10 PI
0006 xmm5 num SLOAD #11 PI
0007 r15 u16 SLOAD #12 PI
0008 r14 int SLOAD #13 PI
0009 r13 int SLOAD #14 PI
.... SNAP #0 [ ---- ---- ---- 0001 0002 0003 0004 ---- 0003 "tcp port 5555":1|0005 0006 0007 0008 0009 ]
0010 r12 > int ADDOV 0008 +18
0011 xmm4 num CONV 0010 num.int
.... SNAP #1 [ ---- ---- ---- 0001 0002 0003 0004 ---- 0003 "tcp port 5555":1|---- ---- ---- ---- ---- ]
0012 > num ULE 0011 0006
.... SNAP #2 [ ---- ---- ---- 0001 0002 0003 0004 ---- 0003 "tcp port 5555":1|0005 0006 0007 0008 0009 ]
0013 r12 tab FLOAD "tcp port 5555":1 func.env
0014 int FLOAD 0013 tab.hmask
0015 > int EQ 0014 +15
0016 rsi p32 FLOAD 0013 tab.node
0017 > p32 HREFK 0016 "cast" @6
0018 > fun HLOAD 0017
0019 r12 u16 FLOAD 0005 cdata.ctypeid
0020 > int EQ 0019 +181
0021 r12 p64 FLOAD 0005 cdata.ptr
0022 rdi i64 CONV 0009 i64.int sext
0023 p64 ADD 0022 0021
0024 {sink} cdt CNEWI +181 0023
0025 > fun EQ 0018 ffi.cast
0026 {sink} cdt CNEWI +184 0023
0027 r12 u16 XLOAD 0023
.... SNAP #3 [ ---- ---- ---- 0001 0002 0003 0004 ---- 0003 "tcp port 5555":1|0005 0006 0007 0008 0009 ]
0028 > int EQ 0027 +45845
0029 xmm7 num ADD 0002 +1
0030 rbp int ADD 0003 +1
.... SNAP #4 [ ---- ---- ---- 0001 0029 ]
0031 > int LE 0030 0004
0032 xmm5 num CONV 0030 num.int
.... SNAP #5 [ ---- ---- ---- 0001 0029 0032 0004 ---- 0032 ]
---- TRACE 50 mcode 220
0bcaa81a mov dword [0x416854a0], 0x32
0bcaa825 mov edx, esi
0bcaa827 mov r12d, r14d
0bcaa82a add r12d, +0x12
0bcaa82e jo 0x0bca0010 ->0
0bcaa834 xorps xmm4, xmm4
0bcaa837 cvtsi2sd xmm4, r12d
0bcaa83c ucomisd xmm4, xmm5
0bcaa840 ja 0x0bca0014 ->1
0bcaa846 mov r12d, [0x4a6f6a68]
0bcaa84e cmp dword [r12+0x1c], +0x0f
0bcaa854 jnz 0x0bca0018 ->2
0bcaa85a mov esi, [r12+0x14]
0bcaa85f mov rdi, 0xfffffffb41598a80
0bcaa869 cmp rdi, [rsi+0x98]
0bcaa870 jnz 0x0bca0018 ->2
0bcaa876 cmp dword [rsi+0x94], -0x09
0bcaa87d jnz 0x0bca0018 ->2
0bcaa883 movzx r12d, word [rbx+0x6]
0bcaa888 cmp r12d, 0xb5
0bcaa88f jnz 0x0bca0018 ->2
0bcaa895 mov r12, [rbx+0x8]
0bcaa899 movsxd rdi, r13d
0bcaa89c cmp dword [rsi+0x90], 0x41598a58
0bcaa8a6 jnz 0x0bca0018 ->2
0bcaa8ac movzx r12d, word [rdi+r12]
0bcaa8b1 cmp r12d, 0xb315
0bcaa8b8 jnz 0x0bca001c ->3
0bcaa8be movsd xmm5, [0x4159a288]
0bcaa8c7 addsd xmm7, xmm5
0bcaa8cb add ebp, +0x01
0bcaa8ce cmp ebp, eax
0bcaa8d0 jg 0x0bca0020 ->4
0bcaa8d6 xorps xmm5, xmm5
0bcaa8d9 cvtsi2sd xmm5, ebp
0bcaa8dd movsd [rdx+0x38], xmm5
0bcaa8e2 movsd [rdx+0x20], xmm5
0bcaa8e7 movsd [rdx+0x18], xmm7
0bcaa8ec movsd [rdx+0x10], xmm6
0bcaa8f1 jmp 0x0bcaa8fd
---- TRACE 50 stop -> 49
---- TRACE 51 start 49/11 "tcp port 5555":11
0052 . ADDVN 5 3 9 ; 18
0053 . ISGE 1 5
0054 . JMP 5 => 0057
0057 . GGET 5 0 ; "cast"
0058 . KSTR 6 1 ; "uint16_t*"
0059 . ADDVV 7 0 4
0000 . . . FUNCC ; ffi.meta.__add
0060 . CALL 5 2 3
0000 . . FUNCC ; ffi.cast
0061 . TGETB 5 5 0
0000 . . . FUNCC ; ffi.meta.__index
0062 . ISEQN 5 8 ; 45845
0063 . JMP 5 => 0066
0066 . KPRI 5 2
0067 . RET1 5 2
0016 ISF 8
0017 JMP 9 => 0019
0018 ADDVN 3 3 0 ; 1
0019 JFORL 4 49
---- TRACE 51 IR
0001 xmm6 num SLOAD #3 PI
0002 rbp int SLOAD #5 PI
0003 rax int SLOAD #6 PRI
0004 r8 cdt SLOAD #10 PI
0005 xmm2 num SLOAD #11 PI
0006 r9 u16 SLOAD #12 PI
0007 r10 int SLOAD #13 PI
0008 r11 int SLOAD #14 PI
.... SNAP #0 [ ---- ---- ---- 0001 ---- 0002 0003 ---- 0002 "tcp port 5555":1|0004 0005 0006 0007 0008 ]
0009 rbx > int ADDOV 0007 +18
0010 xmm7 num CONV 0009 num.int
.... SNAP #1 [ ---- ---- ---- 0001 ---- 0002 0003 ---- 0002 "tcp port 5555":1|---- ---- ---- ---- ---- ]
0011 > num ULE 0010 0005
.... SNAP #2 [ ---- ---- ---- 0001 ---- 0002 0003 ---- 0002 "tcp port 5555":1|0004 0005 0006 0007 0008 ]
0012 rbx tab FLOAD "tcp port 5555":1 func.env
0013 int FLOAD 0012 tab.hmask
0014 > int EQ 0013 +15
0015 r14 p32 FLOAD 0012 tab.node
0016 > p32 HREFK 0015 "cast" @6
0017 > fun HLOAD 0016
0018 rbx u16 FLOAD 0004 cdata.ctypeid
0019 > int EQ 0018 +181
0020 rbx p64 FLOAD 0004 cdata.ptr
0021 r15 i64 CONV 0008 i64.int sext
0022 p64 ADD 0021 0020
0023 {sink} cdt CNEWI +181 0022
0024 > fun EQ 0017 ffi.cast
0025 {sink} cdt CNEWI +184 0022
0026 rbx u16 XLOAD 0022
.... SNAP #3 [ ---- ---- ---- 0001 ---- 0002 0003 ---- 0002 "tcp port 5555":1|0004 0005 0006 0007 0008 ]
0027 > int EQ 0026 +45845
.... SNAP #4 [ ---- ---- ---- 0001 ---- 0002 0003 ---- 0002 "tcp port 5555":1|---- ---- ---- ---- ---- ]
0028 xmm7 > num SLOAD #4 T
0029 xmm7 num ADD 0028 +1
0030 rbp int ADD 0002 +1
.... SNAP #5 [ ---- ---- ---- 0001 0029 ]
0031 > int LE 0030 0003
0032 xmm5 num CONV 0030 num.int
.... SNAP #6 [ ---- ---- ---- 0001 0029 0032 0003 ---- 0032 ]
---- TRACE 51 mcode 232
0bcaa72b mov dword [0x416854a0], 0x33
0bcaa736 mov edx, esi
0bcaa738 movsd xmm5, [0x4159a288]
0bcaa741 mov ebx, r10d
0bcaa744 add ebx, +0x12
0bcaa747 jo 0x0bca0010 ->0
0bcaa74d xorps xmm7, xmm7
0bcaa750 cvtsi2sd xmm7, ebx
0bcaa754 ucomisd xmm7, xmm2
0bcaa758 ja 0x0bca0014 ->1
0bcaa75e mov ebx, [0x4a6f6a68]
0bcaa765 cmp dword [rbx+0x1c], +0x0f
0bcaa769 jnz 0x0bca0018 ->2
0bcaa76f mov r14d, [rbx+0x14]
0bcaa773 mov rdi, 0xfffffffb41598a80
0bcaa77d cmp rdi, [r14+0x98]
0bcaa784 jnz 0x0bca0018 ->2
0bcaa78a cmp dword [r14+0x94], -0x09
0bcaa792 jnz 0x0bca0018 ->2
0bcaa798 movzx ebx, word [r8+0x6]
0bcaa79d cmp ebx, 0xb5
0bcaa7a3 jnz 0x0bca0018 ->2
0bcaa7a9 mov rbx, [r8+0x8]
0bcaa7ad movsxd r15, r11d
0bcaa7b0 cmp dword [r14+0x90], 0x41598a58
0bcaa7bb jnz 0x0bca0018 ->2
0bcaa7c1 movzx ebx, word [r15+rbx]
0bcaa7c6 cmp ebx, 0xb315
0bcaa7cc jnz 0x0bca001c ->3
0bcaa7d2 cmp dword [rdx+0x1c], 0xfffeffff
0bcaa7d9 jnb 0x0bca0020 ->4
0bcaa7df movsd xmm7, [rdx+0x18]
0bcaa7e4 addsd xmm7, xmm5
0bcaa7e8 add ebp, +0x01
0bcaa7eb cmp ebp, eax
0bcaa7ed jg 0x0bca0024 ->5
0bcaa7f3 xorps xmm5, xmm5
0bcaa7f6 cvtsi2sd xmm5, ebp
0bcaa7fa movsd [rdx+0x38], xmm5
0bcaa7ff movsd [rdx+0x20], xmm5
0bcaa804 movsd [rdx+0x18], xmm7
0bcaa809 movsd [rdx+0x10], xmm6
0bcaa80e jmp 0x0bcaa8fd
---- TRACE 51 stop -> 49
The assembly is still kinda trash, to be honest. Oh well; the performance is certainly fine.