llvmlite
ObjectFileRef seems to return wrong section content
While experimenting with llvmlite, I found some discrepancies between what the data() method of the sections returned by ObjectFileRef.sections() gives back and the actual bytes in the file. Here is an example (I use lief as ground truth, but other disassemblers return the same bytes):
import llvmlite.binding
import lief
elf = "7f454c4602010100000000000000000001003e000100000000000000000000000000000000000000e0000000000000000000000040" \
"000000000040000500010048c1e2204809c24889d048c1c03d4831d048b90120000480001070480fafc80000000000000000000000" \
"00000000000000000000000000002f0000000400f1ff00000000000000000000000000000000070000001200020000000000000000" \
"001f00000000000000002e74657874005f5f617279626f002e6e6f74652e474e552d737461636b002e737472746162002e73796d74" \
"6162003c737472696e673e000000000000000000000000000000000000000000000000000000000000000000000000000000000000" \
"00000000000000000000000000000000000000000000001f0000000300000000000000000000000000000000000000a80000000000" \
"0000380000000000000000000000000000000100000000000000000000000000000001000000010000000600000000000000000000" \
"000000000040000000000000001f000000000000000000000000000000100000000000000000000000000000000f00000001000000" \
"000000000000000000000000000000005f000000000000000000000000000000000000000000000001000000000000000000000000" \
"0000002700000002000000000000000000000000000000000000006000000000000000480000000000000001000000020000000800" \
"0000000000001800000000000000"
raw = bytes.fromhex(elf)
obj = llvmlite.binding.ObjectFileRef.from_data(raw)
for s in obj.sections():
    if s.is_text():
        print(s.data().hex())
# 48c1e2204809c24889d048c1c03d4831d048b9012000000070706572000000
p = lief.parse(raw)
print(bytes(p.get_section('.text').content).hex())
# 48c1e2204809c24889d048c1c03d4831d048b90120000480001070480fafc8
Maybe I am misunderstanding how ObjectFileRef works, but in that case I would be eager to know what I did wrong.
Looking forward to your help. Regards
I forgot to mention that I tried this with llvmlite version 0.34.0.
Indeed, objdump produces similar output to lief:
$ objdump -s test.elf
test.elf: file format elf64-x86-64
Contents of section .text:
0000 48c1e220 4809c248 89d048c1 c03d4831 H.. H..H..H..=H1
0010 d048b901 20000480 00107048 0fafc8 .H.. .....pH...
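I think the issue is that the contents are being treated as a null-terminated string, which truncates the section: in SectionIteratorRef.data(), ffi.lib.LLVMPY_GetSectionContents(self) returns a bytes array of only 21 bytes. Because its restype is declared as c_char_p, ctypes converts the returned pointer into a Python bytes object that stops at the first NUL byte, and the .text section has a 0x00 at offset 21.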
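The truncation can be reproduced with ctypes alone, without llvmlite. This is a minimal sketch under one assumption: a buffer made with create_string_buffer stands in for the native memory that LLVMPY_GetSectionContents returns. It uses the 31-byte .text section reported by lief and objdump, reads it once through c_char_p and once through POINTER(c_char) with an explicit length.

```python
import ctypes

# The 31-byte .text section from the reproducer, as reported by lief/objdump
section = bytes.fromhex(
    "48c1e2204809c24889d048c1c03d4831d048b90120000480001070480fafc8"
)

# Native buffer standing in (assumption) for the memory LLVM hands back
buf = ctypes.create_string_buffer(section)

# c_char_p: ctypes converts the pointer to bytes, stopping at the first NUL
truncated = ctypes.cast(buf, ctypes.c_char_p).value

# POINTER(c_char): no automatic conversion, so read an explicit byte count
ptr = ctypes.cast(buf, ctypes.POINTER(ctypes.c_char))
full = ctypes.string_at(ptr, len(section))

print(len(truncated))   # 21
print(full == section)  # True
```

The second pattern is the one the restype change from c_char_p to POINTER(c_char) relies on: the raw pointer survives, and the caller decides how many bytes to read, so embedded NUL bytes in machine code are preserved.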
With the following patch:
diff --git a/llvmlite/binding/object_file.py b/llvmlite/binding/object_file.py
index 106f051..e5961b0 100644
--- a/llvmlite/binding/object_file.py
+++ b/llvmlite/binding/object_file.py
@@ -1,5 +1,6 @@
from llvmlite.binding import ffi
-from ctypes import c_bool, c_char_p, c_size_t, string_at, c_uint64
+from ctypes import (c_bool, c_char_p, c_char, c_size_t, string_at, c_uint64,
+ POINTER)
class SectionIteratorRef(ffi.ObjectRef):
@@ -75,7 +76,7 @@ ffi.lib.LLVMPY_GetSectionAddress.argtypes = [ffi.LLVMSectionIteratorRef]
ffi.lib.LLVMPY_GetSectionAddress.restype = c_uint64
ffi.lib.LLVMPY_GetSectionContents.argtypes = [ffi.LLVMSectionIteratorRef]
-ffi.lib.LLVMPY_GetSectionContents.restype = c_char_p
+ffi.lib.LLVMPY_GetSectionContents.restype = POINTER(c_char)
ffi.lib.LLVMPY_IsSectionText.argtypes = [ffi.LLVMSectionIteratorRef]
ffi.lib.LLVMPY_IsSectionText.restype = c_bool
the reproducer produces matching output (llvmlite first, then lief):
$ python repro.py
48c1e2204809c24889d048c1c03d4831d048b90120000480001070480fafc8
48c1e2204809c24889d048c1c03d4831d048b90120000480001070480fafc8
Will make a PR shortly.
Fix proposed in #633 - many thanks for the report @RobinDavid!
You're welcome! Thank you for being so responsive, @gmarkall!