llvmlite ObjectFileRef seems to return wrong section content

While experimenting with llvmlite, I found some discrepancies between what the data() method of the sections returned by ObjectFileRef.sections() gives me and the actual bytes in the file. Here is an example (I use lief as ground truth, but other disassemblers return the same bytes):

import llvmlite.binding
import lief

elf = "7f454c4602010100000000000000000001003e000100000000000000000000000000000000000000e0000000000000000000000040" \
      "000000000040000500010048c1e2204809c24889d048c1c03d4831d048b90120000480001070480fafc80000000000000000000000" \
      "00000000000000000000000000002f0000000400f1ff00000000000000000000000000000000070000001200020000000000000000" \
      "001f00000000000000002e74657874005f5f617279626f002e6e6f74652e474e552d737461636b002e737472746162002e73796d74" \
      "6162003c737472696e673e000000000000000000000000000000000000000000000000000000000000000000000000000000000000" \
      "00000000000000000000000000000000000000000000001f0000000300000000000000000000000000000000000000a80000000000" \
      "0000380000000000000000000000000000000100000000000000000000000000000001000000010000000600000000000000000000" \
      "000000000040000000000000001f000000000000000000000000000000100000000000000000000000000000000f00000001000000" \
      "000000000000000000000000000000005f000000000000000000000000000000000000000000000001000000000000000000000000" \
      "0000002700000002000000000000000000000000000000000000006000000000000000480000000000000001000000020000000800" \
      "0000000000001800000000000000"
raw = bytes.fromhex(elf)

# .text contents as reported by llvmlite
obj = llvmlite.binding.ObjectFileRef.from_data(raw)
for s in obj.sections():
    if s.is_text():
        print(s.data().hex())
        # 48c1e2204809c24889d048c1c03d4831d048b9012000000070706572000000

# .text contents as reported by lief
p = lief.parse(raw)
print(bytes(p.get_section('.text').content).hex())
# 48c1e2204809c24889d048c1c03d4831d048b90120000480001070480fafc8
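For a cross-check that needs neither lief nor objdump, here is a minimal sketch that reads a section's bytes straight from the ELF section header table. The helper elf64_section is hypothetical (not part of llvmlite or lief), and it assumes a little-endian ELF64 file with a small section count, like the test object above; it does no other validation.

```python
import struct

# Elf64_Shdr layout: sh_name, sh_type (u32 each); sh_flags, sh_addr,
# sh_offset, sh_size (u64 each); sh_link, sh_info (u32 each);
# sh_addralign, sh_entsize (u64 each). 64 bytes in total.
SHDR64 = struct.Struct("<IIQQQQIIQQ")
SHT_NOBITS = 8  # e.g. .bss: occupies no bytes in the file


def elf64_section(data: bytes, wanted: str) -> bytes:
    """Return the file bytes of section `wanted` in a little-endian ELF64 image."""
    # e_ident: magic, EI_CLASS == 2 (64-bit), EI_DATA == 1 (little-endian)
    if data[:4] != b"\x7fELF" or data[4] != 2 or data[5] != 1:
        raise ValueError("expected a little-endian ELF64 image")

    # e_shoff (section header table offset) is at 0x28;
    # e_shentsize, e_shnum, e_shstrndx are at 0x3A, 0x3C, 0x3E.
    (e_shoff,) = struct.unpack_from("<Q", data, 0x28)
    e_shentsize, e_shnum, e_shstrndx = struct.unpack_from("<HHH", data, 0x3A)

    def header(index: int) -> tuple:
        return SHDR64.unpack_from(data, e_shoff + index * e_shentsize)

    # Section names are offsets into the section-header string table,
    # whose file offset is field 4 (sh_offset) of header e_shstrndx.
    names_base = header(e_shstrndx)[4]

    for index in range(e_shnum):
        name_off, sh_type, _flags, _addr, sh_offset, sh_size = header(index)[:6]
        start = names_base + name_off
        name = data[start:data.index(b"\0", start)].decode("ascii")
        if name == wanted:
            if sh_type == SHT_NOBITS:
                return b""
            return data[sh_offset:sh_offset + sh_size]
    raise KeyError(wanted)
```

With raw from the reproducer, elf64_section(raw, '.text').hex() should give the same 31 bytes that lief reports.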

Maybe I am misunderstanding how ObjectFileRef works, but in that case I would be eager to know what I did wrong.

Looking forward to your help. Regards

Sep 16 '20 07:09 RobinDavid

I forgot to mention that I tried this with llvmlite version 0.34.0.

Sep 16 '20 07:09 RobinDavid

Indeed, objdump produces similar output to lief:

$ objdump -s test.elf 

test.elf:     file format elf64-x86-64

Contents of section .text:
 0000 48c1e220 4809c248 89d048c1 c03d4831  H.. H..H..H..=H1
 0010 d048b901 20000480 00107048 0fafc8    .H.. .....pH...

Sep 16 '20 09:09 gmarkall

I think the issue is that the contents are being treated as a null-terminated string, which truncates the section: ffi.lib.LLVMPY_GetSectionContents(self) returns a bytes object of length 21 in SectionIteratorRef.data().
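A minimal, self-contained sketch of that behaviour using plain ctypes (no llvmlite), with the 31-byte .text from the report standing in for the native section buffer. Casting to c_char_p reads the memory the same way a c_char_p restype does, as a NUL-terminated C string; keeping a POINTER(c_char) and passing an explicit length to string_at preserves the embedded NUL bytes. The variable names are illustrative only.

```python
import ctypes

# .text bytes from the report (31 bytes). The first NUL byte is at
# offset 21, so a C-string read stops there.
TEXT = bytes.fromhex(
    "48c1e2204809c24889d048c1c03d4831d048b90120000480001070480fafc8"
)

# Stand-in for the native buffer the section-contents pointer refers to.
buf = ctypes.create_string_buffer(TEXT)

# c_char_p semantics: bytes are copied only up to the first NUL,
# which is what a c_char_p restype does to a returned pointer.
as_c_string = ctypes.cast(buf, ctypes.c_char_p).value

# POINTER(c_char) keeps the raw address; string_at then copies exactly
# the requested number of bytes, embedded NULs included.
raw_ptr = ctypes.cast(buf, ctypes.POINTER(ctypes.c_char))
as_sized_bytes = ctypes.string_at(raw_ptr, len(TEXT))

print(len(as_c_string), as_c_string.hex())
print(len(as_sized_bytes), as_sized_bytes.hex())
```

The first read yields only the 21 bytes before the NUL, matching the truncated length observed in data(); the second returns all 31 bytes, which is why pairing a POINTER(c_char) return type with an explicit size avoids the truncation.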

Sep 16 '20 10:09 gmarkall

With:

diff --git a/llvmlite/binding/object_file.py b/llvmlite/binding/object_file.py
index 106f051..e5961b0 100644
--- a/llvmlite/binding/object_file.py
+++ b/llvmlite/binding/object_file.py
@@ -1,5 +1,6 @@
 from llvmlite.binding import ffi
-from ctypes import c_bool, c_char_p, c_size_t, string_at, c_uint64
+from ctypes import (c_bool, c_char_p, c_char, c_size_t, string_at, c_uint64,
+                    POINTER)
 
 
 class SectionIteratorRef(ffi.ObjectRef):
@@ -75,7 +76,7 @@ ffi.lib.LLVMPY_GetSectionAddress.argtypes = [ffi.LLVMSectionIteratorRef]
 ffi.lib.LLVMPY_GetSectionAddress.restype = c_uint64
 
 ffi.lib.LLVMPY_GetSectionContents.argtypes = [ffi.LLVMSectionIteratorRef]
-ffi.lib.LLVMPY_GetSectionContents.restype = c_char_p
+ffi.lib.LLVMPY_GetSectionContents.restype = POINTER(c_char)
 
 ffi.lib.LLVMPY_IsSectionText.argtypes = [ffi.LLVMSectionIteratorRef]
 ffi.lib.LLVMPY_IsSectionText.restype = c_bool

the reproducer produces:

$ python repro.py 
48c1e2204809c24889d048c1c03d4831d048b90120000480001070480fafc8
48c1e2204809c24889d048c1c03d4831d048b90120000480001070480fafc8

Will make a PR shortly.

Sep 16 '20 10:09 gmarkall

Fix proposed in #633. Many thanks for the report, @RobinDavid!

Sep 16 '20 10:09 gmarkall

You're welcome! Thank you for being so responsive, @gmarkall!

Sep 16 '20 11:09 RobinDavid