pyelftools

Parallelism and pickle errors

Open zorbanaut opened this issue 4 years ago • 8 comments

I tried to speed up parsing by using ProcessPoolExecutor:

import concurrent.futures
from elftools.elf.elffile import ELFFile

elf_name = "examples/sample_exe64.elf"

def parse_DIEs_in_CU(CU_offset):
    with open(elf_name, 'rb') as f:
        elffile = ELFFile(f)
        dwarfinfo = elffile.get_dwarf_info()
        CU = dwarfinfo._parse_CU_at_offset(CU_offset)
        return [DIE for DIE in CU.iter_DIEs()]

def main():
    with open(elf_name, 'rb') as f:
        elffile = ELFFile(f)
        dwarfinfo = elffile.get_dwarf_info()
        CU_offsets = [CU.cu_offset for CU in dwarfinfo.iter_CUs()]
        with concurrent.futures.ProcessPoolExecutor() as executor:
            DIEs = list(executor.map(parse_DIEs_in_CU, CU_offsets))
            print(DIEs)

if __name__ == '__main__':
    main()

but got errors like

AttributeError: Can't pickle local object 'DWARFStructs._create_initial_length.<locals>._InitialLength'

Is it feasible to make pyelftools pickle-friendly?
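
For context, the error is standard pickle behavior rather than anything DWARF-specific: pickle stores a class by its importable qualified name, so an instance of a class defined inside a function cannot be sent between processes. A minimal sketch, in which make_initial_length() is a hypothetical stand-in for the library method named in the traceback:

```python
import pickle


def make_initial_length():
    # Hypothetical stand-in: a class created inside a function gets a
    # qualified name containing '<locals>', which pickle cannot look up.
    class _InitialLength:
        def __init__(self, value):
            self.value = value

    return _InitialLength


class ModuleLevelLength:
    # Defined at module level, so pickle can normally find it by name.
    def __init__(self, value):
        self.value = value


def can_pickle(obj):
    """Return True if obj survives pickle.dumps, False otherwise."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError):
        return False


local_cls = make_initial_length()
print(local_cls.__qualname__)  # make_initial_length.<locals>._InitialLength
print(can_pickle(local_cls(4)))
print(can_pickle(ModuleLevelLength(4)))
```

ProcessPoolExecutor pickles every return value to ship it back to the parent process, which is where objects like this would be rejected.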

zorbanaut avatar May 15 '20 16:05 zorbanaut

Feel free to submit PRs.

eliben avatar May 15 '20 17:05 eliben

Also be aware that even though the DWARF info is read into Python memory buffers, the parse_structs API is built around reading file descriptors for each section, so the parallelization above will likely fail.

Iterating through the CUs should build up the cache for the DIEs with the current code.

mdmillerii avatar May 18 '20 02:05 mdmillerii

Are you thinking about references to DIEs from a different CU? The code above worked for me when using ThreadPoolExecutor(max_workers=n). Each worker opens its own file stream (it can be optimized of course in different ways). Cross-CU DIE references are resolved later, in sequential code.
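
One plausible reason the thread-based version works: threads share the interpreter's memory, so results are handed back as ordinary object references and never go through pickle. A minimal sketch with a hypothetical local class (not pyelftools code):

```python
import concurrent.futures


def make_record_class():
    # Hypothetical local class, unpicklable for the same reason as the
    # '<locals>' class in the traceback.
    class Record:
        def __init__(self, offset):
            self.offset = offset

    return Record


Record = make_record_class()


def parse_at(offset):
    # Stand-in for per-CU parsing; returns an object pickle would reject.
    return Record(offset)


def parse_all(offsets, max_workers=4):
    # Threads return references directly, so no serialization happens.
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as ex:
        return list(ex.map(parse_at, offsets))


records = parse_all([0, 11, 42])
print([r.offset for r in records])
```

In standard CPython builds the GIL means threads mostly help with I/O-bound work, so pure-Python parsing may see little speedup from this approach.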

zorbanaut avatar May 21 '20 12:05 zorbanaut

@zorbanaut @eliben Hello, I also tried to use multiprocessing to speed up parsing the CUs, but I ran into the same error, shown below. Did you find out why it happens and how to avoid it?

AttributeError: Can't pickle local object 'DWARFStructs._create_initial_length.<locals>._InitialLength'

goododk avatar Jul 26 '22 13:07 goododk

It was not a supported scenario back then, and it's still not a supported scenario.

What are you trying to do and where is the slowdown exactly?

sevaa avatar Jul 26 '22 21:07 sevaa

@sevaa My goal is to get all DIEs whose tag is 'DW_TAG_variable' from the DWARF info, as shown below. This double for loop takes more than 30 minutes, which is quite slow. That's why I tried to speed it up with multiprocessing. (The ELF file I tested is more than 200 MB.)

Actually, I'm not very familiar with ELF parsing. Is there an easier way to get all DIEs whose tag is 'DW_TAG_variable'?

image
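
A rough sketch of the kind of loop described above, assuming the goal is to collect every DIE tagged DW_TAG_variable. The helper names variable_records and scan_elf are hypothetical; iter_CUs(), iter_DIEs(), DIE.tag, DIE.offset and DIE.attributes are the pyelftools calls it relies on. Returning plain tuples rather than DIE objects keeps the results picklable, should they ever need to cross a process boundary:

```python
def variable_records(dies):
    """Yield (offset, name) tuples for DIEs tagged DW_TAG_variable."""
    for die in dies:
        if die.tag != 'DW_TAG_variable':
            continue
        name_attr = die.attributes.get('DW_AT_name')
        # DW_AT_name values are bytes in pyelftools; a variable may be unnamed.
        if name_attr is not None:
            name = name_attr.value.decode('utf-8', 'replace')
        else:
            name = None
        yield (die.offset, name)


def scan_elf(path):
    """Collect variable records from every CU of the ELF file at path."""
    # Imported here so the helper above works without pyelftools installed.
    from elftools.elf.elffile import ELFFile

    with open(path, 'rb') as f:
        dwarfinfo = ELFFile(f).get_dwarf_info()
        records = []
        for cu in dwarfinfo.iter_CUs():
            records.extend(variable_records(cu.iter_DIEs()))
        return records
```

This does not make the traversal itself faster; it only shows the filtering step and a result shape that is safe to pass between processes.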

goododk avatar Jul 27 '22 13:07 goododk

Usually, easier is the opposite of faster :)

Does your DWARF contain DW_AT_sibling attributes?

OBTW, is it specifically the loop that takes 30 minutes, or the initial DWARF retrieval (the get_dwarf_info() call)? The former only navigates the in-memory structures, while the latter loads from the file.
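
One way to answer that question is to time the two phases separately. A minimal sketch; the timed() helper is hypothetical, and the commented usage assumes an already opened ELFFile like the one in the first post:

```python
import time


def timed(label, fn, *args, **kwargs):
    """Call fn, print how long it took, and return its result."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed = time.perf_counter() - start
    print(f"{label}: {elapsed:.3f} s")
    return result


# Intended use with pyelftools (not executed here):
#   dwarfinfo = timed("get_dwarf_info", elffile.get_dwarf_info)
#   dies = timed("DIE loop", lambda: [d for cu in dwarfinfo.iter_CUs()
#                                       for d in cu.iter_DIEs()])
print(timed("sum", sum, range(1000)))
```

For a finer breakdown, the standard library's cProfile module can show which functions dominate inside each phase.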

sevaa avatar Jul 27 '22 13:07 sevaa

I've also noticed that decoding the string table can be quite slow. As I remember (from years ago), I could parse through the top-level objects in a CU faster than I could build up the name index.

The code ends up treating buffers as files for the struct parsing. While that generalized decompression, it does not take advantage of read-only access to the data.
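
To illustrate the distinction being drawn (a generic sketch, not pyelftools' actual parsing code): reading fixed-size fields through a file-like wrapper costs a read() call and a new bytes object per field, whereas a read-only buffer can be decoded in place with struct.unpack_from. The function names are hypothetical:

```python
import io
import struct

FIELD = struct.Struct('<I')  # one little-endian 32-bit unsigned value


def read_via_stream(data, count):
    # File-style access: each field costs a read() call and a bytes copy.
    stream = io.BytesIO(data)
    return [FIELD.unpack(stream.read(FIELD.size))[0] for _ in range(count)]


def read_via_buffer(data, count):
    # Buffer-style access: decode directly at computed offsets, no copies.
    view = memoryview(data)
    return [FIELD.unpack_from(view, i * FIELD.size)[0] for i in range(count)]


data = struct.pack('<4I', 1, 2, 3, 4)
print(read_via_stream(data, 4) == read_via_buffer(data, 4))
```

Both functions return the same values; how much the difference matters in practice would need to be measured on real DWARF data.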

mdmillerii avatar Jul 27 '22 18:07 mdmillerii