elf_diff
Utf8 decode error
I am comparing two ARM GCC ELF files. If I do not specify bin_dir, the report is generated successfully, but I get the "Unable to read assembly from binary" warning.
If I specify the correct bin_dir and bin_prefix, the warning disappears and I get the following output instead (ending in a UTF-8 decode error):
py -m elf_diff --bin_dir tools\arm-gcc\bin --bin_prefix "arm-none-eabi-" --html_dir report2 [OLD].elf [NEW].elf
Tools:
objdump: tools\arm-gcc\bin\arm-none-eabi-objdump.exe
nm: tools\arm-gcc\bin\arm-none-eabi-nm.exe
readelf: tools\arm-gcc\bin\arm-none-eabi-readelf.exe
size: tools\arm-gcc\bin\arm-none-eabi-size.exe
Verifying config keys...
Symbol selection regex:
old binary: 'None'
new binary: 'None'
Symbol exclusion regex:
old binary: 'None'
new binary: 'None'
Parsing symbols of old binary ([OLD].elf)
File format of binary [OLD].elf: elf32-littlearm
Extracting symbols
100% (5577 of 5577) |#####################################################################################| Elapsed Time: 0:00:00 Time: 0:00:00
Gathering instructions
100% (223307 of 223307) |#################################################################################| Elapsed Time: 0:00:00 Time: 0:00:00
Parsing symbols of new binary ([NEW].elf)
File format of binary [NEW].elf: elf32-littlearm
Extracting symbols
100% (5564 of 5564) |#####################################################################################| Elapsed Time: 0:00:00 Time: 0:00:00
Gathering instructions
================================================================================
Traceback (most recent call last):
File "C:\[...]\Python\Python310\lib\site-packages\elf_diff\__main__.py", line 124, in main
exportDocument(settings)
File "C:\[...]\Python\Python310\lib\site-packages\elf_diff\__main__.py", line 66, in exportDocument
document: ValueTreeNode = generateDocument(settings)
File "C:\[...]\Python\Python310\lib\site-packages\elf_diff\pair_report_document.py", line 1167, in generateDocument
meta_document.configureValueTree(value_tree, settings=settings)
File "C:\[...]\Python\Python310\lib\site-packages\elf_diff\pair_report_document.py", line 976, in configureValueTree
self.binary_pair = BinaryPair(
File "C:\[...]\Python\Python310\lib\site-packages\elf_diff\binary_pair.py", line 103, in __init__
self.new_binary = Binary(
File "C:\[...]\Python\Python310\lib\site-packages\elf_diff\binary.py", line 78, in __init__
self._initSymbols()
File "C:\[...]\Python\Python310\lib\site-packages\elf_diff\binary.py", line 122, in _initSymbols
self._gatherSymbolInstructions()
File "C:\[...]\Python\Python310\lib\site-packages\elf_diff\binary.py", line 108, in _gatherSymbolInstructions
instruction_collector.gatherSymbolInstructions(
File "C:\[...]\Python\Python310\lib\site-packages\elf_diff\instruction_collector.py", line 136, in gatherSymbolInstructions
objdump_output: str = runSystemCommand(
File "C:\[...]\Python\Python310\lib\site-packages\elf_diff\system_command.py", line 33, in runSystemCommand
output: str = o.decode("utf8")
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xfc in position 12242097: invalid start byte
================================================================================
elf_diff is unconsolable :-( Something went wrong
================================================================================
Error: 'utf-8' codec can't decode byte 0xfc in position 12242097: invalid start byte
================================================================================
Don't let this take you down! Have a nice hot coffee and start over.
================================================================================
Is there any way I can debug the source of the error / find out what is causing the wrong utf-8 string?
Sorry for this answer coming pretty late. I am currently too busy to work on this project.
You might want to try replacing the decode call on line 33 of system_command.py with output: str = o.decode("utf8", errors="ignore"). I am not sure, though, which character causes the decoding to fail.
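For anyone weighing that workaround, here is a minimal sketch of what the errors argument of bytes.decode does with a byte like 0xfc. The sample bytes and the helper name decode_lenient are made up for illustration; they are not part of elf_diff.

```python
# Hypothetical sample: 0xfc is not a valid UTF-8 start byte,
# similar to the byte reported in the traceback above.
raw = b"caf\xfc // comment"


def decode_lenient(data: bytes, errors: str = "replace") -> str:
    """Decode as UTF-8 without raising on invalid bytes.

    errors="ignore" silently drops invalid bytes;
    errors="replace" substitutes U+FFFD so the damage stays visible.
    """
    return data.decode("utf8", errors=errors)


ignored = decode_lenient(raw, "ignore")
replaced = decode_lenient(raw, "replace")

print(repr(ignored))
print(repr(replaced))
```

Both variants let the run finish, but "replace" leaves a marker where something was lost, which may be preferable when the text ends up in a report.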
Here's my two cents on this issue:
I replaced that call with
try:
    output: str = o.decode("utf8")
except UnicodeDecodeError:
    # Dump the raw bytes so the offending character can be inspected.
    with open("subprocess_output.txt", "wb") as f:
        f.write(o)
    raise
and got a text file containing the problematic output. In my case it was a section sign (0xA7) in a line containing source code. It appears my sources are encoded as CP1252 rather than UTF-8. After replacing the codec in the decode call, elf_diff ran smoothly.
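As a possible shortcut to searching the dumped file by hand, the UnicodeDecodeError exception itself carries the position of the bad byte in its start and end attributes. The sketch below uses them to pull out the byte and some surrounding context. The function name describe_decode_failure and the sample bytes are assumptions for illustration only.

```python
from typing import Optional, Tuple


def describe_decode_failure(
    data: bytes, encoding: str = "utf8", context: int = 40
) -> Optional[Tuple[int, bytes, bytes]]:
    """Return (offset, offending bytes, surrounding bytes), or None if decoding works."""
    try:
        data.decode(encoding)
    except UnicodeDecodeError as e:
        # e.start / e.end delimit the bytes the codec could not decode.
        lo = max(0, e.start - context)
        hi = min(len(data), e.end + context)
        return e.start, data[e.start:e.end], data[lo:hi]
    return None


# Hypothetical sample: a CP1252-encoded section sign inside a source comment.
sample = b"int x; // \xa7 3.2\n"
result = describe_decode_failure(sample)
if result is not None:
    offset, bad, around = result
    print(f"offset {offset}: {bad!r} in {around!r}")
```

Printing the context as a bytes repr keeps the output safe on consoles that cannot display the character either.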
It would be nice to add a source file encoding option to the elf_diff command. The encoding may need to differ between the first and the second ELF file.
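A rough sketch of how such an option could behave, assuming a per-binary list of candidate encodings. Nothing here exists in elf_diff today: decode_tool_output and its encodings parameter are hypothetical names.

```python
from typing import Sequence


def decode_tool_output(
    data: bytes, encodings: Sequence[str] = ("utf8", "cp1252")
) -> str:
    """Try each candidate encoding in order; fall back to a lossy decode.

    `encodings` must contain at least one entry.
    """
    for encoding in encodings:
        try:
            return data.decode(encoding)
        except UnicodeDecodeError:
            continue
    # No candidate fits: keep going, but mark undecodable bytes with U+FFFD.
    return data.decode(encodings[0], errors="replace")


# Hypothetical usage with a different encoding per binary.
old_text = decode_tool_output(b"// \xc2\xa7 3.2", ("utf8",))
new_text = decode_tool_output(b"// \xa7 3.2", ("utf8", "cp1252"))
print(old_text == new_text)
```

Trying UTF-8 first is a deliberate choice in this sketch: valid UTF-8 is rarely produced by accident, whereas a single-byte codec such as CP1252 accepts almost any input and would mask a wrong guess.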