intelhex Starting Address record line is always the first line, but in original file it was last line

I am testing this module for some production/testing firmware automation and I am super pleased! I am noticing that when I encode a hex file to a dict with this module, then fetch and re-save to a hex file (the data is stored in a database as a JSON record) one line gets swapped in the file thus causing my md5 checksums to not match. While I am fairly certain the structure of an intel hex file prevents this from being an issue, it would be nice to under stand how this happens.

Encode to database:

file_name = file_path.split('/')[-1]
with open(file_path, 'rb') as file:
    md5 = hashlib.md5(
        file.read()
    ).hexdigest()
intel_hex = IntelHex()
intel_hex.loadhex(file_path)
if image_type.lower() in ['mfg', 'manufacturing']:
    self.manufacturing_fw = intel_hex.todict()
    self.manufacturing_fw_name = file_name
    self.manufacturing_md5 = md5

Decode From database:

if image_type.lower() in ['mfg', 'manufacturing']:
    output_file_name = f'{out_path}/{self.part_number}-{self.manufacturing_fw_name}'
    firmware_data = self.manufacturing_fw

data = {}
for key, value in firmware_data.items():
    if key != 'start_addr':
        key = int(key)
    data[key] = value

intel_hex = IntelHex()
intel_hex.fromdict(data)
intel_hex.write_hex_file(output_file_name)

I need to add these lines upon export to re-arrange the lines to get the exported file to match the original exactly. Not efficient or the minimum solution, but it works with everything I have tested so far.

file = open(output_file_name, 'r')
lines = file.readlines()
file.close()
output = open(output_file_name, 'w')
total_lines = len(lines)
last_line = lines[-1]
lines.append(lines[-1])
lines[0], lines[total_lines-1] = lines[total_lines-1], lines[0]
lines.append(last_line)
for line in lines[1:-1]:
    output.write(line)
output.close()

I appreciate any help with this!

Jul 08 '20 15:07 LukeGary462

Can You provide an example of a file before encode and after decode?

If i understood correctly (from the "fix" code) to have your image written correctly You must put first line at one before last position. I'm guessing that this is a Starting Address record. IntelHex standard doesn't say strictly about position. It can be placed at the beginning, it can be placed at the end (before End of File record). It could be placed between data segments as well. But this is just my theory - honestly I never seen such Hex file. Nevertheless this IntelHex implementation always place Starting Address record at the beginning of the file.

By the way, Your "fix" code can be simplified:

file = open(output_file_name, 'r')
lines = file.readlines()
file.close()
output = open(output_file_name, 'w')
lines.insert(-1, lines[0])
for line in lines[1:]:
    output.write(line)
output.close()

Of course if I clearly read out your intentions. But same is what You get when You run your code.

Jul 08 '20 22:07 fernandez85

I looked at the IntelHex standard again, and I realized that Starting Address record types are numbered higher then Data and Extended Adress records. So maybe this implementation isn't that good after all. But first let us check the root-cause of your problem.

Jul 08 '20 22:07 fernandez85

IntelHex startard allows separate records to be freely swapped until they all within one 64K page IIRC. Additional starting records can appear anywhere in the file. Anyway, in your approach I would not rely on MD5 hashing of binary file. With IntelHex files this is not stable approach. You can use hexdiff.py utility to actually compare content of 2 files, if you wish. Or maybe you can provide some minimal example of your files with different MD5 before and after processing. It's possible that starting address record is to blame here. I don't really need your binary files, you can inspect with text editor first and last lines and show first 8 bytes here. BTW the very last line must be EOF record always. So puting it to the start - sounds very wrong. Without real data it's hard to tell what's wrong actually.

Jul 09 '20 06:07 bialix

Thanks for looking in to this. Our images are over that size boundary (about 100K), unfortunately I cannot share the hex files here, but I have used hex files with swapped lines in the past when merging in bootloaders in the image. The reasoning behind using the MD5 is because my firmware guy wants that to be used (not my choice). When I diff the files I can see exactly the line that gets swapped. I will get together a small snippet of the beginning and the end of the files to show you what is happening.

Jul 09 '20 15:07 LukeGary462

Source File on the Right. Reconstructed on the Left

Beginning of file Meld_2020-07-09_13-09-33

End of file Meld_2020-07-09_13-10-19

Jul 09 '20 17:07 LukeGary462

Yes, that is the Start Linear Address record all right. As I already wrote. Current implementation always places Starting Address records at beginning. So no matter where it will be placed when importing a file (e.g. using 'loadhex') You will have always Starting Address record at the beginning of the file when exporting to a file (e.g. using 'write_hex_file').

Jul 09 '20 23:07 fernandez85

"the data is stored in a database as a JSON record" - why don't you save actual hex file content to database then?

Jul 10 '20 12:07 bialix

I will change the title of this issue. It can be fixed with another option of write_hex_file, oh my. That's already too complex.

Jul 10 '20 12:07 bialix

"the data is stored in a database as a JSON record" - why don't you save actual hex file content to database then?

I'm starting to think I should just do that... This module is very nice though. There is talk of changing bootloaders, so being able to merge the files with this module without manually creating new releases seems like nice future proofing

Jul 10 '20 22:07 LukeGary462

Yes, that is the Start Linear Address record all right. As I already wrote. Current implementation always places Starting Address records at beginning. So no matter where it will be placed when importing a file (e.g. using 'loadhex') You will have always Starting Address record at the beginning of the file when exporting to a file (e.g. using 'write_hex_file').

Im not sure why this is happening, this was compiled and linked in IAR EWARM

Jul 10 '20 22:07 LukeGary462

Why You have in Your Hex file Start Record on the end? Probably by a software design. @bialix already wrote that there isn't only one place to put that record. IntelHex documentation doesn't describe this precisely.

It can be fixed with another option of write_hex_file, oh my. That's already too complex.

Yeah. This function is really evolving very dynamically ;) I thinking about the solution. I see 2 ways:

We only care about importing and when loading we only check were it's placed, on beginning or at the end. If somewhere else then I suggest to just force 'on beginning' option in this case. And when writing we set it as-is (resolved while loading)
Option 1. + new parameter, that would replace 'write_start_addr' (in 'write_hex_file' function) with for example 'start_addr_mode', which would have 3 options:
- 'sof' - start of file, old behavior when 'write_start_addr'=True
- 'eof' - end of file, naturally before EOF record
- 'none' - old behavior when 'write_start_addr'=False

Jul 10 '20 23:07 fernandez85