edgetx reduce risk of corrupt radio.yml #1412

Fixes #1412

Summary of changes: Adds CRC16 checksum to radio.yml when saved. Adds save to temporary file, folowed by rename to radio.yml Adds check on load, with fall-back to temporary file

Jan 22 '22 11:01 stephenaa

I guess the test will also fail on the coming updated PR : https://github.com/EdgeTX/edgetx/issues/1412

Should I modify the radio.config used for the tests? Or is that provided by the test action setup? In any case, if neither checksum nor "manuallyEdited" is present in the file, loading will result in an alert - that causes the test to fail. That needs to be addressed somehow :-)

Jan 22 '22 15:01 stephenaa

Changed to use an enum class for the checksum return values.

Jan 24 '22 19:01 stephenaa

Just out of curiosity.

How does it prevent file corruption? AS far as i unterstand it should error out only on missing Checksum, right? If checksum is present but changed, how do we know that the user didn´t change anything? And how do we know that the content of the yml and therefore the new checksum is correct before we overwrite the old probably correct file?.

i´m probably missing something, really just curious

Jan 25 '22 08:01 Nerenye

It tries to detect if radio.yml was not fully written to flash, and then reverts to the alternative file. When radio.yml is written it is first saved under the alternative name, then renamed. So the checksum allows to detect corruption, and gives the SW one shot in reverting to a valid file.

For the user editing manually, there is a YAML variable (manuellyEdited) that makes the SW skip checksum check at load.

Jan 25 '22 08:01 stephenaa

yeah that´s the part i get. but the checksum would also change if you change something in radio configuration e.g not manually editing the file. Wouldn´t the checksum check fail at that point? or how do we check if the file is corrupted this time. I mean if you change somethin in the radio config and a write is triggered, And unlucky as we are the write is corrupted. How does the checksum prevent us from overwriting the old file. SInce as far as i understand it can´t know that the new checksum is not correct ?

Jan 25 '22 08:01 Nerenye

It writes the new configuration to a different name and only after completely writing, the old file is replaced. So that, at all times, a good working config file should be there.

Jan 25 '22 08:01 gagarinlg

yeah but how do we know it´s a good working file.

If we change the radio config the checksum changes aswell if it writes the temp file and the write is corrupted it won´t realise the file is corrupted because its valid that the checksum missmatches because we changed some config entrys. And will probably overwrite the non corrupted file.

Jan 25 '22 08:01 Nerenye

The checksum is regenerated with every config change and written as part of the file. It is not static. A config file must either contain the manually edited flag or a correct checksum, that matches the file contents.

Jan 25 '22 08:01 gagarinlg

so the checksum is generated before we write and we would know if the write corrupted if we check the file content checksum before we rename.

ok get it. thanks for the clarification 👍

Jan 25 '22 08:01 Nerenye

We don't do a finishing read-back before rename at this point, but that could easily be added. However, do we want to make save take double the time? On all radios?

Jan 25 '22 08:01 stephenaa

I do not think, that we need this. If someone wnats to have additional safety here, I would prefer to reneame the old file and then the new file after each write, so that we have the old one as a backup. But the cases where the config is corrupted should be really rare, except when debugging and constantly resetting the radio. In my opinion we are good.

Jan 25 '22 08:01 gagarinlg

I will test this PR at the weekend

Jan 28 '22 06:01 gagarinlg

-rw-r--r-- 1 xxx xxx 2900 Jan 28 2023 radio_error.yml -rw-r--r-- 1 xxx xxx 2933 Jan 29 2022 radio_new.yml -rw-r--r-- 1 xxx xxx 2329 Jan 29 2022 radio.yml I get this after calibration. The old config is still there and the new one is written, but not renamed. Do I have to do something speacial? Ignore the strange dates, please

Jan 29 '22 13:01 gagarinlg

Did you test this on a reald radio? Maybe renaming when there already is a file does not work.

Jan 29 '22 13:01 gagarinlg

Nice find - thanks.

I am currently down with Corona - hopefully I am back in cation within a few days :-/

Jan 29 '22 14:01 stephenaa

I am currently down with Corona - hopefully I am back in cation within a few days :-/

I hope you wil recover quick and without any permanent problems. There is no time preasure, so dion't worry about this.

Jan 29 '22 14:01 gagarinlg

I am currently down with Corona - hopefully I am back in cation within a few days :-/

Take care mate!

Jan 30 '22 00:01 pfeerick

I will take a look and test it within the next days. Thank you very much for your work.

Feb 01 '22 21:02 gagarinlg

This works much better now. One thing I found is, that an old config file will trigger an unrecoverable error and reset the configuration. It probably make sense to check, if there is a checksum at the beginning, and if not, but the file has a reasonable size, load it anyways. Or we need a new storage file version to detect this. @pfeerick, @raphaelcoeffic what do you think?

also the companion needs to be checksum aware

@stephenaa I rebased the PR

Feb 05 '22 13:02 gagarinlg

Should I add checksum support to companion?

I see the issue with old config files, but I am afraid it is non-trivial to set up rules to detect if the config file has been corrupted and not just an old file.If we have different version (and do a conversion?) the checksum of course should just be added when that is performed. Anyhow, this is just a problem just after update - is there another way of detecting that it is the first load of radio settings after a SW upgrade?

Feb 16 '22 22:02 stephenaa

It would be great if you could add this to companion. In the first step I forgot, because I don't really use companion.

I think the storage version is the only way to detect if the file needs to be updated

Feb 17 '22 01:02 gagarinlg

It would be great if you could add this to companion. In the first step I forgot, because I don't really use companion.

I think the storage version is the only way to detect if the file needs to be updated

I'am on it :-) I was wondering, what should companion behaviour be if a radio.yml is opened, but checksum is wrong? Can we rely on companion to create a valid settings file from scratch? This overlaps a bit with #1521 IMHO.

For the other issue I have been thinking a bit more. Absence of a value "checksum" is only a problem if it is not on the first line, that is - if there is no 'checksum' value, and there is something sensible in the file (and the length is realistic as you suggested earlier) the file is probably (nearly 100%) an old file from before the checksum check was added.

Feb 17 '22 14:02 stephenaa

I found a way around the newline issue. In reality it is only needed when ignoring the checksum field added as the first line of the file. By rearranging the logic I managed to minimize the amount of code exposed to the variability.

Also changed the radio firmware to accept files with no checksum field if their size is >25 bytes. I can't really see a better way right now to qualify an "old file", that is a file from before the checksum field was added. Maybe it should be kept in mind for a future file version version bump?

Feb 19 '22 13:02 stephenaa

Also changed the radio firmware to accept files with no checksum field if their size is >25 bytes. I can't really see a better way right now to qualify an "old file", that is a file from before the checksum field was added. Maybe it should be kept in mind for a future file version version bump?

I think the limit could be a lot higher than 25 bytes. Other than that, the code looks good, I will test it tomorrow.

Feb 19 '22 19:02 gagarinlg

Keep in mind that the same code reads model YAMl, so the minimum size could have to be the minimum model file if we are to extend checksum to models.

Sendt fra min iPhone

Den 19. feb. 2022 kl. 20.08 skrev Malte Langermann @.***>:

Also changed the radio firmware to accept files with no checksum field if their size is >25 bytes. I can't really see a better way right now to qualify an "old file", that is a file from before the checksum field was added. Maybe it should be kept in mind for a future file version version bump?

I think the limit could be a lot higher than 25 bytes. Other than that, the code looks good, I will test it tomorrow.

— Reply to this email directly, view it on GitHub, or unsubscribe. Triage notifications on the go with GitHub Mobile for iOS or Android. You are receiving this because you were mentioned.

Feb 19 '22 19:02 stephenaa

I think the minimum model ist larger, but that as a question for @raphaelcoeffic . Reagrading checksums on models, I am not sure. Models are not written as often. On the other hand a warning, when a model might be corrupted doesnÄt sound too bad. But this would be a task for another PR.

Feb 19 '22 19:02 gagarinlg

I think the minimum model ist larger, but that as a question for @raphaelcoeffic . Reagrading checksums on models, I am not sure. Models are not written as often. On the other hand a warning, when a model might be corrupted doesnÄt sound too bad. But this would be a task for another PR.

Let's leave the model part for now. Since the function is not called with a pointer for the CRC in case of model files it is a non-issue for now.

Feb 19 '22 23:02 stephenaa

I found an issue. When there is no radio.yml, but a radio.bin and a radio_new.yml a new bin->yaml conversion is started

Feb 20 '22 16:02 gagarinlg

So that conversion is triggered before reading of radio.yml of course. The check should probarbly include “radio_new.yml” as Well?

Sendt fra min iPhone

Den 20. feb. 2022 kl. 17.06 skrev Malte Langermann @.***>:

I found an issue. When there is no radio.yml, but a radio.bin and a radio_new.yml a new bin->yaml conversion is started

— Reply to this email directly, view it on GitHub, or unsubscribe. Triage notifications on the go with GitHub Mobile for iOS or Android. You are receiving this because you were mentioned.

Feb 20 '22 18:02 stephenaa

So that conversion is triggered before reading of radio.yml of course. The check should probarbly include “radio_new.yml” as Well?

Yes, that should fix that problem.

Feb 20 '22 18:02 gagarinlg