RIOT icon indicating copy to clipboard operation
RIOT copied to clipboard

OTA: tracking issue

Open bergzand opened this issue 6 years ago • 53 comments

Tracking OTA progress

Tracking issue for the different parts required to enable over the air updates.

Overview

We are aiming at a simple OTA prototype in a first phase, which can be extended in later phases. The blueprint for the first prototype is the implementation we had working at the SUIT Hackathon in June 2018 (see slide 3 of this recap). This combines a simplistic bootloader, 2 firmware slots and metadata that follows the SUIT specification. A basic update module, embarked in each image, manages firmware transfer (via CoAP), verification (complying with SUIT) and storage in Flash within the non-running image slot.

Work breakdown

ota work breakdown_v3

Current state of development

@cladmi is addressing the tooling for the bootloader and metadata @kYc0o is addressing the bootloader @OYTIS is working on support for RIOT, including OTA, for KEA128LEDLIGHTRD board (https://github.com/RIOT-OS/RIOT/pull/9655)

See the GitHub Software Updates project for more details.

System diagrams

Both of the diagrams reflect the content of the working OTA update branch. This contains working examples for both a metadata-based update, and a SUIT manifest-based update. The areas not relevant to the example addressed in each section are greyed out.

Non-SUIT metadata based update mechanism

A description of the metadata is given here.

ota component diagram - metadata based updating_v1

SUIT metadata based update mechanism

ota component diagram - SUIT-based updating_v1

The process being followed is to merge the functionality required for OTA updates piecewise, instead of as a large monolithic PR.

Files for diagrams above

The diagrams were done using draw.io (https://www.draw.io/). Contact @danpetry for the project files.

bergzand avatar Jun 13 '18 15:06 bergzand

Linker offset supports depends on ~https://github.com/RIOT-OS/RIOT/pull/9451~ to allow testing.

cladmi avatar Jun 28 '18 13:06 cladmi

State of development

Please update this list as required so that we all know what we're up to

Development branches and repos

Here's a list of active branches and repos where people are developing. Please add your own with a short description if you can get around to it. https://github.com/kYc0o/RIOT/tree/hackathon_suit_2018 - Paco dev branch https://github.com/kYc0o/riot-ota-server/tree/master/otaserver - OTA server (Paco fork) https://github.com/aabadie/ota-server - OTA server (original) https://github.com/danpetry/RIOT/tree/ota/sandbox - Dan dev branch. Works with samr21-xpro along with a gen_and_flash.sh script which builds/flashes/generates manifest for use with a CoAP client/server for firmware distribution https://github.com/suit-wg/SUIT-repository-manager - CoAP client/server for firmware distribution https://github.com/suit-wg - suit working group. https://github.com/hannestschofenig/suit-manifest-generator - Manifest generator. fork of ARMmbed/suit-manifest-generator with a couple of additions.

Related issues not in the diagram above

Either because they are dependencies to the issues above, or some other reason. Basically here are some issues you might want to be aware of, and help along if you can. https://github.com/RIOT-OS/RIOT/pull/8902 - Original OTA PR ~https://github.com/RIOT-OS/RIOT/pull/9351~ - ~PR to allow partial ROM flashing for cortex m cpus~ EDIT (cladmi): to allow generating firmwares in part of the ROM for cortex-m

Current work being undertaken

Dan - working on RAPstore project on integration, so architecture can be informed by situational concerns Anton (OYTIS) - implementing periph/flasher module for a board (? which one)

danpetry avatar Aug 07 '18 13:08 danpetry

For binflash I would also have a dependency of FLASHFILE: https://github.com/RIOT-OS/RIOT/pull/8838

And I am thinking, why do we even need to do binflash. Flashing a binary means having both a file and an address. And this could be done by generating an ihex file instead.

cladmi avatar Aug 13 '18 09:08 cladmi

And I am thinking, why do we even need to do binflash. Flashing a binary means having both a file and an address. And this could be done by generating an ihex file instead.

As far as my early tests went, it was very complicated to manage ihex files especially when you want to flash bootloader + image (slot 1 or 2). Then build an hex file with such information was very challenging at that time. To speed up things, and make something more reliable I used bin files flashed at the needed address.

Now we have all the information we need to know where to flash, so I'd say there's no problem using and flashing binary files while a bootloader takes the beginning of the flash.

kYc0o avatar Aug 13 '18 13:08 kYc0o

For the bootloader, the mimimal set of required metadata is this:

/**                                                                              
 * @brief Structure to store firmware metadata                                   
 * @{                                                                            
 */                                                                              
typedef struct {                                                                 
    uint32_t magic_number;              /**< metadata magic_number (always "RIOT")  */
    uint32_t version;                   /**< Integer representing firmware version  */
    uint32_t start_addr;                /**< Start address in flash                 */
    uint32_t chksum;                    /**< checksum of metadata                   */
} firmware_metadata_t;                                                           
/** @} */                                                                        
  • magic_number: To indicate that a firmware metadata field is located here.
  • version: Version of this firmware. The bootloader will boot the firmware with the highest version number
  • start_addr: Start address of the firmware. This way the bootloader can easily work with variable metadata sizes.
  • chksum: Checksum of the metadata to check for a valid metadata (not firmware).

bergzand avatar Aug 15 '18 14:08 bergzand

Documentation todo

  • Dependencies:
    • CPU peripherals required
    • metadata format and bootloader interaction
  • Metadata fields
  • CPU initialization (happens twice)

bergzand avatar Aug 15 '18 14:08 bergzand

Notes about flashbin:

edbg

edbg is already flashing binary files. An offset from the start address can be given using -o|--offset the PR currently uses this as an absolute address which is wrong in general. So flashing a binary from start address works.

Some work is only required to disable erasing everything and supporting flashing from an offset of the start address. Which is only needed when flashing sub parts of the rom/invalidating a slot.

edbg.inc.mk: allow flashing with an offset in rom without erasing all ROM #9788

Openocd

Openocd supports flashing from binary as is, except that the flashing address must be given. It uses offset which is an offset from what is in the firmware, so offset from the link address for elf files/hex files, offset from 0 for binary files.

http://openocd.org/doc/html/Flash-Commands.html flash write_image.

I plan to use the ROM_START_ADDR in the openocd script to make flashing a binary work by default without external configuration. And use OFFSET as an offset from this.

EDIT: The address cannot easily extracted from the build system, openocd.inc.mk is included before cpu/.../Makefile.include and having export ROM_START_ADDR would give it a default empty value breaking the ROM_START_ADDR ?= lines.

I queried this address from openocd.

openocd.sh: allow flashing binary files without configuration #9787

This would give a common behavior between boards using edbg and openocd.

cladmi avatar Aug 16 '18 10:08 cladmi

Minutes from this week's OTA meeting (15.08.2018) available here

emmanuelsearch avatar Aug 17 '18 19:08 emmanuelsearch

Which option is preferred to add comments/questions on the minutes? Should I comment via this issue, create a new issue, or ...?

waehlisch avatar Aug 18 '18 14:08 waehlisch

I'd say: comment here?

emmanuelsearch avatar Aug 18 '18 14:08 emmanuelsearch

I agree that it makes sense to use the existing Block1 functionality for now. Conceptually, some host sends firmware to the (RIOT) device, although I guess use of PUT/POST or GET depends on the higher-level application protocol. At any rate, the device is on the receiving end.

Also, "roadblock" is not a word I like to see associated with my name! I'm reviewing the RFC and Block2 PR (#8932), and plan to comment in the next day or so on moving forward.

kb2ma avatar Aug 20 '18 11:08 kb2ma

@kb2ma actually Block2 has advantages in terms of security characteristics (since it is pull-based). If #8932 could be merged fast into master (before other CoAP reworks), it would improve the initial OTA solution we are going for, no doubt. That said, we plan OTA enhancements/reworks anyways, after an initial (basic) OTA mechanism is merged in the master branch. Hence, thanks a lot for your proposal to let us know about #8932, indeed ;-)

emmanuelsearch avatar Aug 20 '18 13:08 emmanuelsearch

Both edbg and openocd binflash have been merged. The branch will need to be rebased with the changes. @kYc0o we can spend some time doing this and testing this afternoon if you want.

It currently still requires setting two different variables to handle both openocd and edbg to flash a binary file but will be addressed by https://github.com/RIOT-OS/RIOT/pull/8838.

cladmi avatar Sep 05 '18 09:09 cladmi

Another dependency I remember as I am rebasing the branch, is process dependencies for CPU and move stm32_common periph_flashpage dependencies to Makefile.dep I could easily provide a test for this one even if the test does not need to be merged.

cladmi avatar Sep 05 '18 14:09 cladmi

Finally I got to test the current state with a rebased branch with most recent changes in master related to OTA. The most updated branch is at [1].

Two problems:

  • Compile for slot 2 doesn't seem to work with the current verification for slot sizes: Specified firmware size does not fit in ROM. However, by bypassing the assert in cortexm.ld it compiles and actually is linked correctly, but:
  • For edbg this line is not working EDBG_ARGS += $(addprefix --offset ,$(IMAGE_OFFSET)):
/Users/facosta/git/RIOT-OS/RIOT/dist/tools/edbg/edbg --offset $((0x1000 --offset + --offset 0x1f800)) -t atmel_cm0p -b -v -p -f /Users/facosta/git/RIOT-OS/RIOT/examples/suit_updater/bin/samr21-xpro/suit_updater-slot2.signed.bin
bash: 0x1000 --offset + --offset 0x1f800: syntax error in expression (error token is "--offset + --offset 0x1f800")
make: *** [/Users/facosta/git/RIOT-OS/RIOT/makefiles/riotboot.mk:112: riotboot/flash-slot2] Error 1

[1] https://github.com/kYc0o/RIOT/tree/wip/rebase/ota_work_branch

kYc0o avatar Sep 14 '18 15:09 kYc0o

For edbg I did not think about the value with spaces so 'addprefix' does not work in that case. Two solutions:

  • Replace IMAGE_OFFSET=$((0x1000+0x1f800)) without spaces
  • Replace the EDBG_ARGS line with $(if $(IMAGE_OFFSET),--offset $(IMAGE_OFFSET)))

The first one can be done directly on our side to make it work, and upstream the proper fix to RIOT after.

cladmi avatar Sep 17 '18 08:09 cladmi

I found the issue for the offset on slot 2, the assert was right. In makefiles/riotboot.mk I wrote FW_ROM_LEN=$(RIOTBOOT_FW_SLOT_SIZE) when the slot size includes both the metadata and the firmware, so should be FW_ROM_LEN=$(RIOTBOOT_FW_SLOT_SIZE - $(RIOTBOOT_HDR_LEN))

And then the ldscript changes can be reverted.

cladmi avatar Sep 17 '18 09:09 cladmi

We are not consistent with the names yet. If slot means metadata + firmware, the generated files should be adapted.

cladmi avatar Sep 17 '18 09:09 cladmi

I also have some issues when building the bootloader on my laptop as git am, which applies patches to packages, tries to use HOME to find git configuration and extract email and author name but we do not pass it down for building.

For me it is an issue that 'git am` tries to use local configuration to do the commit. I want to fix this for some time but did not have real visible problems to show.

cladmi avatar Sep 17 '18 10:09 cladmi

Just as a side note I made while playing around with @kYc0o branch, if the periph_gpio, periph_uart and uart_stdio are disabled (I had to comment out some initialization functions), the bootloader takes 1428B on the samr21-xpro. Original was 2920B. I didn't test whether it was still functioning, but I don't see a reason why the bootloader would need those peripherals. Chopping the flash requirement in half seems to be the more beneficial trade off here.

On a practical side, this requires making the initialization of the gpio and (stdio_)uart optional. While not impossible, this would add another todo on the already large list. I just want this to be noted down somewhere so I don't forget about it.

bergzand avatar Sep 17 '18 18:09 bergzand

Would people possibly want the stdio uart for debugging?

this would add another todo on the already large list.

I vote for getting the bootloader merged without this, and then optimizing more for size later. For this reason.

danpetry avatar Sep 18 '18 09:09 danpetry

Would people possibly want the stdio uart for debugging?

Probably, but if the bootloader is functioning, it should not be necessary. Furthermore, if your device is somewhere in the wall, the UART might not be accessible and kinda useless to include in the config.

I vote for getting the bootloader merged without this, and then optimizing more for size later. For this reason.

That's exactly what I would also prefer here, just want to have it somewhere written down :smile:

bergzand avatar Sep 18 '18 09:09 bergzand

Additional note, with the current implementation using two slots, saving space in the bootloader is only useful if you save two pages, as both slots have the same size.

For the samr21-xpro with a page size of 256 you save ROM, with the iotlab-m3 the page size is 2048 so there would be nothing gained here, with the current implementation.

cladmi avatar Sep 18 '18 10:09 cladmi

Proper fix for edbg issue with spaces in IMAGE_OFFSET: https://github.com/RIOT-OS/RIOT/pull/9956

cladmi avatar Sep 18 '18 16:09 cladmi

Proper fix for edbg issue with spaces in IMAGE_OFFSET: #9956

Thanks @cladmi ! It is merged. Do you mind to rebase again? It should be easier now.

kYc0o avatar Sep 19 '18 12:09 kYc0o

@kYc0o I will do it and remove non needed commits, I will still keep the last fixups for the moment.

cladmi avatar Sep 19 '18 13:09 cladmi

Rebased, I checked the rebased diff against the previous one.

Now I can do make riotboot/flash-bootloader, make riotboot/flash-slot1 and make riotboot/flash-slot2 for samr21-xpro without modifications.

cladmi avatar Sep 19 '18 13:09 cladmi

I also included changes to make it work on iotlab-m3 using openocd.sh. IMAGE_OFFSET and IMAGE_FILE should be exported to be visible in the script.

cladmi avatar Sep 19 '18 13:09 cladmi

And fixed for iotlab-m3 flash-slot2 there are issues with $(( )) and export.

I also think that flash-slot1 and flash-slot2 should NOT flash the bootloader.

cladmi avatar Sep 19 '18 14:09 cladmi

Cpu_init idempotency test

We wanted to implement a test which re-ran cpu_init(), as a unit test which would have helped to isolate bugs for people wanting to port the bootloader to a new board, and would have avoided them attempting this if the test failed. This was intended as a necessary but not sufficient test for the bootloader to be able to load working firmware.

It was found that for the SAM R21, this gave a false negative as such a test. This is because the newlib initialization is run after cpu_init(), and here the state is set correctly of the generic clock register that clocks the stdio USART. This means that no [SUCCESS] assert printout is received by the testing machine if only cpu_init() (which sets these registers into a default, incorrect state) is run.

In general, the startup sequences are, currently, quite heterogeneous. This means that the startup sequence can't really be broken down into testable units. Although such tests are important to help people to isolate bugs more quickly - we have already had issues in this area - here it would require refactoring of the startup code for all different boards to a common format, which would be a considerable amount of work. This work is also considered non-blocking to the implementation of a bootloader.

I've raised an issue (https://github.com/RIOT-OS/RIOT/issues/9964) for this work, which can continue in parallel with the initial implementation of a bootloader.

danpetry avatar Sep 19 '18 14:09 danpetry