bcm5719-fw

Add recovery instructions

Open hughsie opened this issue 3 years ago • 3 comments

I know from real feedback from end users that flashing hardware makes a lot of people nervous. This is going to be especially so for firmware that explicitly does not come from the chip vendor.

In the README file there is a big warning followed by:

> ...an external programmer is required, or the external flash must be temporarily disabled during boot-up

...but it doesn't actually say which programmer would be required, and because the SPI flash is a 2.7V part I'm guessing someone is going to connect up a typical 3.3V programmer and blow it to bits. Perhaps show the user a link to the programmer that you use, making it explicit that it needs to be 2.7V.

I think if the README was also expanded with, for example, a diagram showing the jumper to set for device recovery that would make a lot of people more likely to try this firmware. Thanks!

hughsie, Sep 24 '20 08:09

During firmware development, I ran into a number of cases where I bricked the network card on the Talos II. This happens when incorrect settings are programmed into certain registers on the NIC, resulting in the card dropping off the PCI bus. When this happens, the only way to recover is to either (1) stop the card from booting off the NVRAM or (2) invalidate the firmware in NVRAM.

Unfortunately, on the Talos II and Blackbird, there are no jumpers to enable/disable the NVM. Additionally, there are no inline resistors and so an external device cannot easily overdrive the SPI signalling.

  • Short CSb to VCC or GND. This causes the firmware to treat the NVM as invalid, since it cannot read a valid image from it, and as a result it stops booting off of it. The pins generally need to stay shorted until the final OS boots, as the Linux driver causes device resets which may reload firmware. Once booted, the NVM config needs to be reset (there's code to do it in bcmflash), as the auto-detection failed during boot. At that point you can corrupt the flash, reset the system, and reflash using the normal mechanism.

  • Write to the EEPROM in-circuit. Generally this would require attaching wires to all SPI lines and overdriving them as appropriate. I had problems with this approach (as mentioned above) even when the Talos II was unpowered. As this requires soldering to all SPI lines on the board, I suggest the first option (which only requires soldering to CSb).
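To tell whether the card has dropped off the PCI bus, or has come back after CSb was shorted, one option is to look for it in sysfs once the OS is up. The following is a minimal sketch, not part of bcm5719-fw or bcmflash. It assumes a Linux host and assumes the BCM5719 enumerates with PCI vendor ID 0x14e4 and device ID 0x1657; the function name `find_bcm5719` is made up for this example.

```python
from pathlib import Path

# Assumed PCI IDs for the BCM5719; verify against `lspci -nn` on a working system.
BROADCOM_VENDOR_ID = 0x14E4
BCM5719_DEVICE_ID = 0x1657


def find_bcm5719(sysfs_root="/sys/bus/pci/devices"):
    """Return the PCI addresses of all functions matching the assumed IDs.

    An empty list means no matching function is visible on the bus,
    which is what a card that failed to enumerate would look like.
    """
    found = []
    root = Path(sysfs_root)
    if not root.is_dir():
        return found
    for dev in sorted(root.iterdir()):
        try:
            # sysfs exposes these as hex strings such as "0x14e4".
            vendor = int((dev / "vendor").read_text().strip(), 16)
            device = int((dev / "device").read_text().strip(), 16)
        except (OSError, ValueError):
            continue  # entry without readable ID files, skip it
        if vendor == BROADCOM_VENDOR_ID and device == BCM5719_DEVICE_ID:
            found.append(dev.name)
    return found


if __name__ == "__main__":
    functions = find_bcm5719()
    if functions:
        print("BCM5719 functions visible:", ", ".join(functions))
    else:
        print("No BCM5719 function visible on the PCI bus")
```

An empty result only shows that no matching function enumerated; it does not say why, so it is a quick check rather than a diagnosis.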

I'll take a look at the Dell NIC and see if there is a good way to recover that doesn't require soldering, but as it stands soldering is required for the Blackbird products. To be clear: all released firmware images have been tested to boot properly and should not have this issue. The only time I'd expect a possible bricking event is if a user is modifying and rebuilding stage1. That said, all other bad flashes are recoverable without an external programmer or soldering.

In any case, adding the above recovery procedure does make sense, if only to document it. I'll see what the best way is to add:

  • Pins to solder to in order to enable the SPI console (and possibly external flashing).
  • Pins to short, and the procedure to use, to recover from a bricked device on the Talos II / Blackbird.
  • Possible devices to use to flash / access the SPI console. Note that there should be no issue with hooking up a 3.3V programmer so long as the board is unpowered. At that point you are providing your own power supply. As for the signaling, 3.3V should be fine here without any real issue.

meklort, Sep 24 '20 13:09

> Additionally, there are no inline resistors and so an external device cannot easily overdrive the SPI signalling

Hmm, that's unfortunate.

> but as it stands soldering is required

I used to do PCB rework under a 'scope for a job :)

> adding the above recovery procedure does make sense

Many thanks, and sorry for adding to the ever-growing list of things I ask from you.

hughsie, Sep 24 '20 13:09

> I used to do PCB rework under a 'scope for a job :)

In this case, a scope is definitely a requirement.

> Many thanks, and sorry for adding to the ever-growing list of things I ask from you.

The comments are good: they help me see what issues users are running into so that the documentation and quality can improve. Plus, anything I fix here means you have more time to help me with fwupd.

meklort, Sep 24 '20 15:09