boardioc: Reset & poweroff are protected by critical section.
Summary
I noticed that boardctl(BOARDIOC_RESET, 0) calls reboot_notifier_call_chain() before the reset.
The operations of reboot_notifier_call_chain() are happening within a critical section.
The reboot notifier will typically cause a sync() to be executed for all open files (at least on my system).
But, after reboot_notifier_call_chain() returns, there is nothing ensuring that the reset will be executed immediately.
At this point, this task may be preempted, and new writes to the files may be performed. A new sync() would be needed but it will not happen.
Wrapping everything in a critical section ensures that when the notifier completes, the system will immediately reset, without leaving room for other tasks to spoil the ready-to-reset state of the system.
Impact
The reset procedure is executed correctly.
Testing
Tested on a custom target based on the STM32F427.
Reboot is being executed correctly, without any issues or side-effects.
But, this change doesn't work for SMP, and the thread may suspend and switch to new thread even you hold critical section since the thread may block self or wake high priority thread inside sync function, the context switch happens immediately in both cases.
But, this change doesn't work for SMP, and the thread may suspend and switch to new thread even you hold critical section since the thread may block self or wake high priority thread inside sync function, the context switch happens immediately in both cases.
So, what is the proper way to handle this?
Although, to be honest, reboot_notifier_call_chain() is quite terrible.
I have seen problems within the notifier that prevent the system from rebooting (haven't found the root cause yet),
And generally, with this notifier, there is no way to reboot the system immediately. Like, now, without executing anything, without delay, without waking up any threads.
Also, it seems like it breaks the system architecture. This is a call to the "board", why is it messing with the FS?
If someone wants a "clean" reboot, why not call sync() outside of the board reboot? And then just do the reboot? And of course handle the concurrency problems, externally, as the application dictates.
Or maybe have a board interface, that just reboots the system, and an OS interface that performs a "graceful" shut-down?
But, this change doesn't work for SMP, and the thread may suspend and switch to new thread even you hold critical section since the thread may block self or wake high priority thread inside sync function, the context switch happens immediately in both cases.
So, what is the proper way to handle this?
Actually, if the safe reboot is the goal, you can't reboot immediately anyway, if there is some cache in the file system need to be flushed out.
Although, to be honest,
reboot_notifier_call_chain()is quite terrible. I have seen problems within the notifier that prevent the system from rebooting (haven't found the root cause yet),
reboot_notifier_call_chain self doesn't block anything, it's the callback block you.
And generally, with this notifier, there is no way to reboot the system immediately. Like, now, without executing anything, without delay, without waking up any threads.
Also, it seems like it breaks the system architecture. This is a call to the "board", why is it messing with the FS?
I don't see there is any design problem here. board code doesn't call fs directly, which just notify the reboot and the callback registered by fs do the sync. Do you have better approach?
If someone wants a "clean" reboot, why not call
sync()outside of the board reboot? And then just do the reboot? And of course handle the concurrency problems, externally, as the application dictates.
If sync hang in board_reboot, it likely will hang if you call sync from userspace directly. So, the right fix is finding the root cause why sync hang: is it blocking in file system code or block/mtd driver? and fix the blocking.
Or maybe have a board interface, that just reboots the system, and an OS interface that performs a "graceful" shut-down?
Yes, it could be but I think the default reboot should be safe. Nobody happy to lose his important data randomly after reboot o power off.
This needs some discussion.
I need some time, and I will get back to the mailing list for this.
@fjpanag any updates on this PR?