Make the `run` command wait for Hub to get ready
Introduction
When fine-tuning programs and running them via pybricksdev, I often run the program several times per minute.
However, what happens to me frequently is that
- the Hub is off (turns off itself after a certain inactivity period)
- the Hub is running the old program when I forget to stop it by pressing the Hub's button
In both cases, the execution fails and I have to either turn the Hub on or stop the program, and then re-run pybricksdev run command.
It would be much more user firiendly if pybricksdev would handle these scenarios:
- If the Hub is turned off while
pybricksdev runis waiting for connection, the program should be loaded and executed once the Hub is turned on - If the Hub runs a program while
pybricksdev runis executed,pybricksdevshould- either wait a moment till the user stops the old running program and then load and run the new one
- or even better, stop the old program and run the new one
Environment
macOS 12 Python 3.10.1 pybricksdev v1.0.0-alpha.24 Inventor Hub
Replication steps
- Switch off the Hub
- Execute
pybricksdev run ble program.py - Switch on the Hub
- Actual behaviour:
asyncio.exceptions.TimeoutError - Expected behaviour: The program should be uploaded and executed
Stack trace
Traceback (most recent call last):
File "/Users/pavel/.pyenv/versions/3.8.12/lib/python3.8/runpy.py", line 194, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/Users/pavel/.pyenv/versions/3.8.12/lib/python3.8/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/Users/pavel/Git/Other/pybricksdev/pybricksdev/__main__.py", line 9, in <module>
main()
File "/Users/pavel/Git/Other/pybricksdev/pybricksdev/cli/__init__.py", line 374, in main
asyncio.run(subparsers.choices[args.tool].tool.run(args))
File "/Users/pavel/.pyenv/versions/3.8.12/lib/python3.8/asyncio/runners.py", line 44, in run
return loop.run_until_complete(main)
File "/Users/pavel/.pyenv/versions/3.8.12/lib/python3.8/asyncio/base_events.py", line 616, in run_until_complete
return future.result()
File "/Users/pavel/Git/Other/pybricksdev/pybricksdev/cli/__init__.py", line 198, in run
await hub.run(script_path, args.wait)
File "/Users/pavel/Git/Other/pybricksdev/pybricksdev/connections.py", line 367, in run
await self.send_block(length)
File "/Users/pavel/Git/Other/pybricksdev/pybricksdev/connections.py", line 340, in send_block
await asyncio.wait_for(self.checksum_ready.wait(), timeout=0.5)
File "/Users/pavel/.pyenv/versions/3.8.12/lib/python3.8/asyncio/tasks.py", line 501, in wait_for
raise exceptions.TimeoutError()
asyncio.exceptions.TimeoutError
Analysis (cli/__init__.py and connections.py)
- The
device_or_address = await find_device(args.name)works with a default10stimeout - Increasing the
await asyncio.wait_for(self.checksum_ready.wait(), timeout=0.5)timeout does not help - The
nus_handlerisn't caled at all for some reason, once the Hub is turned on - If I do sleep before a connection is made, everything works.
diff --git a/pybricksdev/connections.py b/pybricksdev/connections.py
index c56d1f4..520e8b3 100644
--- a/pybricksdev/connections.py
+++ b/pybricksdev/connections.py
@@ -291,6 +291,7 @@ class PybricksHub:
logger.info("Disconnected!")
self.connected = False
+ await asyncio.sleep(5)
await self.client.connect(disconnected_callback=disconnected_handler)
try:
logger.info("Connected successfully!")
I've tried to fix the problem myself but failed. Would you guide me on how could I achieve the behaviour described in the Introduction section? Happy to code the stuff myself if that's possible.
This would be nice. :+1:
However... if this were to become a direction we're going in, perhaps it would be time to pick up the old idea of making a proper Visual Studio Code extension (or something like it) to download and run programs and multi-file projects, and stay connected in between runs.
pybricksdev was originally (and probably still is) intended as a basic tool for pybricks developers to simplify deployment and testing.
Visual Studio plugin would be nice but then we're stuck with that single IDE.
I pretty much like pybricksdev because I can easily integrate it with IntelliJ which is my favorite IDE.
That's why I volunteer to improve pybricksdev π
Giving the issue a second look, I have found the following:
- Run a program
- Turned the Hub on
- The Hub is found successfully when calling
device_or_address = await find_device(args.name) - The connection to the Hub is successful using
await hub.connect(device_or_address) - The length of the compiled program is sent to the Hub using
await send_block(length) - The device does not respond by sending back the checksum
When adding a 1s wait time before the program length is sent, everything works fine.
I suspect the problem is that the Hub initialisation takes some time and when we send data prior to the Hub being initialised, it simply does not receive it and does not reply with the checksum. Am I correct here?
I have also noticed the StatusFlags do not have any HUB_IS_READY flag which means it is impossible to delay the data sending until the HUB can actually receive any data π’
I added a similar delay in cli/flash.py because of the same problem. I don't think that the hub can actually known when it is ready though. I think the problem is that the Bluetooth stack on the computer is still probing the hub for a while and it causes the request to get dropped somewhere along the way. If the hub is able to connect via Bluetooth, then everything in the firmware that we wrote should be "ready" already (unless perhaps you are still holding down the power button).
I can see the await asyncio.sleep(1) now in the cli/flash.py. Thanks!
It's weird though that the same issue does not happen when the Hub is on before executing pybricksdev run.
Since the Hub is ready when the connection is successful, I don't understand what is happening to the data sent. I'd like to fix this stuff properly without using sleep, but I haven't the foggiest idea what is going on and how to find out the root cause π.
Capturing Bluetooth packets to see what is actually going on is usually a good first step.
Capturing Bluetooth packets to see what is actually going on is usually a good first step.
Will give it a try. The problem I see is to be able to understand what all the packets mean π
If you are using SPIKE/MINDSTORMS hub, then the issue could also be that BTStack is doing more in the background than the other Bluetooth drivers on other hubs and it needs a bit of delay before advertising on boot. Also, those hubs now have some blocking code for the external flash memory that could be interfering.
Tried to capture BLE packets for both scenarios - Hub turned on prior to running pybricksdev, and Hub turned on after running pybricksdev.
The results are the same for finding and connecting to the Hub.
Two differences I've observed
- Once
pybricksdevtries to send program length (in my caseb'O\x02\x00\x00')- Hub turned on prior to running
pybricksdevreceived the correct\x4dchecksum (escaped string"\x03\x41\x20\x08\x00\x04\x00\x04\x00\x1b\x13\x00\x4d"), and the program upload finished successfully. - Hub turned on after running
pybricksdevreceived\x00value (escaped string"\x03\x40\x20\x0c\x00\x08\x00\x04\x00\x1b\x0d\x00\x00\x00\x00\x00\x00"), and this is where all communication from/to the hub ended.
- Hub turned on prior to running
- Source and destination addresses
- Hub turned on prior to running
pybricksdevis identified asTexasIns_fa:c7:28 (prime_hub)in Wireshark - Hub turned on after running
pybricksdevis identified asTexasIns_fa:c7:28 ()in Wireshark
- Hub turned on prior to running
The blocking flash stuff on startup is a likely offender. To find out, try downgrading the firmware to this one or something older. This commit is from just before merging the new flash code.
The blocking flash stuff on startup is a likely offender. To find out, try downgrading the firmware to this one or something older. This commit is from just before merging the new flash code.
Used the latest pybricksdev together with several older commits of the pybricksdev-micropython and found the guilty one π
Four subsequent pybricksdev-micropython commits:
- 8017506 Timeouts if the hub is turned on after executing
pybricksdev run - 773c927 Code uploads but the program does not run
- 79c9f76 Code uploads but the program does not run
- d804d66 Works OK
I guess the code in bricks/stm32/main.c is causing it
static void stm32_main(void) {
#if PYBRICKS_HUB_PRIMEHUB
mp_hal_delay_ms(500);
pb_flash_init();
#endif
All of flash needs to be rewritten, not just that one in particular.
But if itβs choosing between waiting a second after boot versus no program being stored at all, Iβd rather just wait a second for now. :)
All of flash needs to be rewritten, not just that one in particular.
I don't know how this stuff works but wondering, @dlech mentioned that once the Hub is discoverable over BLE, it should be ready. So wondering if the flash init could be done before the Hub can be connected to, that would do the trick. But it might just be it's not possible this way.
But if itβs choosing between waiting a second after boot versus no program being stored at all, Iβd rather just wait a second for now. :)
As a workaround, I can add 1s sleep into the pybricksdev.cli.Run.run method, the same way the pybricksdev.cli.flash.reboot_official_to_bootloader has.
So wondering if the flash init could be done before the Hub can be connected to,
Assuming that pb_flash_init() does not call any MicroPython code, we should be able to move it to pbsys_process before pbsys_bluetooth_init().
@lobodpav, can you please test https://github.com/pybricks/pybricks-micropython/commit/e8846409af0b1d82c3bd66a914ed181503a71438 when it is finished building?
@lobodpav, can you please test pybricks/pybricks-micropython@e884640 when it is finished building?
Tested it now. Something is wrong - after writing the firmware the 3x3 LED square keeps on flashing and the Hub's main button light is off.
An attempt to run a program ends with RuntimeError: disconnected during operation.
I see now that the "block" operations are calling MICROPY_EVENT_POLL_HOOK, so that change doesn't work.
Same issue.