[Bug] Hub can get stuck while broadcasting

Open JJHackimoto opened this issue 1 year ago • 98 comments

Describe the bug Block Coding: When using Broadcast in two or more tasks running simultaneously, an error will be thrown and the program will terminate. Error below:

"OSError: This resource cannot be used in two tasks at once."

Expected behavior I expected the block coding to give me an error or prevent me from building the program this way before running the program. This is especially true since block coders may not be that experienced in coding and may not understand the error thrown in the console. The error also won't be seen when running the program on the hub without a Bluetooth connection.

I could also expect this to just work since I personally can't see the reason for this not working. It currently feels unintuitive that the only way to use Broadcasting is to have a loop that constantly sends out the chosen values instead of just sending an update when the values have actually changed (for example when just sending out Booleans once in a while).

JJHackimoto avatar Feb 04 '24 18:02 JJHackimoto

Thanks for raising this. This is the expected behavior (as in, not a bug), but I can see this being not ideal. We could perhaps improve this with documentation and a clear example.

I could also expect this to just work since I personally can't see the reason for this not working.

A fair analogy is screaming two different things from the same rooftop at the same time :smile: So even if we made the error go away, this probably isn't going to work as you'd like. The receiver wouldn't be guaranteed to receive both.

To send two values, it's better to send them together in a list. So if your code wants to send variables A and B from two different tasks, you could make another task that just broadcasts a new list of A and B whenever either of them changes.
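The single-broadcaster idea can be sketched in plain Python as below. This is only an illustration, not Pybricks API: `CombinedBroadcaster` and its `send` argument are made-up names, and on a hub `send` would have to wrap the real broadcast call (which is awaited in multitasking programs).

```python
class CombinedBroadcaster:
    """Sends one combined list, and only when a value has changed."""

    def __init__(self, send):
        # `send` is a hypothetical callable that transmits a list.
        # Here it just records the list so the sketch runs anywhere.
        self._send = send
        self._last = None

    def update(self, a, b):
        """Send [a, b] if it differs from the last list that was sent."""
        payload = [a, b]
        if payload == self._last:
            return False
        self._send(payload)
        self._last = payload
        return True


sent = []
broadcaster = CombinedBroadcaster(sent.append)
broadcaster.update(False, False)  # first call always sends
broadcaster.update(False, False)  # unchanged, nothing is sent
broadcaster.update(True, False)   # A changed, a new list is sent
print(sent)
```

Because only one place ever calls `send`, the other tasks just change A and B and never touch the radio themselves.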

Generally in Pybricks, we try to raise the error when there is one, instead of not telling you and leaving you confused why something isn't working.

I expected the block coding to give me an error or prevent me from building the program this way before running the program.

We couldn't know in advance how the program might run. It's perfectly fine if two tasks both use broadcasting, just not at the same time.
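To illustrate "just not at the same time", here is a rough simulation that runs on a computer with standard `asyncio`, not on a hub. `FakeRadio`, its `busy` flag and `broadcast_when_free` are all invented for this sketch; they only mimic the error from the report. Note that waiting your turn only avoids the error. The second message still replaces the first, so combining the values into one list remains the better approach.

```python
import asyncio


class FakeRadio:
    """Stand-in for a hub radio that rejects overlapping use."""

    def __init__(self):
        self.busy = False
        self.sent = []

    async def broadcast(self, data):
        if self.busy:
            raise OSError("This resource cannot be used in two tasks at once.")
        self.busy = True
        await asyncio.sleep(0.01)  # pretend the radio needs some time
        self.sent.append(data)
        self.busy = False


async def broadcast_when_free(radio, data):
    # Hypothetical helper: let other tasks run until the radio is idle.
    while radio.busy:
        await asyncio.sleep(0)
    await radio.broadcast(data)


async def demo():
    radio = FakeRadio()
    # Both tasks want to broadcast right away; the helper makes them take turns.
    await asyncio.gather(
        broadcast_when_free(radio, ["task 1"]),
        broadcast_when_free(radio, ["task 2"]),
    )
    return radio.sent


sent = asyncio.run(demo())
print(sent)
```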

laurensvalk avatar Feb 04 '24 19:02 laurensvalk

I see, thanks for the clarification. Now, I'm not sure how this could work, but wouldn't it be possible to make broadcasting always happen asynchronously when using block coding? This way, it would never occur at the same time no matter where in the program the call is used.

Feel free to close this issue if you decide that no changes are needed for this :)

JJHackimoto avatar Feb 04 '24 19:02 JJHackimoto

That way, you'd still 'drown' one message in the other, so overall responsiveness is likely not as good compared to intentionally combining the messages as needed depending on your application.

laurensvalk avatar Feb 04 '24 20:02 laurensvalk

Let's keep the issue open since it's definitely a good question. We'll want to document this clearly and explain why =)

laurensvalk avatar Feb 04 '24 20:02 laurensvalk

In addition to this, is it true that broadcasting and unpacking cannot be done at the same time either? I've had a few issues with hubs freezing where the blue light continues to fade as normal but the hub is stuck, and the only thing to do is to power it down with a long press. A short press won't work. After moving all Bluetooth communication blocks to a single task that runs repeatedly, the issue seems to have gone away.

JJHackimoto avatar Feb 18 '24 16:02 JJHackimoto

That should be allowed. If you find a reproducible small program we can test, that would be very useful!

laurensvalk avatar Feb 18 '24 18:02 laurensvalk

I did find https://github.com/pybricks/support/issues/1454. Maybe you were seeing this too?

laurensvalk avatar Feb 18 '24 18:02 laurensvalk

That one is interesting. Did your hub continue to fade blue during and after the "crash"?

I've yet to have this happen while connected to a computer, and it's really inconsistent when it happens: sometimes after a minute, sometimes after 10. Other times it doesn't happen at all while I'm testing my program. I'm still using three hubs communicating with each other, and so far two of them have randomly "crashed", one more often than the other. The third hub has been fine all the time. The difference in how they handle communication is that the good hub has been doing it in a loop in a separate task. The other two have only been unpacking in a loop in a separate task, while broadcasting from the main program whenever needed. I've since moved it all to that separate loop, and it seems to have been fine since then on all three hubs.

I'll see if I can reproduce it with a small program.

JJHackimoto avatar Feb 18 '24 19:02 JJHackimoto

That one is interesting. Did your hub continue to fade blue during and after the "crash"?

No, so maybe you're seeing something different.

I'll see if I can reproduce it with a small program.

Thank you!

laurensvalk avatar Feb 19 '24 07:02 laurensvalk

I wasn't able to reproduce it with a small program sadly. However, I now know that it's actually not due to broadcasting and unpacking at the same time since one of my hubs did this again today, even though the program has all broadcasting and unpacking in a single task. I now have no clue what can be causing this.

JJHackimoto avatar Feb 20 '24 20:02 JJHackimoto

Just to add to this, same happened today, but long pressing the power button made the light on the hub flash rapidly without stopping. The hub stopped responding to long button presses altogether and the only way to revive the hub was to pull the batteries.

JJHackimoto avatar Feb 24 '24 13:02 JJHackimoto

Which firmware are you using?

from pybricks import version

print(version)

The beta firmware from https://beta.pybricks.com/ should already fix some of this, so it would be good to know which version you used.

laurensvalk avatar Feb 24 '24 13:02 laurensvalk

I'm running: ('technichub', '3.4.0b2', 'v1.20.0-23-g6c633a8dd on 2024-02-14')
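When comparing reports from several hubs, a small helper like the one below (meant for a computer, not the hub) can pull the release numbers out of a tuple shaped like the one above. `firmware_release` is a made-up name, and the sketch assumes the second field always starts with three dot-separated numbers; the beta suffix is simply ignored.

```python
import re


def firmware_release(version_info):
    """Return (major, minor, patch) from a reported version tuple."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version_info[1])
    if match is None:
        raise ValueError("unexpected version string: " + version_info[1])
    return tuple(int(part) for part in match.groups())


reported = ('technichub', '3.4.0b2', 'v1.20.0-23-g6c633a8dd on 2024-02-14')
release = firmware_release(reported)
print(release)              # (3, 4, 0)
print(release < (3, 5, 0))  # True
```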

One of my hubs is probably on an older firmware. I read something about bad data in the thread you linked. I've been holding off on updating the firmware since it fails 95% of the time ("The hub took too long to respond. Restart the hub and try again."), taking up to 30 minutes until I can get it running. I'll update the third hub now and we will see if the issue comes back. Thanks for letting me know there are fixes for this in the update :)

JJHackimoto avatar Feb 24 '24 14:02 JJHackimoto

All hubs are now updated but the issue still occurred. This time I also had to pull the batteries to get the hub turned off.

Here's the program I'm running on the most problematic hub (Code is generated through Block Coding).


from pybricks.hubs import TechnicHub
from pybricks.parameters import Axis, Color, Direction, Port, Stop
from pybricks.pupdevices import ColorDistanceSensor, Motor
from pybricks.tools import multitask, run_task, wait

Color.WHITE = Color(0, 0, 100)
Color.BLACK = Color(0, 0, 0)

SensorHub = TechnicHub(top_side=Axis.Z, front_side=Axis.X, broadcast_channel=3, observe_channels=[1, 2])
DirectTrigger = ColorDistanceSensor(Port.A)
DirectTrigger.detectable_colors((Color.RED, Color.NONE))
LateTrigger = ColorDistanceSensor(Port.C)
LateTrigger.detectable_colors((Color.WHITE, Color.BLACK))
Tipping = Motor(Port.B, Direction.COUNTERCLOCKWISE)

DistributorTipp = False
TableReadyForTipp = False
Triggered = False
Tipped = False
TriggeredDistributor = False

async def main1():
    global Triggered
    while True:
        await wait(0)
        while not (await DirectTrigger.color() == Color.RED or await LateTrigger.color() == Color.WHITE):
            await wait(1)
        Triggered = True
        await wait(1000)
        Triggered = False

async def main2():
    global DistributorTipp, TriggeredDistributor, TableReadyForTipp
    while True:
        await wait(0)
        DistributorTipp, = SensorHub.ble.observe(2) or [0] * 1
        TriggeredDistributor, TableReadyForTipp = SensorHub.ble.observe(1) or [0] * 2
        await SensorHub.ble.broadcast([Tipped, Triggered])
        await wait(200)

async def main3():
    global Tipped
    while True:
        await wait(0)
        if DistributorTipp == True and TableReadyForTipp == True:
            await Tipping.run_angle(100, 100, Stop.BRAKE)
            await wait(1000)
            await Tipping.run_angle(100, -100, Stop.BRAKE)
            Tipped = True
            while not (DistributorTipp == False and TableReadyForTipp == False):
                await wait(1)
            Tipped = False
        else:
            pass
        await wait(500)


async def main():
    await multitask(main1(), main2(), main3())

run_task(main())

JJHackimoto avatar Feb 24 '24 16:02 JJHackimoto

The documentation issue here has been addressed via https://github.com/pybricks/pybricks-api/commit/b2b183ef8f2e027f61525b77146c03c12b24d3fa.

What remains here then is the issue of the hub getting stuck. I haven't been able to reproduce this yet.

laurensvalk avatar Mar 05 '24 13:03 laurensvalk

Great!

I attended a lego-event this weekend and had my machine running for a day straight. The stuck hub issue happened about once every 30 minutes on one of the three hubs at random. Sometimes having to pull the batteries and sometimes not. I still don't know what could be causing this.

JJHackimoto avatar Mar 05 '24 16:03 JJHackimoto

Just to update here. Using v3.5.0b1 (Pybricks Beta v2.5.0-beta.2) with the latest firmware still causes these crashes. There is a difference, though: when it happens, I now always have to pull the batteries, whereas before that was the exception.

JJHackimoto avatar Mar 24 '24 16:03 JJHackimoto

Is it still this program that causes it for you?

I'd like to make some time to properly investigate this one. As a first step, I'd like to try to reproduce it.

Do you think we can make something with a hub and just a few motors, without replicating your whole build?

Are you only transmitting boolean values? What does your other program look like? Or can the crash be reproduced by just running this one?

Thanks!

laurensvalk avatar Mar 25 '24 08:03 laurensvalk

Is it still this program that causes it for you?

I'd like to make some time to properly investigate this one. As a first step, I'd like to try to reproduce it.

Do you think we can make something with a hub and just a few motors, without replicating your whole build?

Are you only transmitting boolean values? What does your other program look like? Or can the crash be reproduced by just running this one?

Tried the program you referred to on a Technic hub with one ColorDistanceSensor (I have only one ColorDistanceSensor) and a ColorSensor.

The program runs over two hours without a problem. Firmware: v3.5.0b1

Two hubs run a transmitter:

one on a primehub:


from pybricks.hubs import PrimeHub
from pybricks.tools import wait
from urandom import choice

transmitter = PrimeHub(broadcast_channel=2, observe_channels=[1, 3])

while True:
    transmitter.ble.broadcast([choice([True, False])])
    TriggeredDistributor, TableReadyForTipp = transmitter.ble.observe(3) or [0] * 2
    wait(100)

And one on a Technichub:


from pybricks.hubs import TechnicHub
from pybricks.tools import wait
from urandom import choice

transmitter = TechnicHub(broadcast_channel=1, observe_channels=[2, 3])

while True:
    transmitter.ble.broadcast([choice([True, False]), choice([True, False])])
    TriggeredDistributor, TableReadyForTipp = transmitter.ble.observe(3) or [0] * 2
    wait(100)

But as stated, no problem seen yet. Maybe @JJHackimoto has other experience or test program.

Bert

BertLindeman avatar Mar 25 '24 12:03 BertLindeman

Maybe a strange hit here..... The transmitting techhub with the small test program above, running disconnected from the PC, started to complain about low-battery. So long pressed the button, to get a rapid flashing hub. And need to take the batteries out. (😁 needed that anyway to re-load them)

Would battery-low have caused this?

[EDIT] The situation occurred after over four and a half hours.

New batteries IN and press button, normal blinking, OK. Press again and I get NOT a running program. Could it be that the loaded program is erased in this situation? Needed to re-load the small test program. And now it runs nicely again.

BertLindeman avatar Mar 25 '24 15:03 BertLindeman

@Bert - thanks for testing!

So long pressed the button, to get a rapid flashing hub. And need to take the batteries out. (😁 needed that anyway to re-load them)

I thought this was fixed but apparently not. Can you add your findings to https://github.com/pybricks/support/issues/1497 ?

The situation occurred after over four and a half hours.

Which situation? The stuck program? With your smaller test program? If yes, that's great news - will help debugging quite a lot. :slightly_smiling_face:

Could it be that the loaded program is erased in this situation?

Programs are saved during normal shutdown. So if you pull the batteries after loading a new program, it won't be saved.

laurensvalk avatar Mar 25 '24 15:03 laurensvalk

The situation occurred after over four and a half hours.

Which situation? The stuck program? With your smaller test program? If yes, that's great news - will help debugging quite a lot. 🙂

The test for 'this' item ran for 4.5 hours, and so did the crashing small program. The small program seemed to run "normally" and started flashing orange: battery-low. Pressed the button to stop it. Then fast-flashing.

Could it be that the loaded program is erased in this situation?

Programs are saved during normal shutdown. So if you pull the batteries after loading a new program, it won't be saved.

Ah, I should have thought about that. I probably disconnected the TechnicHub and did not stop the program, so it was not saved.

Will add the findings to #1497

BertLindeman avatar Mar 25 '24 15:03 BertLindeman

Hi and thanks Bert for testing! I'll do my best to help out on this since it's a huge issue on my side. It happens on any of my three hubs running together, and there seems to be no reason as to when or which hub crashes. One of the hubs runs off batteries, the other two are plugged in permanently using a battery eliminator made for these hubs from PV-Productions. How do you know when it complains on low-battery? Can you see this when the hub runs disconnected from any computer? I always run my hubs that way since having three of them connected wouldn't make much sense in my case.

I am mostly broadcasting boolean values, but I have started broadcasting single integers on a few occasions. Nothing massive :)

I have changed the programs a bit since I last posted them here. Mostly to reduce the amount of broadcasting being done. I'm happy to take any feedback you might have if you see something obvious that I'm doing wrong. Here's the three programs I'm currently using.

pybricks-backup.zip

The essential setup for testing this would probably be to have at least two Technic hubs, one with a color/distance sensor waiting for either color or distance to reach a certain value, which would in turn change a variable, and that variable is broadcast every 200 ms or similar. I'd love to have a block that could broadcast "OnChange" of a variable, but this is how I currently do it. The other Technic hub can then receive that, maybe wait for 10 seconds and then broadcast another variable. I believe that would be the essentials.

Let me know if I can do something more to help :)

JJHackimoto avatar Mar 25 '24 17:03 JJHackimoto

I am mostly broadcasting boolean values, but I have started broadcasting single integers on a few occasions. Nothing massive :)

The Technic Hub has some issues when changing between large and small values. See https://github.com/pybricks/support/issues/1454.

To rule this out, keep your values between -127 and 127. Booleans are fine too. And try to always send the same kind of list. For example always (small number, bool, small number).
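As a rough sketch of that advice, the helper below clamps numbers to the suggested range and always builds the same kind of list. `small_int` and `make_payload` are illustrative names, not part of Pybricks, and the three fields are just an example layout.

```python
def small_int(value):
    """Clamp an integer to the -127..127 range suggested above."""
    return max(-127, min(127, int(value)))


def make_payload(mode, ready, count):
    # Always the same shape: (small number, bool, small number).
    return (small_int(mode), bool(ready), small_int(count))


payload = make_payload(2, True, 300)
print(payload)  # (2, True, 127)
```

Clamping silently changes out-of-range values, so in a real program it may be better to scale the value instead if the exact number matters.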

Mostly to reduce the amount of broadcasting being done.

Since recent versions, you can now also broadcast None to disable broadcasting.

Let me know if I can do something more to help :)

Thank you. I'll try to look at your programs later. The smaller we can make them, the better :)

laurensvalk avatar Mar 25 '24 17:03 laurensvalk

How do you know when it complains on low-battery

The hub led will blink orange with the original color between the blinks.

BertLindeman avatar Mar 25 '24 18:03 BertLindeman

To rule this out, keep your values between -127 and 127. Booleans are fine too. And try to always send the same kind of list. For example always (small number, bool, small number).

I see. Yeah if I use integers, it's only for sending either a "1", "2" or "3". So no more than a single integer at a time. :)

Since recent versions, you can now also broadcast None to disable broadcasting.

Alright, I'm not sure where this could be useful for me, but it's good to know! Actually, that makes me wonder, if I broadcast a value once, how long will other hubs be able to pick it up for?

Thank you. I'll try to look at your programs later. The smaller we can make them, the better :)

Yeah they are really big and complicated. The new comment blocks will help massively once I start adding them :)

The hub led will blink orange with the original color between the blinks.

Oh alright. I've never seen that happen yet so I probably don't have that issue at least :)

JJHackimoto avatar Mar 25 '24 18:03 JJHackimoto

I think I was able to reproduce once, after a long time.

In your experience, is this reproducible if only one hub is broadcasting? Or is it more likely to happen when another hub is also broadcasting?

laurensvalk avatar Mar 26 '24 13:03 laurensvalk

I wanted to change the program in the technichub and noticed it got stuck in half an hour. At that time the large program also ran.

Now: At the moment I have running the large program only, so no other broadcasters. Added code to color the led hoping the problem still occurs and that I can see the light goes steady at the moment the problem occurs. Fingers crossed...

BertLindeman avatar Mar 26 '24 14:03 BertLindeman

Gotcha. Running only the large program (adapted to SEE that it got stuck), it got stuck in almost an hour. There were no other transmitters.

The steady color was red, see video below. The color.red command is after an observe and before a broadcast.

The changed program, to see that it goes wrong and at what command:


from pybricks import version
from pybricks.hubs import TechnicHub
from pybricks.parameters import Axis, Color, Direction, Port, Stop
from pybricks.pupdevices import ColorDistanceSensor, Motor, ColorSensor
from pybricks.tools import multitask, run_task, wait
from urandom import choice

Color.WHITE = Color(0, 0, 100)
Color.BLACK = Color(0, 0, 0)

SensorHub = TechnicHub(top_side=Axis.Z, front_side=Axis.X, broadcast_channel=3, observe_channels=[1, 2])
DirectTrigger = ColorDistanceSensor(Port.A)
DirectTrigger.detectable_colors((Color.RED, Color.NONE))
# LateTrigger = ColorDistanceSensor(Port.C)
# LateTrigger.detectable_colors((Color.WHITE, Color.BLACK))
LateTrigger = ColorSensor(Port.C)  # I have only ONE colorDistanceSensor so a ColorSensor
LateTrigger.detectable_colors((Color.WHITE, Color.BLACK))
Tipping = Motor(Port.B, Direction.COUNTERCLOCKWISE)

DistributorTipp = False
TableReadyForTipp = False
Triggered = False
Tipped = False
TriggeredDistributor = False


async def main1():
    global Triggered
    while True:
        await wait(0)
        while not (await DirectTrigger.color() == Color.RED or await LateTrigger.color() == Color.WHITE):
            await wait(1)
        # print(end="1")
        Triggered = True
        await wait(1000)
        Triggered = False


async def main2():
    global DistributorTipp, TriggeredDistributor, TableReadyForTipp
    while True:
        await wait(0)
        DistributorTipp, = SensorHub.ble.observe(2) or [0] * 1
        SensorHub.light.on(Color(180, 100, 50))  # Cyan

        TriggeredDistributor, TableReadyForTipp = SensorHub.ble.observe(1) or [0] * 2
        SensorHub.light.on(Color(0, 100, 50))  # red

        await SensorHub.ble.broadcast([Tipped, Triggered])
        SensorHub.light.on(Color(60, 100, 50))  # Yellow

        # SensorHub.light.on(Color.NONE)
        await wait(200)
        SensorHub.light.on(Color(120, 100, 50))  # green


async def main3():
    global Tipped
    while True:
        await wait(0)
        # print(DistributorTipp, "d", TableReadyForTipp, "t", end="")
        if DistributorTipp == True and TableReadyForTipp == True:
            await Tipping.run_angle(100, 100, Stop.BRAKE)
            await wait(1000)
            await Tipping.run_angle(100, -100, Stop.BRAKE)
            Tipped = True
            while not (DistributorTipp == False and TableReadyForTipp == False):
                await wait(1)
            Tipped = False
        else:
            pass
        await wait(500)


async def main():
    await multitask(main1(), main2(), main3())

print(version)

run_task(main())

https://github.com/pybricks/support/assets/8142081/d96f9acc-c3ea-4c7c-901c-ba984b15ccdf

BertLindeman avatar Mar 26 '24 15:03 BertLindeman

Good testing!

I've not tested with only one active hub, but your test shows that the issue can still occur. The video you have shows the exact same sequence as I experience.

Note that if the program doesn't alter the light on the hub, it continues to fade as normal even though it has gotten stuck. So it's not visible then if the program is stuck or not, until you press the button and realize nothing happens unless long-pressing. And that's where it rapidly blinks and you'll have to take the batteries out.

JJHackimoto avatar Mar 26 '24 16:03 JJHackimoto