FortiusAnt with BLE/Bless breaks down after 5...20 minutes; time delay appears to solve the issue
Problem: When using BLE bless between FortiusANT and Apple TV on a pi4 with debian 12, the data stream will break down after 5 to 20 minutes in seemingly random situations, and the bluetooth FortiusANT device is no longer responsive. I was able to isolate this to FortiusANT. The X interface and trainer communication stay active, just BLE breaks. Command line options used were noteably -bb -D-1. So use BLE Bless, no ANT.
When trying to record such a breakdown in the log, I used the command line options -bb -D -d bD. So use bless, use an ANT dongle and log all BLE communication to file. This was because I was using a Viiiiva for backup should the BLE break down, so that I could quickly switch to ANT+ communication. I guess this slows down execution, since we have two parallel communications and constant writing to an SD card on every BLE message.
The result was that I was able to complete a 35km cobble stone ride in the north of France (no stadium finish, bummer!) in somewhat over an hour. There was no data interruption, BLE worked as it should.
This leads me to believe that we are dealing with some sort of race condtion or thread safety issue, where BLEbless methods are called in a too quick order, where memory contents might change half way through the call or something similar. On the Bless repository there is a similar issue with overloading: https://github.com/kevincar/bless/issues/140
So I propose a rate limiting whenever calling BLEbless. Pseudocode before invoking a bless method:
if last_bless_call + 50 > time():
return
else:
last_bless_call = time()
Or maybe even something upstream in the Bless library, but I am no expert on python coding.
Hi Arno
There was no data interruption, BLE worked as it should.
That is the good news, so we know the program logic is working.
So I propose a rate limiting whenever calling BLEbless
This is already present. The main loop starts here and ends here in FortiusAnt querying the trainer, but does not send data to ANT or BLE every cycle, since that would overload the interfaces (250ms is a standard ANT cycle speed).
Check here and here to see that bless is called once in a quarter second at max through the functions SetAthleteData(), SetTrainerData(), Refresh().
This leads me to believe that we are dealing with some sort of race condition or thread safety issue
I do not doubt that there is some sort of timing/safety issue, now the challenge to pinpoint it.
You could use FortiusAnt in simulation mode (-s flag). When you start FortiusAnt with this flag, you do not need to do anything on the trainer (because the trainer simulator does it for you) and the bless interface is used in the normal way. Your CTP (Zwift/Rouvy/MyWhoosh) will see an energetically training ... simulator.
I'd be curious whether the interface then also crashes....
Please let me know and we can continue investigation.
On the Bless repository there is a similar issue with overloading: kevincar/bless#140
It appears to me there is limitted active support on the library, considering actual response on the open issues.
My plan is to change the logging to ANT+ logging and see whether that changes anything. Overall execution speed will probably still be slower than without logging, but the blebless.py logic will run at full speed (four times per second).
And I will use mywoosh since rouvy is linked to garmin, my calory counter etc. Don't want fake data in there.... Or I might just see it as a training opportunity.
If you use a USB trainer and the Bless interface, ANT logging will not do much (what do you expect to be logged?)
I'll let it log USB then.
Hi @AfromD any news or situations I can help with? (Or is the problem solved by logging the USB and simply discard the logfiles?)
Still debugging. But the weather was too nice for indoor riding. Have tried my final setup with HRM via ANT dongle and Bless to Apple TV, and with bluetooth logging on, this seems stable, at least for 45 mins. Next I will try USB logging to see whether that makes a difference (Bless code will run without logging in between, but overall execution will still include a lot of logging that way). Next, I will do simulation runs in various constellations. If the solution seems to be bless logging, then I will try to substitute that with delay code insertion to prevent a lot of SD writing, also because I can't get an orderly GPIO shutdown to work so far.
I found some things out: -Any kind of debug logging will keep bluetooth stable -Logging that doesn't log much, like error logging doesn't keep things stable -Rate limiting Bless object calls keeps things stable.
For debugging purposes, I have modified bleBless.py to rate limit bless object calls to a maximum frequency. My first test with 20Hz keeps things very stable, but FortiusAnt runs in slo-mo, with a triangle pedal stroke and missed button presses. So next, I will see if I can find a sweet spot, where the rate limiting of bless calls will only kick in spuriously. After all, it can run stable for 5 minutes without it anyway. So there are just some edge cases, maybe 1 in 1000 updates where something would go wrong.
This is going an interesting way.
If you loook in FortiusAntBody.py line 1095, you see QuarterSecond to be set, so that in the fast FortiusAnt loop, only once in 250ms messages are sent over ANT/BLE from line 1357 onwards.
I would THINK FortiusAnt never sends more messages than 4Hz, but there may be a flaw somewhere...
Please let me know.
I mean, there are several calls in blebless, like setAthletedata, set Steeringangle etc, it is those that I put some pacing in. So the update still gets called four times per second, but all these single steps where a value passes to the bless object are paced as well. So with 20Hz distirbuted over 4 quarter seconds, there is avtually only time of maximum 5 calls per quarter second, meaning that at 20 Hz the program spends a lot of time in delays. Tomorrow I will try 500 Hz and see how it goes.
Ah, so at FortiusAntBody.py at lines 1406...1414 where the mentioned calls to SetAthleteData, SetTrainerData and Refresh are done, causing self.BlessServer.update_value() to be called to fast after each other?
If that IS the case, the simulator should run into the same problem (see bleBless.py line 734...736. You can start the simulator by executing bleBless.py (main module) and then start your CTP (Zwift, MyWhoosh,...) and let it go
Yes, pretty much, but I changed the calls in blebless.py so that they will wait to make sure they don't run faster than the rate limit. So in the case of 20Hz, if one of those functions is called faster then 50ms after another, it will pause execution until 50ms are full. I also thought about simply not updating bless in such a case, but that might cause certain things to get cancelled all the time. So I decided to let execution wait, and then set it fast enough so that it won't slow anything down.
My hypothesis is that somehwhere sometimes bluetooth goes tits up because something is changed while something else is not completed yet. After all it is just a workaround, I don't have the skill or understanding of the mechanisms to find out where there is a missing precaution to prevent this. So I just try to make sure there is some minimum time between bluetooth code.
I thought about reproducing the problem, I have windows and small pi's only. I understand you use pi3? With different CPU's and libraries, the results will be different.
You can use the simulator to test the solution, instead of having to execute a Zwift run for 45 minutes yourself!
I am on a Pi 4 1GB that connects via BLE to an Apple TV v3, and via ANT+ to my HRM strap. So even a bit faster than a Pi3. And it only happens every 5-20 minutes. So maybe it also depends on something at a bluetooth driver level that sometimes it might be unexpectedly busy. I wonder whether it might also happen on Windows. But I guess many people who run on windows also run their trainer software on Windows and don't use bluetooth to connect to a mobile device.
I prefer to use Rouvy, but I need to get on the bike myself to get in shape, and I don't want to get fake data in my profile. So I'll just keep testing to see what maximum frequency will prevent it from happening. At the optimum, it should not slow down excecution unnecessarily, and because it uses time(), it won't insert any pauses if the hardware is slower.
Just wanted to say I am very happy how it all works now. When I bought the stuff, I tried to get training videos working right under TTS3, with ffmpeg for keystone adjustment and frame generation, but the hardware couldn't really do it. I couldn't go TTS4 because my beamer didn't have enough resolution. Now thanks to FortiusANT, Rouvy and cheap beamers, all my problems are solved and it works like a charm. The virtual gears was something I already wished fifteen years ago, but Tacx wouldn't listen. It has been collecting dust for the last ten years, and now it works like it always should have.
It has been collecting dust for the last ten years, and now it works like it always should have.
There is no better compliment than this one; thank you!
I don't want to get fake data in my profile
I do understand that. I do not recall whether you can delete an activity from Rouvy, which might be an option.
I used Rouvy some years ago and I liked it; but did not want to have too many TCP's and chose to use Trainer Road. I use SportTracks as data collector, Strava just for the social communication and Training Peaks for communication with the personal trainer AND the TSS-value. Enough software :-)
I think I have gotten git set up so far. Here is a preview of my code change for now, but not yet ready to merge: https://github.com/WouterJD/FortiusANT/compare/master...AfromD:FortiusANT:master
If I understand right, a 2ms delay is enough...
Yes, with the current setting it will add a delay of up to 2ms in case 2ms haven't expired since the last bless object invocation. I tested it today for an hour, and it was stable, and there was no lag in the pedal stroke analysis or shifting.
I am no expert on python coding
Learning on the job :-)
there was no lag in the pedal stroke analysis or shifting
Likely since it's just some delays of (max) 2ms per 1/4 second.
I just specified what the code should do. The actual coding was ChatGPT. As long as you specify exactly what you need, it produces workable code. I know a bit about programming, but I would need quite some time to figure that delay function out for myself, especially the nested function calling.
Great way to learn coding. Did ChatGPT rewrite the whole BleBless.py (or only suggest the approach)?
None of that code was written by me, but I told ChatGPT what it should do. So the mechanism was my idea, but the execution is beyond my capabilities. I am at a level where I can read and understand most coding in most languages, but I just have superficial knowledge of syntax. So in the past, my programming consistent 90% of googling. Now it is 50% writing the specification, 20% implementing Chatgpt code and 30% checking that everything makes sense. But overall I am a lot more effective.
haha, I'm not that far yet with ChatGPT. With FortiusANT I started with Notepad++ since I did not know VSC et al.
What kind of questions do you feed into ChatGPT to get this result?
You are a python code reviewer. I have a bit of code here where interaction with the bless object sometimes causes the bless object to become unresponsive. Therefore, I would like to add some code to ensure that the bless object is called maximum 20 times per second, independent of the context it is called in. What changes would you suggest? Use a timer. Make sure your code is efficient and maintains the structure of the rest of the code
and then the original code.
bleBlessClass.py and bleBless adjusted to implement this delay. Further comment in the pull request