mycroft-core icon indicating copy to clipboard operation
mycroft-core copied to clipboard

Prevent critical failure of audio services

Open krisgesling opened this issue 2 years ago • 11 comments

Description

If an audioservice throws an exception when starting playback, the service_lock is never released. Playback cannot be initiated again until the Audio Service is restarted.

I haven't yet found exactly what's failing, and hence how to replicate the issue. Supposedly VLC was not entirely crashing, instead hanging and blocking forever in one of three methods in self.play():

        selected_service.clear_list()
        selected_service.add_list(tracks)
        selected_service.play(repeat)

Thanks to @JarbasAI @HelloChatterbox for catching this.

How to test

Need to find a way to reliably reproduce it first...

Contributor license agreement signed?

  • [x] CLA

krisgesling avatar Aug 31 '21 12:08 krisgesling

Hello @krisgesling! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! :beers:

Comment last updated at 2021-08-31 20:09:49 UTC

pep8speaks avatar Aug 31 '21 12:08 pep8speaks

The context manager for the lock handles the exception automatically so the explicit exception handling is not really necessary. The change looks good though, handling broken service objects :+1:

Buuut if there is a hang somewhere neither context manager or explicit exceptions would help...

forslund avatar Aug 31 '21 13:08 forslund

Voight Kampff Integration Test Succeeded (Results)

devops-mycroft avatar Aug 31 '21 14:08 devops-mycroft

@JarbasAI mentioned the VLC version in chat. It may be useful to update the requirement, it's been a long while.

forslund avatar Aug 31 '21 15:08 forslund

@JarbasAI mentioned the VLC version in chat. It may be useful to update the requirement, it's been a long while.

Yeah, thought I'd do that in a separate PR. Potentially pull the service out to a plugin at the same time.

krisgesling avatar Aug 31 '21 20:08 krisgesling

@JarbasAI mentioned the VLC version in chat. It may be useful to update the requirement, it's been a long while.

Yeah, thought I'd do that in a separate PR. Potentially pull the service out to a plugin at the same time.

It might be the place to start. 1.1.2 was released in 2015 (https://pypi.org/project/python-vlc/1.1.2/ ) and the most current release was April 2021.

mcdonc avatar Sep 13 '21 22:09 mcdonc

FWIW, I locally updated python-vlc to the latest (python-vlc==3.0.12118), changed the default audio backend to vlc, and it indeed played the news and passed audiobackend unit tests successfully at least using the newer lib. I know this doesn't mean much for testing purposes, as the unit tests use a totally mocked vlc; I'll see if I can get the VK tests running in a config that prefers to use vlc.

Another potential workaround for hangs would be to run the service processes using Supervisor (http://supervisord.org). It has an event system that can be hooked to control processes that run under it. If, say, each service emitted some sort of "I'm-still-alive" event every so often, a supervisor event listener could be written that, were it to not receive such an event within some timeframe, it could kill and restart the service. A example of such an event listener is "superlance", which attempts to get a 200 OK from an http server every so often, and if it doesn't, will restart one or more processes that are run under supervisor. https://github.com/Supervisor/superlance/blob/master/superlance/httpok.py .

Anyway, introduction of supervisor would be a bit invasive, but its introduction might be practical in the face of buggy libs, tough-to-test edge cases and whatnot.

mcdonc avatar Sep 14 '21 20:09 mcdonc

I created a branch in a fork of mycroft-core to document attempting to run the vk tests using the newer python-vlc lib version here: https://github.com/MycroftAI/mycroft-core/compare/dev...mcdonc:feature/newpythonvlc?expand=1

These tests passed, FWIW.

mcdonc avatar Sep 15 '21 19:09 mcdonc

And here's another branch as a proof of concept to start "all" services under supervisor, just in case it interested you.

https://github.com/MycroftAI/mycroft-core/compare/dev...mcdonc:feature/supervisorservices?expand=1

It does not yet implement an eventlistener to restart hung services, but it would restart services when they crash. It also rotates service log files as necessary.

mcdonc avatar Sep 15 '21 20:09 mcdonc

Thanks @mcdonc, I hadn't seen supervisord before, and haven't had a good look through the package yet, but we had been thinking about whether to go down a similar path: https://docs.google.com/document/d/1REXWaUt1HUrbyaunB09cfQ435krdNwhjVbMVVG3iNKs/edit#

We haven't started any work on it yet, and a big question for me is how much we can leverage what systemd already does vs what we need to control in a different way etc.

I'd be interested in any thoughts you have on it.

krisgesling avatar Sep 29 '21 04:09 krisgesling

systemd user services would work just as well to restart crashed services, I think. supervisord was written before systemd existed, and has been maintained since then pretty well; I have used it on many projects but haven't really kept up with features added to systemd that it didn't have during that period. In particular, I don't know if systemd has the equivalent of supervisor's event listener system (http://supervisord.org/events.html) and log rotation. If it does, I'd go with systemd, unless it's of some maintenance benefit for the supervisor to be wrirtten in Python. Full disclosure: I am the original author of supervisord.

In any case, having some parent process restart crashed services would be a good start. I'm also not sure if the current configuration rotates log files; that can be a pain to configure in Python logging.

But having read your spec document, I definitely would not write another bespoke process manager.

EDIT: I looked at Jinx's comment to your process spec document and it seems that he has created a set of systemd units for Mycroft at https://github.com/j1nx/mycroft-systemd that looks promising by its description. I don't quite understand the "sd_notify" call nor the "software watchdog" he mentions in the document, but it likely covers most of what supervisor's event listener system might for you.

mcdonc avatar Sep 29 '21 11:09 mcdonc