Separate thread for rtp_wrap_h264/relay, detecting missed writes
I finally got around to this. What do you think? The separate thread is off by default. Turning it on makes missed frame writes much less likely, which helps when frames are larger or arrive at higher frequencies.
I also added two fields to the shared memory structure that make it easy to detect when a write is missed even though clients are connected. When that happens, a line is printed to the log, which is much more reliable for catching the issue than watching the encoded video and trying to spot it visually.
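For illustration, here is a minimal sketch of what that kind of missed-write detection could look like, assuming a monotonic sequence counter in the shared memory sink. The field and function names (`write_seq`, `last_read_seq`, `consume_frame`) are hypothetical, not the actual names from the patch, and real code would update them under the sink's existing lock:

```c
// Hypothetical sketch: field names are illustrative, not the actual
// fields added in the patch.
#include <stdint.h>
#include <stdio.h>

typedef struct {
    // ... existing frame data and synchronization fields ...
    uint64_t write_seq;      // incremented by the writer for every frame written
    uint64_t last_read_seq;  // updated by the reader to the write_seq it consumed
} shared_sink_s;

// Called by the reader each time it picks up a frame (with the sink locked).
static void consume_frame(shared_sink_s *sink) {
    if (sink->last_read_seq != 0 && sink->write_seq > sink->last_read_seq + 1) {
        // The writer overwrote one or more frames before we could read them.
        fprintf(stderr, "sink: missed %llu frame write(s)\n",
            (unsigned long long)(sink->write_seq - sink->last_read_seq - 1));
    }
    sink->last_read_seq = sink->write_seq;
}
```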
Sup. I'm very busy right now, so I'll look at it later. Sorry.
Of course, I can imagine things are quite busy, especially with the Kickstarter launch (I'm eagerly in that queue. :-) ) Thanks for the update.
JFYI, I haven't forgotten about it.
I was completely exhausted, but I didn't forget about it. If you're not disappointed in the speed of my work yet, could you remind me why we can't just increase the polling rate?
Of course. I had to go back and remember why, too. The details are in https://github.com/pikvm/ustreamer/issues/115, but they're mixed in with a lot of other discussion. ;-)
The core of the issue is that even if we increase the polling rate, the time taken by invoking the Janus callback `g_gw->relay_rtp` can spike to 20 ms or even higher. So even if we drop the polling interval to something really low like 5 ms, the spikes from the Janus callback can still make us miss frames. Opening several (15) browser windows all pointing at the same video feed is a good way to force the issue and see the effects, even with the standard ustreamer setup/encoding.
Pushing the `rtp_wrap_h264` calls to another thread and using a ringbuffer to communicate with it means that `_clients_thread` is now only responsible for reading the memory sink, so missing a write to the sink becomes significantly less likely. As long as the average time for `rtp_wrap_h264` in the other thread is less than the frame interval, and the ringbuffer has sufficient capacity, we won't drop any frames, even if there are occasional large spikes in `rtp_wrap_h264` time.
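To make the structure concrete, here is a rough sketch of that decoupling, using a simplified frame type and a stub standing in for `rtp_wrap_h264()` rather than the actual ustreamer types; the ring buffer and names here are illustrative only:

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define RING_CAPACITY 16

typedef struct { uint8_t data[64 * 1024]; size_t used; } frame_s;

typedef struct {
    frame_s slots[RING_CAPACITY];
    size_t head, tail, count;
    pthread_mutex_t lock;
    pthread_cond_t has_data;
} ring_s;

static void ring_init(ring_s *ring) {
    memset(ring, 0, sizeof(*ring));
    pthread_mutex_init(&ring->lock, NULL);
    pthread_cond_init(&ring->has_data, NULL);
}

// Stub standing in for the real RTP wrapping/relay call, which may
// occasionally take 20 ms or more.
static void rtp_wrap_h264(const frame_s *frame) {
    (void)frame; // real code would packetize and hand off to Janus here
}

// Reader side: _clients_thread only copies the frame and returns,
// so it can get back to polling the memory sink immediately.
static bool ring_push(ring_s *ring, const frame_s *frame) {
    bool ok = false;
    pthread_mutex_lock(&ring->lock);
    if (ring->count < RING_CAPACITY) {
        ring->slots[ring->head] = *frame;
        ring->head = (ring->head + 1) % RING_CAPACITY;
        ring->count++;
        ok = true;
        pthread_cond_signal(&ring->has_data);
    } // else: the RTP thread is too far behind; drop this frame
    pthread_mutex_unlock(&ring->lock);
    return ok;
}

// Worker side: spikes in rtp_wrap_h264() only grow the queue temporarily
// instead of stalling the sink reader.
static void *rtp_thread(void *arg) {
    ring_s *ring = (ring_s *)arg;
    for (;;) {
        pthread_mutex_lock(&ring->lock);
        while (ring->count == 0) {
            pthread_cond_wait(&ring->has_data, &ring->lock);
        }
        frame_s frame = ring->slots[ring->tail];
        ring->tail = (ring->tail + 1) % RING_CAPACITY;
        ring->count--;
        pthread_mutex_unlock(&ring->lock);
        rtp_wrap_h264(&frame);
    }
    return NULL;
}
```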
We could make it even simpler and potentially more scalable. What do you think about running `relay_rtp()` for each client in its own separate thread?
I think that would be fine, though there would be some tradeoffs. Would each thread have its own ringbuffer? Memory usage would be higher, and the time in `_clients_thread` would scale linearly with the number of clients as memory is copied into each ringbuffer. But a big positive is that one badly behaving client couldn't ruin the stream for the others.
I'd expect the memory copies to each ringbuffer to be cheap enough that even linear scaling with the number of clients would be fine.
I plan to use a regular queue, as is now done in `audio.c`. But yes, every client will have its own queue.
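A sketch of how that per-client fan-out could look, reusing `frame_s` and `ring_s` from the sketch above; `client_s` and `broadcast_frame` are illustrative names, not the actual ones from `audio.c`:

```c
// Illustrative only: each client carries its own queue and relay thread.
typedef struct client_s {
    ring_s queue;        // private queue for this client's frames
    pthread_t thread;    // runs the relay loop for this client only
    struct client_s *next;
} client_s;

// Called from _clients_thread: one copy per client, so a slow or stuck
// client only fills its own queue while the others keep streaming.
static void broadcast_frame(client_s *clients, const frame_s *frame) {
    for (client_s *client = clients; client != NULL; client = client->next) {
        if (!ring_push(&client->queue, frame)) {
            // only this client drops the frame; the rest are unaffected
        }
    }
}
```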
Sounds great to me. :-)
Okay, I'll implement it and you can test it.
Okay, I've made it. Could you test? See the commits in master.
Of course. I just got back from being out and it's been a while since I've played with my external encoding setup, so it may take a few days until I get to it, but it's on my list to take care of soon.
Cool. Today I committed the last piece, so I believe everything should work now.
Hey, just FYI: this is still on my todo list, but I haven't gotten to it yet. It may be a few weeks, if that's not too much trouble. I don't currently have my PiKVMs set up for active dev, so it'll take me a few hours to get everything up and test, and finding those hours is tight at the moment.
If validating this becomes urgent on your end, just let me know and I'll try to get it done sooner. Thanks again for keeping it in mind and extending the audio solution to solve the h264 issues I had as well.
No problem, and I'm not in a hurry with this. My recent fixes with queues have made it possible to stream to 40 clients; before the queues were implemented, there were a lot of frame skips. I guess I solved your problem too.
Boop :)
Sorry, thanks for the reminder. I'll try to get this done in the next couple days.
Hoped to get to it today, but ran out of time. By Tuesday at the latest; thanks for the patience.
Finally got a chance to play with this and validate today. The code looks great, and the performance is good.
There's definitely an increase in lag as the number of clients grows, but that's not really avoidable; there's only so much a Pi can do. :-) I had 20 clients streaming a 1080p animated video without any corruption, and it was still totally usable, just with a bit more lag. That's quite impressive.
Thanks for fixing this! :-)
Great! Thank you for your work here; you've helped me a lot.