Two-way audio based on SIP

Open nanosonde opened this issue 4 years ago • 32 comments

@Sunoo @longzheng After reading the two-way audio issue, I think it is worth creating a separate follow-up issue that covers only video doorbells providing two-way audio based on SIP. (See the former discussion here: https://github.com/Sunoo/homebridge-camera-ffmpeg/issues/738#issuecomment-680557506)

Example devices:

Video is very often implemented simply as an MJPEG or H.264 stream via HTTP/RTSP. I guess there are also SIP video doorbells which offer video as part of the SIP session. In the former case, the video is normally completely separate from the audio part, which runs via SIP/RTP. In the latter case, I would assume that the video+audio uses the SIP early media feature to show video+audio before actually picking up the call (during SIP RINGING).

I think that the homebridge-ring plugin could probably serve as a good starting point, as it shows how to implement the SIP client based on @kirm's sip.js lib. It should then be easy to extract the relevant SDP from the SIP INVITE media negotiation.

Remark: Of course there are a few SIP apps out there which could also cover the use case and which offer the Apple VoIP notification feature (like linphone) to receive calls even if the app is not in the foreground. However, this feature request is about using SIP video doorbells as HomeKit video doorbells without any additional app. Only homebridge will talk to the SIP video doorbell.

nanosonde avatar Dec 08 '20 10:12 nanosonde

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

github-actions[bot] avatar Dec 15 '20 11:12 github-actions[bot]

I’m really not sure if SIP is appropriate for this plugin. If it was to be added, it would be a ways down the line.

I also don’t have any cameras that support SIP, so development and testing would be a bit difficult.

Sunoo avatar Dec 15 '20 17:12 Sunoo

After investigating a bit more and looking at various HomeKit projects, I came to the conclusion that it is really out of scope for this plugin.

So I think I will go this way:

  • install something like linphone-nogtk to establish and receive SIP calls to/from the SIP video doorbell.
  • configure the SIP client to use ALSA loopback devices or similar to get the full-duplex audio on two different loopback cards
  • use this plugin with two-way audio: use each loopback device in each FFMPEG process for each direction

Do you think that this is feasible?

nanosonde avatar Dec 17 '20 15:12 nanosonde

Seems like it should be, yes. If you write up your results, I'll be happy to point people towards that if they want to do a similar thing.

Sunoo avatar Dec 17 '20 15:12 Sunoo

Ok, I will close it for now.

nanosonde avatar Dec 17 '20 15:12 nanosonde

Ah, sorry. One question that should fit in the scope of this plugin.

Could you please provide a "returnAudioTarget" command line for FFMPEG which just sends the FFMPEG output to a local sound card?

nanosonde avatar Dec 17 '20 15:12 nanosonde

Sure, I have one in my notes, I'll dig it up this afternoon.

Sunoo avatar Dec 17 '20 15:12 Sunoo

@Sunoo In the meantime I have read a lot about ffmpeg/gstreamer RTP, SRTP with AVP(F)/SAVP(F), ALSA, PulseAudio, SIP, baresip and so on.

My approach of using an ALSA loopback device and some SIP client (I use baresip) looked quite promising in my first experiments.

I have used this ffmpeg config as a starting point:

ffmpeg -f mjpeg -r 15 -i "http://192.168.10.22:8080/?action=stream" \
 -f alsa -i hw:1,1 \
 -vcodec libx264 -x264-params keyint=25:min-keyint=25 -f rawvideo -preset ultrafast -tune zerolatency -payload_type 99 -ssrc 16132552 -an -sn -dn -flags global_header \
 -f rtp "rtp://192.168.10.101:58536?rtcpport=58537&localrtcpport=58537&localrtpport=58536&pkt_size=1316" \
 -acodec libfdk_aac -profile:a aac_eld -flags +global_header -payload_type 100 -ssrc 17132553 -ar 48000 -ac 2 -vn -sn -dn \
 -f rtp "rtp://192.168.10.101:58538?rtcpport=58539&localrtcpport=58539&localrtpport=58538&pkt_size=1316"

I have loaded the ALSA loopback module: sudo modprobe snd-aloop

Next I installed baresip-core in Ubuntu. In the baresip config under ~/.baresip/config I set up the audio config like this:

audio_player		alsa,hw:1,0
audio_source		alsa,hw:1,0
audio_alert		alsa,hw:1,0
ausrc_srate		48000
auplay_srate		48000
ausrc_channels		2
auplay_channels         2

So baresip will play and record the SIP audio on the one and only loopback device. It is full-duplex, so playback and capture work simultaneously.

The ffmpeg command line from above receives on hw:1,1 what baresip plays and streams it via RTP. BTW: I use VLC without SRTP for testing at the moment.

To test "return audio" I played an MP3 file: mpg123 -a hw:1,1 test.mp3. The doorbell played back the audio without any problems.

So what does all this mean for this plugin? What I would require is a PRE and POST hook before/after the FFMPEG invocation to be able to set everything up and take it down again. Especially when using ALSA loopback, it is important that "problematic programs" open the loopback device FIRST so that they can freely set the sample rate, number of channels, sample format, etc. Another user of the loopback device (in our case ffmpeg) would have to "live" with the configured settings. However, this is not a problem for ffmpeg, as it can convert audio to whatever is required.

Do you think you could add config options for pre/post scripts that get executed around the FFMPEG invocation?

nanosonde avatar Dec 23 '20 17:12 nanosonde

This is not the first use case that’s come to me that could use pre or post execution jobs. I have some ideas as to how to implement that somewhat cleanly. I’ll probably work on it after Christmas. There is another version I need to push before I dive into that, but that one shouldn’t be too hard.

Sunoo avatar Dec 23 '20 17:12 Sunoo

@nanosonde Would something like one of these options resolve your use case? https://github.com/Sunoo/homebridge-camera-ffmpeg/issues/929#issuecomment-782941672

I’m still giving some thought to how best to handle this sort of thing.

Sunoo avatar Mar 02 '21 03:03 Sunoo

@Sunoo

I have read your suggested options.

The issue we should consider for options 2 and 3 is that we need some kind of handshake BEFORE the actual ffmpeg process is started. This is required because it must be possible to set up the loopback devices that ffmpeg will use when it is started afterwards. So the plugin would have to wait for some ACK before it proceeds to start ffmpeg, maybe with some default timeout in case the external script does not work properly.

If I had to choose, I would go with MQTT instead of HTTP. I have a broker running anyway. I guess that people who need an advanced setup with external scripts should be able to handle the broker requirement.
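A minimal sketch of the plugin-side handshake described above, written in Python for brevity rather than in the plugin's own language. It assumes the pre-hook is an external command whose exit code 0 counts as the ACK. The name `run_pre_hook` and the 2-second default are hypothetical and not something the plugin provides. An MQTT or HTTP transport would need a different mechanism but the same timeout logic.

```python
import subprocess
import sys


def run_pre_hook(command, timeout_s=2.0):
    """Run a hypothetical pre-hook command and wait for its ACK.

    Assumption: exit code 0 means "resource prepared" (the ACK).
    Returns True if the hook acknowledged within timeout_s, else False.
    """
    try:
        # subprocess.run kills the child if the timeout expires.
        result = subprocess.run(command, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        return False  # hook did not ACK in time
    except OSError:
        return False  # hook could not be started at all
    return result.returncode == 0


if __name__ == "__main__":
    # Stand-in hook that exits successfully right away.
    acked = run_pre_hook([sys.executable, "-c", "pass"])
    print("ACK received" if acked else "no ACK within timeout")
```

Whether the caller starts ffmpeg anyway after a missing ACK or aborts the stream is a policy decision that this sketch leaves open.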

nanosonde avatar Mar 02 '21 10:03 nanosonde

@startuml
Plugin -> Script : Request Prepare Resource
Script --> Plugin : ACK

Plugin --> Plugin : Use resource (e.g. audio device as input device in ffmpeg)

Plugin -> Script : Request Shutdown Resource
Script --> Plugin : ACK
@enduml

nanosonde avatar Mar 02 '21 10:03 nanosonde

Hmm, good point on the ACK, hadn’t thought about that. If I did wait for a response from the script, the timeout would probably have to be fairly short.

Also, some scripts that hook into this would likely have no reason to delay the stream, but I suppose either a configuration setting or just documentation saying that they should ACK immediately in that case would solve that.

Just thinking out loud a bit, but if it is only two-way audio that needs the ACK, I wonder if there is a reasonable way to start sending return audio towards your script and just have it pick that up and start working with it as soon as possible. This would give the best user experience, since loading the video wouldn’t be delayed; it might just take a second or two for return audio to start working after it loads. Configuring the two-way audio setting in the plugin to point at a FIFO or something similar could be the solution for that.
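A rough sketch of the FIFO idea from the script's point of view, in Python. It assumes the plugin's return-audio ffmpeg process would be configured to write into a named pipe; the path, the helper names `make_fifo` and `read_fifo`, and the demo bytes are all made up for illustration. Note that on Linux a writer opening a FIFO blocks until a reader has opened it too, so the writing side would still wait for the script to show up.

```python
import os
import tempfile
import threading


def make_fifo(path):
    """Create a named pipe at path unless something already exists there."""
    if not os.path.exists(path):
        os.mkfifo(path, 0o600)
    return path


def read_fifo(path, chunk_size=4096):
    """Yield chunks from the FIFO until the writer closes its end.

    Opening the FIFO for reading blocks until a writer has opened it.
    """
    with open(path, "rb") as fifo:
        while True:
            chunk = fifo.read(chunk_size)
            if not chunk:
                break
            yield chunk


if __name__ == "__main__":
    fifo_path = make_fifo(os.path.join(tempfile.mkdtemp(), "return_audio.fifo"))

    def fake_writer():
        # Stands in for the process that would write return audio.
        with open(fifo_path, "wb") as out:
            out.write(b"\x00" * 1024)

    writer = threading.Thread(target=fake_writer)
    writer.start()
    total = sum(len(chunk) for chunk in read_fifo(fifo_path))
    writer.join()
    print(f"read {total} bytes from {fifo_path}")
```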

Sunoo avatar Mar 02 '21 13:03 Sunoo

What I would like to do is use an existing SIP command line client to handle the SIP communication with two-way audio, as I do not want to maintain the SIP part.

When the PrepareResourceRequest comes in, I would like to start the SIP command line client, return immediately and send the ACK to the plugin. This makes sure that the client has already grabbed the corresponding ALSA devices. In parallel, the SIP client starts initiating the SIP call while the plugin can already start the ffmpeg processing to get the video stream as early as possible.

So the timeout could really be fairly small, I think. Just enough time to start another Linux process which opens some device files.
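The script side of that flow could look roughly like this in Python. The function names and the "ACK" return value are hypothetical, and the demo uses a sleeping child process as a stand-in for the real SIP client. One caveat: returning right after the process is spawned does not by itself prove that the client has already opened the ALSA devices, so a real script might wait briefly or check for that.

```python
import subprocess
import sys


def prepare_resource(sip_client_cmd):
    """Start the SIP client in the background and ACK right away.

    Popen returns as soon as the child is spawned; it does not wait for
    the client to register or to place the call.
    """
    proc = subprocess.Popen(sip_client_cmd, stdin=subprocess.DEVNULL)
    return proc, "ACK"


def shutdown_resource(proc, grace_s=2.0):
    """Stop the SIP client after ffmpeg has finished; no ACK is sent."""
    proc.terminate()
    try:
        proc.wait(timeout=grace_s)
    except subprocess.TimeoutExpired:
        proc.kill()  # client ignored the polite request
        proc.wait()


if __name__ == "__main__":
    # Stand-in for the real SIP client command line.
    fake_client = [sys.executable, "-c", "import time; time.sleep(30)"]
    client, ack = prepare_resource(fake_client)
    print(ack, "client pid", client.pid)
    shutdown_resource(client)
```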

nanosonde avatar Mar 02 '21 16:03 nanosonde

BTW: I am not sure if we need an ACK during shutdown. It will be called AFTER the ffmpeg process is finished. So no delay is necessary here.

nanosonde avatar Mar 02 '21 16:03 nanosonde

I totally understand not wanting to maintain a SIP implementation, that's not my idea of fun. I'm going to keep thinking on this. Perhaps trying to come up with a one-size-fits-all solution as I have been isn't worth it. Though delaying video until the ACK also allows for the potential need to set some process up to pull video from as well.

I'm not sure how patient HomeKit is when waiting for frames, I'll have to do some testing at some point (might be the same ~22 seconds that it waits for snapshots). Obviously any delay negatively impacts user experience though, and should be avoided where possible.

Also, I agree, no ACK is likely required on shutdown, as there is no reason (and in many cases, no ability) to delay stopping the stream.

Sunoo avatar Mar 02 '21 17:03 Sunoo

Maybe this will be useful to someone. I use this setup to implement two-way audio with a SIP intercom: https://github.com/spbroot/sipdoorbell (Homebridge-camera-ffmpeg + Baresip + ALSA loopback).

spbroot avatar Oct 12 '21 13:10 spbroot

Which SIP intercom are you using?

nanosonde avatar Oct 12 '21 13:10 nanosonde

@spbroot Nice, if you share how you did your setup somewhere, I can add the instructions to the project site.

Sunoo avatar Oct 12 '21 14:10 Sunoo

Couldn't the HTTP calls to answer and hang up be executed as part of the pre- and post-hooks when running the FFMPEG process?

So I think it is enough if the camera-ffmpeg plugin just calls a webhook before and after executing FFMPEG without waiting for a reply.

The audio loopback device can always be opened by FFMPEG. The video stream is always there for most SIP intercoms where the video is independent from the audio part.
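A small Python sketch of such a fire-and-forget webhook call. The function name and the example URLs are hypothetical; the real answer and hang-up endpoints depend on the SIP client setup. The request runs in a background thread, so the caller never waits for a reply, and network errors are swallowed on purpose.

```python
import threading
import urllib.request


def fire_webhook(url, timeout_s=1.0):
    """Send a GET request in the background and ignore the outcome."""

    def _call():
        try:
            with urllib.request.urlopen(url, timeout=timeout_s) as resp:
                resp.read()
        except OSError:
            # URLError and socket timeouts are OSError subclasses.
            pass

    thread = threading.Thread(target=_call, daemon=True)
    thread.start()
    return thread


if __name__ == "__main__":
    # Hypothetical endpoints: one call before and one after the ffmpeg run.
    fire_webhook("http://127.0.0.1:8000/answer")
    # ... the ffmpeg process would run here ...
    fire_webhook("http://127.0.0.1:8000/hangup").join(2.0)
```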

nanosonde avatar Oct 12 '21 15:10 nanosonde

Hi. I am using an analog intercom and a SIP converter is connected to it.

spbroot avatar Oct 14 '21 22:10 spbroot

Ok, I will do it.

spbroot avatar Oct 14 '21 22:10 spbroot

Hi. I would also like to control the SIP call purely through HomeKit functionality, but for this I need some way of interacting with the plugin. At first I had the idea of tracking the creation of an FFMPEG process launched with the parameters of my device, but this is not an option for me: I have several Apple TVs in my house that notify about the doorbell, and they request video, thereby starting the FFMPEG process whenever the doorbell rings. I think the best option is to establish the SIP connection when you press the TALK button and disconnect it when you press it again. So far, I have only been able to implement this by parsing the Homebridge logs with "Two-way FFmpeg Debug Logging" enabled, but this is a bad solution. (I have updated the information with a new script that does this.)

If the ability to execute external scripts is added to the plugin in the future, it would be nice to also run external commands when the TALK button is pressed and when it is disabled (if possible). It would also be nice to be able to query the plugin via HTTP for a given device (something like http://homebridge:8080/status?Doorbell) and receive a response with its status, such as whether the TALK button is pressed, so that everyone can parse the parameters and state they need.

spbroot avatar Oct 14 '21 22:10 spbroot

@Sunoo Can we have a version that supports SIP calls (the pull request is still open)? Or how do I install the fork that contains it? Thanks a lot!

mrMiimo avatar Oct 29 '22 23:10 mrMiimo

I’ll try to get that merged soon, I can’t truly test it myself, but I suppose it must work.

Sunoo avatar Oct 29 '22 23:10 Sunoo

Is there a SIP doorbell you'd be interested in installing? Maybe we can chip in one for you.

longzheng avatar Oct 30 '22 04:10 longzheng

I’d be open to installing one, not sure what’s even available as far as those go to be honest.

Sunoo avatar Oct 30 '22 04:10 Sunoo

Is there a way I can test it? Maybe if you can create a branch ...

mrMiimo avatar Oct 30 '22 04:10 mrMiimo

Willing to test this! :) I do have a DoorBird that can initiate SIP calls upon ringing. Their API documentation covers the SIP stuff starting at page 33 :)

stephanlinke87 avatar Nov 11 '22 10:11 stephanlinke87

This one would be really nice. I'm using a 2N IP Verso 2, which is capable of initiating SIP calls as well.

edelmaca avatar Jul 11 '23 05:07 edelmaca