playing tts/audio on VTO

Open luzik opened this issue 2 years ago • 22 comments

It would be awesome to be able to send TTS or audio to the VTO speaker.

My personal use case is to connect face recognition with voice messages, something like "Hello MyName".

If there is no direct command for that, my VTO has a place where I can store mp3 audio for various events. Maybe rroller/dahua could generate an mp3, upload it to the VTO, and trigger an action for it?

luzik avatar Mar 16 '22 17:03 luzik

You can replace the original voice with your own mp3, but there is a limit of only 20 KB :/

Saiyajin53 avatar Mar 21 '22 07:03 Saiyajin53

I may have found the API command for the Amcrest AD110 doorbell; in theory it would be the same for the Dahua ones. Doing some tests and will report back.

UPDATE: OK, so apparently "we" have already known about the endpoint for some time. From what I have found, it is really sketchy with files: the audio needs to be rather short and lower quality, or else the device gets overwhelmed. I plan to work on some premade TTS recordings and see where it leads.

MORE: I found this. So I took a Google TTS file I made in HA and converted it like they showed in the thread:

sox -v 0.8 audio_test.mp3 -r 8k -c 1 audio_test.al

Then I sent:

sleep 45 && curl -vvv \
  --limit-rate 8K \
  -F "file=@audio_test.al;type=Audio/G.711A" \
  -H "Content-Type: Audio/G.711A" \
  http://admin:password@<ip>/cgi-bin/audio.cgi\?action\=postAudio\&httptype\=singlepart\&channel\=1

I set a timer on my phone, ran my fat arse upstairs, and waited. I heard the TTS on my doorbell within 1.5 s of the timer expiring. There was a little garbage at the beginning and end, but the voice came over clear. When I have a chance I will see about making it a media_player entity.
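
The --limit-rate 8K value lines up with the audio format: G.711 A-law at 8 kHz mono stores one byte per sample, so the file plays back at 8000 bytes per second, and throttling the upload to roughly that rate presumably feeds the device in about real time. A small sketch of that arithmetic (the function and constant names are made up for illustration):

```python
# 8000 samples per second * 1 byte per sample * 1 channel
ALAW_BYTES_PER_SECOND = 8000


def alaw_duration_seconds(num_bytes: int) -> float:
    """Playback time of raw 8 kHz mono G.711 A-law audio of the given size."""
    if num_bytes < 0:
        raise ValueError("num_bytes must be non-negative")
    return num_bytes / ALAW_BYTES_PER_SECOND


if __name__ == "__main__":
    # A 16000-byte .al file holds two seconds of audio.
    print(alaw_duration_seconds(16000))
```

This can help when picking a timeout for the upload: the transfer should take at least as long as the clip itself when the rate is limited this way.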

itkfilelor avatar Mar 22 '22 02:03 itkfilelor

My VTO

curl -vvv --user "admin:pass" --limit-rate 8K -F "file=@audio_test.al;type=Audio/G.711A" -H "Content-Type: Audio/G.711A" "http://192.168.1.30/cgi-bin/audio.cgi?action=postAudio&httptype=singlepart&channel=1"
*   Trying 192.168.1.30:80...
* Connected to 192.168.124.30 (192.168.124.30) port 80 (#0)
* Server auth using Basic with user 'admin'
> POST /cgi-bin/audio.cgi?action=postAudio&httptype=singlepart&channel=1 HTTP/1.1
> Host: 192.168.124.30
> Authorization: Basic XXXXX
> User-Agent: curl/7.74.0
> Accept: */*
> Content-Length: 11138
> Content-Type: Audio/G.711A; boundary=------------------------a90a8721f68274a4
>
* We are completely uploaded and fine

...and then it hangs.

luzik avatar Mar 24 '22 07:03 luzik

But it actually plays nicely on my VTO!!

There is just no response ending the session.

luzik avatar Mar 24 '22 07:03 luzik

With the VTO2211G I need neither --limit-rate nor auth?!? To get the connection to close I added --speed-limit 1 --speed-time 1, which aborts the transfer once its speed stays below 1 byte/s for 1 second.

Could dahua be exposed as an HA MediaPlayer class device, or is that the wrong idea? It would be awesome to include automatic audio conversion and a play function in https://github.com/rroller/dahua

luzik avatar Mar 24 '22 08:03 luzik

Yeah, I had the hang as well. I've never messed with any form of media streaming in Python, so I don't know how to handle that with the requests module that we are using here. In fact, most of my HTTP GET/POST experience in Python was with simple endpoints that closed automatically. This endpoint appears to be the one the app uses to open the stream, but the docs don't show how it ends. I'll have to dive into the requests module and see how it closes persistent connections.

itkfilelor avatar Mar 24 '22 13:03 itkfilelor

Maybe this ?

r = requests.get('https://github.com', timeout=(3.05, 5))

https://docs.python-requests.org/en/latest/user/advanced/#timeouts

3.05 is the connect timeout and 5 is the read timeout, both in seconds.
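
Applied to the audio endpoint, the same idea might look like the sketch below. It is not the integration's code: it uses the standard library (urllib) instead of requests, so there is a single timeout rather than a (connect, read) pair. It also sends the A-law bytes as the raw request body instead of curl's multipart form, and treats a timeout while waiting for the response as "upload done", since the device never ends the session. The raw-body upload, the helper names, and the missing authentication are all assumptions that this thread does not confirm.

```python
import socket
import urllib.error
import urllib.request


def build_post_audio_request(host: str, audio: bytes, channel: int = 1) -> urllib.request.Request:
    """Build the POST request for the postAudio endpoint (hypothetical helper)."""
    url = (
        f"http://{host}/cgi-bin/audio.cgi"
        f"?action=postAudio&httptype=singlepart&channel={channel}"
    )
    return urllib.request.Request(
        url,
        data=audio,
        method="POST",
        headers={"Content-Type": "Audio/G.711A"},
    )


def post_audio(host: str, audio: bytes, timeout: float = 5.0) -> bool:
    """Send the audio; return True if it was uploaded, False on a connection error."""
    request = build_post_audio_request(host, audio)
    try:
        with urllib.request.urlopen(request, timeout=timeout):
            return True
    except socket.timeout:
        # The body was sent but no response arrived in time; in this thread
        # the device plays the audio and simply keeps the connection open.
        return True
    except urllib.error.URLError:
        # Connection refused, unreachable host, HTTP error status, etc.
        return False
```

A device that requires credentials would need an opener built with urllib's basic or digest auth handler in place of the plain urlopen call.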

luzik avatar Mar 24 '22 14:03 luzik

😂 😅 Never looked into it before, this is likely the way. When I have a moment to work on it I'll submit a new PR. Can you confirm that the endpoint on the Dahua device is the same as on my Amcrest doorbell?

itkfilelor avatar Mar 24 '22 14:03 itkfilelor

Yes, it is. Please also consider using FFmpeg instead of sox; the default Home Assistant Docker image contains only ffmpeg.

ffmpeg -i audio_test.mp3 -c:a pcm_alaw -ac 1 -ar 8000 -sample_fmt s16 audio_test.al

is working for me. Later on I will test it with AAC (it should be supported in hardware, use less space, and be faster).
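
For anyone scripting this from Python rather than the shell, the same conversion could be wrapped roughly as follows. This is only a sketch: the helper names are invented, ffmpeg is assumed to be on the PATH, and the -y flag (overwrite the output without asking) is an addition that is not in the command above.

```python
import subprocess
from pathlib import Path


def build_alaw_command(src, dst, ffmpeg: str = "ffmpeg") -> list[str]:
    """Return the ffmpeg argument list for an 8 kHz mono A-law conversion."""
    return [
        ffmpeg,
        "-y",                 # overwrite dst if it exists (added for scripting)
        "-i", str(src),       # input file, e.g. the TTS mp3
        "-c:a", "pcm_alaw",   # G.711 A-law codec
        "-ac", "1",           # mono
        "-ar", "8000",        # 8 kHz sample rate
        "-sample_fmt", "s16",
        str(dst),
    ]


def convert_to_alaw(src, dst) -> Path:
    """Run ffmpeg and return the output path; raises CalledProcessError on failure."""
    subprocess.run(build_alaw_command(src, dst), check=True, capture_output=True)
    return Path(dst)
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.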

luzik avatar Mar 24 '22 18:03 luzik

Got it, have some free time coming up, I'll look into it.

itkfilelor avatar Mar 24 '22 19:03 itkfilelor

I failed to play AAC on my VTO. pcm_alaw is the way to go.

luzik avatar Mar 25 '22 08:03 luzik

I've been playing around with this. The issue I am having, though, is that after sending a few streams of audio (which work very well with pcm_alaw, btw) it then refuses any more. It's almost like it needs an 'end conversation' to be sent to close the existing connections. I am at a loss, tbh.

What I have noticed, though, is that it sends perfectly the first time and then fails the second. I believe the 'mic' needs to be turned off somehow. In the Amcrest app, you turn the mic on, speak, then turn it off.

If I test the first time and then go into the app and toggle the mic, it works again. I need to figure out how to 'turn off the mic' after sending. Any ideas?

EDIT: Timeouts/keepalive fixed it. https://github.com/rroller/dahua/issues/181#issuecomment-1148628524

calisro avatar Jun 06 '22 12:06 calisro

Would love to see this as a media player!

NickM-27 avatar Jun 19 '22 12:06 NickM-27

A media player would be a great feature, but it will probably take some time to implement. In the meantime, has anyone figured out how to automate/script this in HA?

luzik avatar Sep 07 '22 16:09 luzik

Well, for any camera that supports ONVIF Profile T, you can now do 2-way audio with go2rtc. I'm using it with an AD410 and it works perfectly.

calisro avatar Sep 07 '22 21:09 calisro

Yeah, I am very thrilled running go2rtc in 2-way mode, just struggling with SSL via traefik under "network_mode: host".

Meanwhile I wrote an automation for playing TTS over the VTO. This is the main part:

shell_command:
   play_tts_on_vto: >-
     /bin/bash -c "name={{states('input_text.person_at_door')}} ; x=`/usr/bin/curl -X POST -H \"Authorization: Bearer TOKEN\" -H \"Content-Type: application/json\" -d '{\"messa  ge\": \"Hi '$name'\", \"platform\": \"google_translate\"}' http://localhost:8123/api/tts_get_url |jq -r .url` && /usr/bin/curl $x -o /tmp/audio_vto.mp3 && /usr/bin/ffmpeg -i   /tmp/audio_vto.mp3 -c:a pcm_alaw -ac 1 -ar 8000 -sample_fmt s16 /tmp/audio_vto.al && /usr/bin/curl -vvv -F \"file=@/tmp/audio_vto.al;type=Audio/G.711A\" -H \"Content-Type: Aud  io/G.711A\" \"http://VTO_IP/cgi-bin/audio.cgi?action=postAudio&httptype=singlepart&channel=1\" --speed-limit 1 --speed-time 1; rm /tmp/audio_vto.al /tmp/audio_vto.mp3"

luzik avatar Sep 08 '22 16:09 luzik

For what it's worth - the techniques described here also work on the Amcrest AD110/AD410 doorbells to send custom sounds, including sirens.

GaryOkie avatar Nov 17 '22 17:11 GaryOkie

Would love to see this as a media player!

Any news? I'm very interested in this.

Can you explain the procedure better for a newbie like me? Thank you @luzik

morpheus8888 avatar Dec 24 '22 09:12 morpheus8888

So, any progress with that issue?

Pveska avatar Jun 04 '23 12:06 Pveska

Would be nice if you explain that code for us

Pveska avatar Jun 04 '23 12:06 Pveska

Would be nice if you explain that code for us

It launches bash and sets 2 shell variables:

  • VAR 1: 'name' = {{states('input_text.person_at_door')}} (a Jinja template for HASS to process)

    • input_text.person_at_door is out of scope here, but I am assuming that there is an external automation that runs face detection and recognition and sets input_text.person_at_door to a name like "George", or possibly "Unknown" for faces that aren't recognized.

  • VAR 2: 'x' = the output of /usr/bin/curl -X POST -H \"Authorization: Bearer TOKEN\" -H \"Content-Type: application/json\" -d '{\"message\": \"Hi '$name'\", \"platform\": \"google_translate\"}' http://localhost:8123/api/tts_get_url | jq -r .url

    • This curl command asks the HASS TTS endpoint to create a sound file with the text "Hi $name". The jq binary parses the response and extracts the URL, which is stored in 'x'. You can then HTTP GET that URL to obtain the TTS audio file (in .mp3 format, I assume).

  • && /usr/bin/curl $x -o /tmp/audio_vto.mp3 - if the previous command succeeded (&& does not run the next command when the previous one fails), this runs curl and saves the HASS-generated TTS file to a temporary .mp3 file at /tmp/audio_vto.mp3

  • && /usr/bin/ffmpeg -i /tmp/audio_vto.mp3 -c:a pcm_alaw -ac 1 -ar 8000 -sample_fmt s16 /tmp/audio_vto.al - converts the .mp3 to pcm_alaw with the proper flags and saves it to /tmp/audio_vto.al

  • && /usr/bin/curl -vvv -F \"file=@/tmp/audio_vto.al;type=Audio/G.711A\" -H \"Content-Type: Audio/G.711A\" \"http://VTO_IP/cgi-bin/audio.cgi?action=postAudio&httptype=singlepart&channel=1\" --speed-limit 1 --speed-time 1; rm /tmp/audio_vto.al /tmp/audio_vto.mp3 - issues the final command to send the pcm_alaw file to the VTO device for playback, and then deletes the 2 temp audio files (mp3 and alaw).

    • Change VTO_IP to the actual IP of your VTO device.

The original command has stray runs of spaces in it, for example in the last command's header: -H \"Content-Type: Aud  io/G.711A\"

Here is a reformatted command with the stray whitespace removed:

shell_command:
   play_tts_on_vto: >-
     /bin/bash -c "name={{states('input_text.person_at_door')}} ; x=`/usr/bin/curl -X POST -H \"Authorization: Bearer TOKEN\" -H \"Content-Type: application/json\" -d '{\"message\": \"Hi '$name'\", \"platform\": \"google_translate\"}' http://localhost:8123/api/tts_get_url | jq -r .url` && /usr/bin/curl $x -o /tmp/audio_vto.mp3 && /usr/bin/ffmpeg -i /tmp/audio_vto.mp3 -c:a pcm_alaw -ac 1 -ar 8000 -sample_fmt s16  /tmp/audio_vto.al && /usr/bin/curl -vvv -F \"file=@/tmp/audio_vto.al;type=Audio/G.711A\" -H \"Content-Type: Audio/G.711A\" \"http://VTO_IP/cgi-bin/audio.cgi?action=postAudio&httptype=singlepart&channel=1\" --speed-limit 1 --speed-time 1; rm /tmp/audio_vto.al /tmp/audio_vto.mp3"
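
The first step of that pipeline, asking Home Assistant for a TTS URL, can also be sketched in Python. Building the JSON body with json.dumps sidesteps the nested shell quoting and copes with names that contain quotes or spaces. The endpoint, the google_translate platform, and the url field come from the command above; the helper names and the default base URL are assumptions for illustration, and TOKEN stands for a long-lived access token.

```python
import json
import urllib.request


def build_tts_request(
    message: str,
    token: str,
    base_url: str = "http://localhost:8123",
    platform: str = "google_translate",
) -> urllib.request.Request:
    """Build the POST to Home Assistant's /api/tts_get_url (hypothetical helper)."""
    payload = json.dumps({"message": message, "platform": platform}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/tts_get_url",
        data=payload,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


def fetch_tts_url(message: str, token: str, timeout: float = 10.0) -> str:
    """Return the URL of the generated audio, i.e. the value jq reads from .url."""
    request = build_tts_request(message, token)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)["url"]
```

The returned URL can then be downloaded and converted exactly as in the shell version.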

baudneo avatar Dec 25 '23 23:12 baudneo