dahua
playing tts/audio on VTO
It would be awesome to be able to send TTS or audio via the VTO speaker.
My personal use case is to connect face recognition with voice messages. Something like "Hello MyName"
If there is no direct command for that, my VTO has a place where I can store mp3 audio for various events. Maybe rroller/dahua could generate an mp3, upload it to the VTO, and trigger an action for it?
You can replace the original voice with your own mp3, but there is a limit of only 20 KB :/
I may have found the api command for the Amcrest AD110 doorbell, in theory it would be the same for the Dahua ones. Doing some tests and will report back.
UPDATE: OK, so apparently "we" have already known about the endpoint for some time. From what I have found, it is really sketchy with files: the audio needs to be rather short and lower quality, else the device gets overwhelmed. I plan to work on some premade TTS recordings and see where it leads.
MORE: I found this
So I took a Google TTS file I made in HA and converted it like they showed in the thread:
sox -v 0.8 audio_test.mp3 -r 8k -c 1 audio_test.al
Then I sent:
sleep 45 && curl -vvv \
  --limit-rate 8K \
  -F "file=@audio_test.al;type=Audio/G.711A" \
  -H "Content-Type: Audio/G.711A" \
  "http://admin:password@<ip>/cgi-bin/audio.cgi?action=postAudio&httptype=singlepart&channel=1"
I set a timer on my phone, ran my fat arse upstairs and waited. I heard the TTS on my doorbell within 1.5s of the timer expiring. There was a little garbage at the beginning and end but the voice came over clear.
When I have a chance I will see about making it a media_player entity.
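The conversion flags and the --limit-rate 8K value seem to fit together arithmetically: G.711 A-law stores one byte per sample, so 8 kHz mono audio is 8000 bytes per second, and capping the upload at roughly that rate presumably feeds the device at about playback speed. A small Python sketch of that arithmetic (the function names are made up for illustration and are not part of rroller/dahua):

```python
ALAW_BYTES_PER_SAMPLE = 1  # G.711 A-law encodes each sample as a single byte


def alaw_bytes_per_second(sample_rate_hz=8000, channels=1):
    """Data rate of raw A-law audio, e.g. the output of `sox ... -r 8k -c 1 x.al`."""
    return sample_rate_hz * channels * ALAW_BYTES_PER_SAMPLE


def alaw_duration_seconds(num_bytes, sample_rate_hz=8000, channels=1):
    """Playback time of a raw A-law payload of the given size in bytes."""
    return num_bytes / alaw_bytes_per_second(sample_rate_hz, channels)


if __name__ == "__main__":
    print(alaw_bytes_per_second())       # 8000
    print(alaw_duration_seconds(16000))  # 2.0
```

So a 16000-byte .al file should play for about two seconds, and an upload limited to roughly 8 KB/s takes about as long to send as it does to play.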
My VTO
curl -vvv --user "admin:pass" --limit-rate 8K -F "file=@audio_test.al;type=Audio/G.711A" -H "Content-Type: Audio/G.711A" "http://192.168.1.30/cgi-bin/audio.cgi?action=postAudio&httptype=singlepart&channel=1"
* Trying 192.168.1.30:80...
* Connected to 192.168.124.30 (192.168.124.30) port 80 (#0)
* Server auth using Basic with user 'admin'
> POST /cgi-bin/audio.cgi?action=postAudio&httptype=singlepart&channel=1 HTTP/1.1
> Host: 192.168.124.30
> Authorization: Basic XXXXX
> User-Agent: curl/7.74.0
> Accept: */*
> Content-Length: 11138
> Content-Type: Audio/G.711A; boundary=------------------------a90a8721f68274a4
>
* We are completely uploaded and fine
...and then it hangs.
But it actually plays nicely on my VTO!!
There is just no response ending the session.
With the VTO2211G I need neither --limit-rate nor auth?!? To get the connection to close I added --speed-limit 1 --speed-time 1, which aborts the transfer once its speed stays below 1 byte/sec for 1 second.
Can dahua be exposed as an HA MediaPlayer class device, or is that the wrong idea? It would be awesome to include automatic audio conversion and a play function in https://github.com/rroller/dahua
Yeah, I had the hang as well. I've never messed with any form of media streaming in Python, so I don't know how to handle that with the requests module that we are using here. In fact, most of my HTTP GET/POST experience in Python was with simple endpoints that closed automatically. This endpoint appears to be the one the app uses to open the stream, but the docs don't show how it ends. I'll have to dive into the requests module and see how it closes persistent connections.
Maybe this ?
r = requests.get('https://github.com', timeout=(3.05, 5))
https://docs.python-requests.org/en/latest/user/advanced/#timeouts
3.05 is the connection timeout, 5 is the read timeout.
😂 😅 Never looked into it before, this is likely the way. When I have a mo to work on it I'll submit a new PR. Can you confirm that with the dahua device the endpoint is the same as on my amcrest bell?
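A minimal sketch of the same timeout idea, written with only the standard library so it runs without extra packages. Note that urllib takes a single timeout value rather than the (connect, read) tuple that requests accepts. The function name post_audio and the injectable opener argument are made up for illustration. Sending the raw A-law bytes as the body (instead of curl's -F multipart form) is an untested assumption, and authentication is left out:

```python
import socket
import urllib.request


def post_audio(url, payload, timeout_s=5.0, opener=urllib.request.urlopen):
    """POST raw A-law bytes; treat a read timeout as 'sent, device never replied'.

    Returns True when the device answered, False when the request timed out
    while waiting for a response. Connection failures still raise URLError.
    """
    request = urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "Audio/G.711A"},
        method="POST",
    )
    try:
        with opener(request, timeout=timeout_s) as response:
            response.read()
        return True
    except socket.timeout:
        # The thread reports that the device plays the audio but never ends
        # the session, so a timeout here is expected rather than fatal.
        return False
```

Something like post_audio("http://VTO_IP/cgi-bin/audio.cgi?action=postAudio&httptype=singlepart&channel=1", data) would then return instead of hanging, much like the --speed-limit/--speed-time workaround does for curl.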
Yes, it is. Please also consider using FFmpeg instead of sox. The default home-assistant docker image contains only ffmpeg.
ffmpeg -i audio_test.mp3 -c:a pcm_alaw -ac 1 -ar 8000 -sample_fmt s16 audio_test.al
is working for me. Later on I will test it with AAC (it should be supported in hardware, and would use less space and be faster).
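If the conversion were ever done from Python rather than a shell, one way to sketch it is to build the same argument list and hand it to subprocess. The helper names below are hypothetical and the flags are copied unchanged from the ffmpeg command in this comment:

```python
import subprocess


def ffmpeg_alaw_command(src, dst, ffmpeg="ffmpeg"):
    """Build the argv for the MP3 -> 8 kHz mono A-law conversion quoted above."""
    return [
        ffmpeg,
        "-i", src,            # input file, e.g. a TTS mp3
        "-c:a", "pcm_alaw",   # G.711 A-law audio codec
        "-ac", "1",           # mono
        "-ar", "8000",        # 8 kHz sample rate
        "-sample_fmt", "s16",
        dst,                  # output file, e.g. audio_test.al
    ]


def convert_to_alaw(src, dst, ffmpeg="ffmpeg"):
    """Run the conversion; raises CalledProcessError if ffmpeg exits non-zero."""
    subprocess.run(ffmpeg_alaw_command(src, dst, ffmpeg), check=True)
```

Passing the arguments as a list avoids shell quoting problems with unusual file names.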
Got it, have some free time coming up, I'll look into it.
I failed trying to play AAC format on my VTO. pcm_alaw is the way to go.
I've been playing around with this. The issue I am having, though, is that after sending a few streams of audio (which work very well btw with pcm_alaw) it then refuses any more. It's almost like it needs an 'end conversation' to be sent to close the existing connections. I am at a loss tbh.
What I have noticed though. It sends perfectly the first time and then fails the second. I believe the 'mic' needs to be turned off somehow. In the amcrest app, you turn the mic on, speak, then turn it off.
IF I test the first time, then go into the app and toggle the mic it works again. I need to figure out how to 'turn off the mic' after sending. Any ideas?
EDIT: Timeouts/keepalive fixed it. https://github.com/rroller/dahua/issues/181#issuecomment-1148628524
Would love to see this as a media player!
As a media player would be a great feature, it will probably take some time to implement. In the meantime, did someone figure out how to automate/script this in HA?
Well, for any camera that supports ONVIF Profile T, you can now do 2-way audio with the cameras using go2rtc. I'm using it with an AD410 perfectly.
Yeah, I am very thrilled running go2rtc in 2-way mode... just struggling with SSL via traefik under "network_mode: host" mode.
Meanwhile I wrote an automation for playing TTS over the VTO. This is the main part:
shell_command:
  play_tts_on_vto: >-
    /bin/bash -c "name={{states('input_text.person_at_door')}} ; x=`/usr/bin/curl -X POST -H \"Authorization: Bearer TOKEN\" -H \"Content-Type: application/json\" -d '{\"messa ge\": \"Hi '$name'\", \"platform\": \"google_translate\"}' http://localhost:8123/api/tts_get_url |jq -r .url` && /usr/bin/curl $x -o /tmp/audio_vto.mp3 && /usr/bin/ffmpeg -i /tmp/audio_vto.mp3 -c:a pcm_alaw -ac 1 -ar 8000 -sample_fmt s16 /tmp/audio_vto.al && /usr/bin/curl -vvv -F \"file=@/tmp/audio_vto.al;type=Audio/G.711A\" -H \"Content-Type: Aud io/G.711A\" \"http://VTO_IP/cgi-bin/audio.cgi?action=postAudio&httptype=singlepart&channel=1\" --speed-limit 1 --speed-time 1; rm /tmp/audio_vto.al /tmp/audio_vto.mp3"
For what it's worth - the techniques described here also work on the Amcrest AD110/AD410 doorbells to send custom sounds, including sirens.
Any news? I'm very interested in this.
Can you explain the procedure better for a newbie like me? Thank you @luzik
So, any progress with that issue?
It would be nice if you could explain that code for us.
Launches bash and sets 2 local variables:
- VAR 1: 'name' = {{states('input_text.person_at_door')}} (Jinja template for HASS to process)
  - input_text.person_at_door is out of scope here, but I am assuming that there is an external automation that runs face detection and recognition and sets input_text.person_at_door to a name like "George", or possibly "Unknown" for faces that aren't recognized.
- VAR 2: 'x' = /usr/bin/curl -X POST -H \"Authorization: Bearer TOKEN\" -H \"Content-Type: application/json\" -d '{\"message\": \"Hi '$name'\", \"platform\": \"google_translate\"}' http://localhost:8123/api/tts_get_url | jq -r .url
  - x is a curl command that creates a TTS audio file using the text: Hi, $name. It queries the HASS TTS endpoint to create a sound file; the variable 'x' is then set to the URL that the jq binary parses out of the response. This gives a URL that you can HTTP GET to obtain the TTS audio file (in .mp3 format, I assume).
- && /usr/bin/curl $x -o /tmp/audio_vto.mp3
  - if the 'name' and 'x' vars are set correctly (&& will not execute if the previous command fails) it then runs curl and saves the HASS-generated TTS file to a temporary .mp3 file at /tmp/audio_vto.mp3
- && /usr/bin/ffmpeg -i /tmp/audio_vto.mp3 -c:a pcm_alaw -ac 1 -ar 8000 -sample_fmt s16 /tmp/audio_vto.al
  - converts the .mp3 to pcm_alaw with the proper flags and saves it to /tmp/audio_vto.al
- && /usr/bin/curl -vvv -F \"file=@/tmp/audio_vto.al;type=Audio/G.711A\" -H \"Content-Type: Audio/G.711A\" \"http://VTO_IP/cgi-bin/audio.cgi?action=postAudio&httptype=singlepart&channel=1\" --speed-limit 1 --speed-time 1; rm /tmp/audio_vto.al /tmp/audio_vto.mp3
  - Issues the final command to send the pcm_alaw file to the VTO device for playback, and then deletes the 2 temp audio files (mp3 and alaw).
  - Change VTO_IP to the actual IP of your VTO device.
The original command has 2 stray spaces: one in \"messa ge\" and one in the last command's -H \"Content-Type: Aud io/G.711A\"
Here is a reformatted command with the stray whitespace removed:
shell_command:
  play_tts_on_vto: >-
    /bin/bash -c "name={{states('input_text.person_at_door')}} ; x=`/usr/bin/curl -X POST -H \"Authorization: Bearer TOKEN\" -H \"Content-Type: application/json\" -d '{\"message\": \"Hi '$name'\", \"platform\": \"google_translate\"}' http://localhost:8123/api/tts_get_url | jq -r .url` && /usr/bin/curl $x -o /tmp/audio_vto.mp3 && /usr/bin/ffmpeg -i /tmp/audio_vto.mp3 -c:a pcm_alaw -ac 1 -ar 8000 -sample_fmt s16 /tmp/audio_vto.al && /usr/bin/curl -vvv -F \"file=@/tmp/audio_vto.al;type=Audio/G.711A\" -H \"Content-Type: Audio/G.711A\" \"http://VTO_IP/cgi-bin/audio.cgi?action=postAudio&httptype=singlepart&channel=1\" --speed-limit 1 --speed-time 1; rm /tmp/audio_vto.al /tmp/audio_vto.mp3"
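Most of the fragility in the one-liner comes from hand-escaped quotes. As a rough sketch of how the two request pieces could be built without manual escaping, here is a Python version. The helper names are hypothetical. The /api/tts_get_url body fields and the postAudio query parameters are taken from the command above, and everything else is an assumption:

```python
import json
from urllib.parse import urlencode


def tts_request_body(name, platform="google_translate"):
    """JSON body for POST /api/tts_get_url, mirroring the -d argument above."""
    # json.dumps takes care of escaping, so a name containing quotes is safe.
    return json.dumps({"message": f"Hi {name}", "platform": platform})


def post_audio_url(vto_host, channel=1):
    """URL of the postAudio endpoint; vto_host stands in for VTO_IP."""
    query = urlencode(
        {"action": "postAudio", "httptype": "singlepart", "channel": channel}
    )
    return f"http://{vto_host}/cgi-bin/audio.cgi?{query}"


if __name__ == "__main__":
    print(tts_request_body("George"))
    print(post_audio_url("192.168.1.30"))
```

The second call prints the same URL as the curl command uses, with VTO_IP replaced by 192.168.1.30.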