YamNet Audio Classification
Purpose
Many of our cameras have built-in microphones, and interesting things can often be heard off camera that would be worth classifying as events. This draft PR is a proof of concept showing how audio could be captured in Frigate. For now, it stops at detecting audio and logging the results to the console; actually tracking and storing audio events would require a relatively large rework of object tracking, events, and recordings.
Implements a partial solution for https://github.com/blakeblackshear/frigate/issues/1869 and https://github.com/blakeblackshear/frigate/issues/4622
TODO
- [x] Add YamNet TFLite and EdgeTPU models to the Docker Container
- [x] Update the config to accept parameters for audio classification
- [x] Have FFmpeg output an audio stream when audio classification is enabled, without creating a separate process
- [x] Expand the detector API to support audio classification
- [x] Implement YamNet for CPU and EdgeTPU detectors
- [x] Implement basic label and threshold filters for audio detections
- [ ] Save audio detections as events
- [ ] Keep track of active audio detections (ObjectTracker?)
- [ ] Take an initial thumbnail from the video feed when an audio event starts
- [ ] Make sure the recording maintainer keeps recordings for audio events (no need for snapshots really)
- [ ] If an audio event occurs at the same time as a video event, should it just be a sub-label?
- [ ] Monitoring
  - [ ] Track detections/second for audio processing & skipped segments
  - [ ] Update UI to track audio processes
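The "basic label and threshold filters" item above can be sketched as follows. This is a minimal Python sketch, assuming YamNet scores have already been mapped to a `{label: score}` dict; `AudioFilterConfig`, `filter_audio_detections`, `min_score`, and `label_thresholds` are hypothetical names, not the PR's actual code or config schema.

```python
from dataclasses import dataclass, field


@dataclass
class AudioFilterConfig:
    """Hypothetical per-camera audio filter settings (not Frigate's real schema)."""

    # Labels to keep; everything else YamNet reports is ignored.
    labels: set[str] = field(default_factory=lambda: {"Speech"})
    # Default minimum score for a detection to be reported.
    min_score: float = 0.5
    # Optional per-label thresholds that override min_score.
    label_thresholds: dict[str, float] = field(default_factory=dict)


def filter_audio_detections(
    scores: dict[str, float], config: AudioFilterConfig
) -> list[tuple[str, float]]:
    """Keep tracked labels whose score meets its threshold, highest score first."""
    kept = []
    for label, score in scores.items():
        if label not in config.labels:
            continue  # label is not tracked for this camera
        threshold = config.label_thresholds.get(label, config.min_score)
        if score >= threshold:
            kept.append((label, score))
    return sorted(kept, key=lambda item: item[1], reverse=True)
```

For example, with `labels={"Speech", "Dog"}` and a stricter `"Dog": 0.7` threshold, the scores `{"Speech": 0.95703125, "Dog": 0.6, "Music": 0.9}` reduce to `[("Speech", 0.95703125)]`: Dog falls below its own threshold and Music is not tracked.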
Example Config
```yaml
mqtt:
  host: mqtt
ffmpeg:
  output_args:
    record: preset-record-generic-audio
detectors:
  cpu1:
    type: cpu
  cpu2:
    type: cpu
    model:
      type: audio
detect:
  width: 1920
  height: 1080
  fps: 5
record:
  enabled: true
cameras:
  example:
    ffmpeg:
      inputs:
        - path: rtsp://example.com:554
          roles:
            - detect
            - detect_audio
            - record
            - restream
```
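The `detect_audio` role above implies an audio stream that the classifier can consume. Below is a minimal Python sketch of that step, assuming the input format of the commonly published TFLite YamNet model (16 kHz mono float samples in [-1.0, 1.0], 15,600 samples per 0.975 s window). `audio_ffmpeg_args`, `pcm_to_frames`, and the constants are hypothetical names, not code from this PR.

```python
import sys
from array import array

SAMPLE_RATE = 16_000    # YamNet expects 16 kHz mono audio
FRAME_SAMPLES = 15_600  # 0.975 s window for the TFLite YamNet model (assumption)
BYTES_PER_SAMPLE = 2    # signed 16-bit PCM


def audio_ffmpeg_args(input_url: str) -> list[str]:
    """Hypothetical ffmpeg command writing raw mono 16 kHz s16le audio to stdout."""
    return [
        "ffmpeg", "-hide_banner", "-loglevel", "error",
        "-i", input_url,
        "-vn",                    # drop the video stream
        "-ac", "1",               # downmix to mono
        "-ar", str(SAMPLE_RATE),  # resample to 16 kHz
        "-c:a", "pcm_s16le",      # signed 16-bit little-endian samples
        "-f", "s16le",            # raw PCM container (no header)
        "pipe:1",                 # write to stdout
    ]


def pcm_to_frames(pcm: bytes) -> list[list[float]]:
    """Convert raw s16le bytes into YamNet-sized float frames in [-1.0, 1.0).

    Trailing samples that do not fill a whole frame are dropped here; a real
    reader would buffer them until the next chunk arrives.
    """
    usable = len(pcm) - len(pcm) % BYTES_PER_SAMPLE
    samples = array("h")  # signed 16-bit integers in native byte order
    samples.frombytes(pcm[:usable])
    if sys.byteorder == "big":
        samples.byteswap()  # s16le is little-endian; fix up on big-endian hosts
    frames = []
    for start in range(0, len(samples) - FRAME_SAMPLES + 1, FRAME_SAMPLES):
        window = samples[start:start + FRAME_SAMPLES]
        frames.append([s / 32768.0 for s in window])
    return frames
```

Unlike the PR, which adds the audio output to the existing FFmpeg process, this sketch uses a standalone command for simplicity; it could be started with `subprocess.Popen(audio_ffmpeg_args(url), stdout=subprocess.PIPE)` and read in chunks of `FRAME_SAMPLES * BYTES_PER_SAMPLE` bytes.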
Current Output
```
[2023-01-07 01:32:17] frigate.app INFO : Starting Frigate (0.12.0-ec7aaa1)
[2023-01-07 01:32:17] peewee_migrate INFO : Starting migrations
[2023-01-07 01:32:17] peewee_migrate INFO : There is nothing to migrate
[2023-01-07 01:32:17] ws4py INFO : Using epoll
[2023-01-07 01:32:17] frigate.app INFO : Output process started: 8557
[2023-01-07 01:32:17] frigate.app INFO : Audio capture started for office: 8570
[2023-01-07 01:32:17] frigate.app INFO : Audio processor started for office: 8572
[2023-01-07 01:32:17] frigate.app INFO : Camera processor started for office: 8575
[2023-01-07 01:32:17] frigate.app INFO : Capture process started for office: 8578
[2023-01-07 01:32:18] detector.cpu2 INFO : Starting detection process: 8555
[2023-01-07 01:32:18] frigate.detectors WARNING : CPU detectors are not recommended and should only be used for testing or for trial purposes.
[2023-01-07 01:32:18] detector.cpu1 INFO : Starting detection process: 8554
[2023-01-07 01:32:18] frigate.detectors WARNING : CPU detectors are not recommended and should only be used for testing or for trial purposes.
[2023-01-07 01:32:19] ws4py INFO : Using epoll
[2023-01-07 01:32:19] frigate.output WARNING : Unable to read frigate logo
[2023-01-07 01:32:26] ws4py INFO : Managing websocket [Local => 127.0.0.1:5002 | Remote => 127.0.0.1:34166]
[2023-01-07 01:32:34] frigate.audio INFO : Speech: 0.95703125
[2023-01-07 01:32:35] frigate.audio INFO : Speech: 0.91796875
[2023-01-07 01:32:36] frigate.audio INFO : Speech: 0.80078125
[2023-01-07 01:32:46] frigate.audio INFO : Speech: 0.984375
[2023-01-07 01:32:47] frigate.audio INFO : Speech: 0.98046875
[2023-01-07 01:32:55] frigate.audio INFO : Speech: 0.96875
[2023-01-07 01:33:01] frigate.audio INFO : Speech: 0.66796875
[2023-01-07 01:33:02] frigate.audio INFO : Speech: 0.73828125
[2023-01-07 01:33:03] frigate.audio INFO : Speech: 0.73828125
[2023-01-07 01:33:04] frigate.audio INFO : Speech: 0.98046875
[2023-01-07 01:33:26] ws4py INFO : Terminating websocket [Local => 127.0.0.1:5002 | Remote => 127.0.0.1:34166]
[2023-01-07 01:33:27] frigate.audio INFO : Speech: 0.8515625
[2023-01-07 01:33:28] frigate.audio INFO : Speech: 0.66796875
[2023-01-07 01:33:29] frigate.audio INFO : Speech: 0.91796875
[2023-01-07 01:33:30] frigate.audio INFO : Speech: 0.8515625
```
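The log above shows one line per classified window, while the unchecked TODO items call for keeping track of active audio detections as events. One hypothetical way to do that is to merge same-label detections that arrive within a gap window into a single event. `AudioEvent`, `group_detections`, and `max_gap` are illustrative names; this is a sketch, not code from this PR or Frigate's object tracker.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class AudioEvent:
    """Hypothetical audio event spanning one or more detections."""

    label: str
    start: datetime
    end: datetime
    max_score: float


def group_detections(
    detections: list[tuple[datetime, str, float]], max_gap: timedelta
) -> list[AudioEvent]:
    """Merge same-label detections separated by at most max_gap into one event."""
    active: dict[str, AudioEvent] = {}
    finished: list[AudioEvent] = []
    for ts, label, score in sorted(detections):
        event = active.get(label)
        if event is not None and ts - event.end <= max_gap:
            # Still within the gap window: extend the open event.
            event.end = ts
            event.max_score = max(event.max_score, score)
        else:
            # Gap too large (or first detection): close any open event, start a new one.
            if event is not None:
                finished.append(event)
            active[label] = AudioEvent(label, ts, ts, score)
    finished.extend(active.values())  # flush events still open at the end
    return sorted(finished, key=lambda e: e.start)
```

With `max_gap=timedelta(seconds=10)`, the Speech lines in the log would collapse into two events (01:32:34 to 01:33:04 and 01:33:27 to 01:33:30); a smaller gap splits them further. The gap length trades off fragmented events against merging unrelated sounds.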
We should hold off on the remaining work until we implement the ability to trigger events externally. That will involve a refactor of events to be more generalized and should make this and any other future event types easier to add.
For sure. I pushed this draft up because it was already getting pretty large. I'm happy to discuss what a refactor could or should look like to make things more generic, not only for new features like audio but also for external systems. Another idea I had was transcription with OpenAI Whisper, with the subtitles shown in the player and made searchable, for example. That is an entirely different ball game, though, and is probably something that should live outside Frigate's core.
@hunterjm Just wanted to let you know that with the changes in https://github.com/blakeblackshear/frigate/pull/6194 and https://github.com/blakeblackshear/frigate/pull/6320, this should be easier to implement now.
Why is this closed?
Because it's been implemented in a separate PR using this PR as a base.
The feature has since been released: https://docs.frigate.video/configuration/audio_detectors