
YamNet Audio Classification

hunterjm opened this draft pull request

Purpose

Many of our cameras have built-in microphones, and interesting things can often be heard off camera that would be worth classifying as events. This draft PR is a proof of concept showing how audio could be captured in Frigate. So far it stops at detection: classified sounds are only logged to the console. Actually tracking and storing audio events would require a relatively large rework of object tracking, events, and recordings.
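For context on what the audio stream feeding the classifier might look like, here is a minimal sketch. It assumes FFmpeg decodes the camera's audio track to raw 16-bit mono PCM at 16 kHz (the sample rate YamNet expects) and that the TFLite model takes fixed windows of 15,600 samples (about 0.975 s), as the published YamNet TFLite classification model does. The helper names are hypothetical and are not taken from this PR's code.

```python
import array
import sys

SAMPLE_RATE = 16000    # YamNet expects 16 kHz mono audio
FRAME_SAMPLES = 15600  # assumed fixed input length (~0.975 s) of the TFLite model


def ffmpeg_audio_args(input_url: str) -> list[str]:
    """Hypothetical helper: build an FFmpeg argument list that decodes only
    the audio track to raw signed 16-bit little-endian mono PCM on stdout."""
    return [
        "ffmpeg",
        "-i", input_url,
        "-vn",                   # drop the video stream
        "-f", "s16le",           # raw PCM container
        "-acodec", "pcm_s16le",  # signed 16-bit little-endian samples
        "-ac", "1",              # downmix to mono
        "-ar", str(SAMPLE_RATE), # resample to 16 kHz
        "pipe:1",                # write to stdout
    ]


def pcm_to_frames(pcm: bytes) -> list[list[float]]:
    """Split raw s16le PCM into complete, non-overlapping frames of floats
    in [-1.0, 1.0). A trailing partial frame (or odd byte) is dropped."""
    samples = array.array("h")
    samples.frombytes(pcm[: len(pcm) - len(pcm) % 2])
    if sys.byteorder == "big":
        samples.byteswap()  # s16le data must be swapped on big-endian hosts
    scaled = [s / 32768.0 for s in samples]
    return [
        scaled[i : i + FRAME_SAMPLES]
        for i in range(0, len(scaled) - FRAME_SAMPLES + 1, FRAME_SAMPLES)
    ]
```

In use, the argument list would be passed to subprocess.Popen with stdout=subprocess.PIPE, reading FRAME_SAMPLES * 2 bytes per window. Non-overlapping windows are used here only for simplicity.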

Implements a partial solution for https://github.com/blakeblackshear/frigate/issues/1869 and https://github.com/blakeblackshear/frigate/issues/4622

TODO

  • [x] Add YamNet TFLite and EdgeTPU models to the Docker Container
  • [x] Update the config to accept parameters for audio classification
  • [x] When audio classification is enabled, have FFmpeg output an audio stream without spawning a separate process
  • [x] Expand the detector API to support audio classification
  • [x] Implement YamNet for CPU and EdgeTPU detectors
  • [x] Implement basic label and threshold filters for audio detections
  • [ ] Save audio detections as events
    • [ ] Keep track of active audio detections (ObjectTracker?)
    • [ ] Take an initial thumbnail from the video feed when an audio event starts
    • [ ] Make sure the recording maintainer keeps recordings for audio events (no need for snapshots really)
    • [ ] If an audio event occurs at the same time as a video event, should it just be a sub-label?
  • [ ] Monitoring
    • [ ] Track detections per second and skipped segments for audio processing
    • [ ] Update UI to track audio processes
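The label/threshold filter and the unchecked tracking items could fit together roughly as follows. This is a hypothetical sketch, not Frigate's ObjectTracker: it assumes each classified audio window yields a list of (label, score) pairs, keeps one active event per label, and ends an event once that label has not been heard for a configurable timeout. The class name, fields, and default values are all illustrative.

```python
from dataclasses import dataclass, field


@dataclass
class AudioEvent:
    label: str
    start: float       # timestamp of the first detection
    last_seen: float   # timestamp of the most recent detection
    top_score: float   # highest score seen during the event


@dataclass
class AudioEventTracker:
    """Hypothetical tracker: filters raw (label, score) detections and keeps
    one active event per label until it has been silent for `timeout` seconds."""
    listen: set[str]          # labels to keep, e.g. {"Speech"}
    min_score: float = 0.7    # assumed default threshold
    timeout: float = 5.0      # seconds without a hit before an event ends
    active: dict[str, AudioEvent] = field(default_factory=dict)

    def update(self, now: float, detections: list[tuple[str, float]]) -> list[AudioEvent]:
        """Feed the detections for one audio window; return events that ended."""
        for label, score in detections:
            if label not in self.listen or score < self.min_score:
                continue  # basic label and threshold filter
            event = self.active.get(label)
            if event is None:
                self.active[label] = AudioEvent(label, now, now, score)
            else:
                event.last_seen = now
                event.top_score = max(event.top_score, score)
        ended = [e for e in self.active.values() if now - e.last_seen > self.timeout]
        for e in ended:
            del self.active[e.label]
        return ended
```

A per-label timeout keeps a single event open across short pauses, such as the few-second gaps between some of the Speech detections in the output below.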

Example Config

mqtt:
  host: mqtt

ffmpeg:
  output_args:
    record: preset-record-generic-audio

detectors:
  cpu1:
    type: cpu
  cpu2:
    type: cpu
    model:
      type: audio

detect:
  width: 1920
  height: 1080
  fps: 5

record:
  enabled: true

cameras:
  example:
    ffmpeg:
      inputs:
        - path: rtsp://example.com:554
          roles:
            - detect
            - detect_audio
            - record
            - restream
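The config above marks the audio-capable input with the detect_audio role and the audio detector with model type audio. Below is a minimal sketch of how those two settings might be resolved and cross-checked once the YAML is parsed into a dict. The function names, the fallback of treating a detector without model.type as an object detector, and the error message are assumptions for illustration, not Frigate's actual config handling.

```python
# Parsed form of the example config above (only the keys used here).
CONFIG = {
    "detectors": {
        "cpu1": {"type": "cpu"},
        "cpu2": {"type": "cpu", "model": {"type": "audio"}},
    },
    "cameras": {
        "example": {
            "ffmpeg": {
                "inputs": [
                    {
                        "path": "rtsp://example.com:554",
                        "roles": ["detect", "detect_audio", "record", "restream"],
                    }
                ]
            }
        }
    },
}


def detector_kind(detector: dict) -> str:
    # Assumption: a detector with no model.type is a regular video object detector.
    return detector.get("model", {}).get("type", "object")


def audio_detector_names(config: dict) -> list[str]:
    return [name for name, det in config["detectors"].items() if detector_kind(det) == "audio"]


def resolve_audio_inputs(config: dict) -> dict[str, list[str]]:
    """Map each camera to its input paths that carry the detect_audio role,
    failing if audio is requested but no audio detector is configured."""
    has_audio_detector = bool(audio_detector_names(config))
    resolved: dict[str, list[str]] = {}
    for cam_name, cam in config["cameras"].items():
        paths = [
            inp["path"]
            for inp in cam["ffmpeg"]["inputs"]
            if "detect_audio" in inp.get("roles", [])
        ]
        if not paths:
            continue
        if not has_audio_detector:
            raise ValueError(
                f"camera {cam_name!r} has a detect_audio input but no audio detector is configured"
            )
        resolved[cam_name] = paths
    return resolved
```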

Current Output

[2023-01-07 01:32:17] frigate.app                    INFO    : Starting Frigate (0.12.0-ec7aaa1)
[2023-01-07 01:32:17] peewee_migrate                 INFO    : Starting migrations
[2023-01-07 01:32:17] peewee_migrate                 INFO    : There is nothing to migrate
[2023-01-07 01:32:17] ws4py                          INFO    : Using epoll
[2023-01-07 01:32:17] frigate.app                    INFO    : Output process started: 8557
[2023-01-07 01:32:17] frigate.app                    INFO    : Audio capture started for office: 8570
[2023-01-07 01:32:17] frigate.app                    INFO    : Audio processor started for office: 8572
[2023-01-07 01:32:17] frigate.app                    INFO    : Camera processor started for office: 8575
[2023-01-07 01:32:17] frigate.app                    INFO    : Capture process started for office: 8578
[2023-01-07 01:32:18] detector.cpu2                  INFO    : Starting detection process: 8555
[2023-01-07 01:32:18] frigate.detectors              WARNING : CPU detectors are not recommended and should only be used for testing or for trial purposes.
[2023-01-07 01:32:18] detector.cpu1                  INFO    : Starting detection process: 8554
[2023-01-07 01:32:18] frigate.detectors              WARNING : CPU detectors are not recommended and should only be used for testing or for trial purposes.
[2023-01-07 01:32:19] ws4py                          INFO    : Using epoll
[2023-01-07 01:32:19] frigate.output                 WARNING : Unable to read frigate logo
[2023-01-07 01:32:26] ws4py                          INFO    : Managing websocket [Local => 127.0.0.1:5002 | Remote => 127.0.0.1:34166]
[2023-01-07 01:32:34] frigate.audio                  INFO    : Speech: 0.95703125
[2023-01-07 01:32:35] frigate.audio                  INFO    : Speech: 0.91796875
[2023-01-07 01:32:36] frigate.audio                  INFO    : Speech: 0.80078125
[2023-01-07 01:32:46] frigate.audio                  INFO    : Speech: 0.984375
[2023-01-07 01:32:47] frigate.audio                  INFO    : Speech: 0.98046875
[2023-01-07 01:32:55] frigate.audio                  INFO    : Speech: 0.96875
[2023-01-07 01:33:01] frigate.audio                  INFO    : Speech: 0.66796875
[2023-01-07 01:33:02] frigate.audio                  INFO    : Speech: 0.73828125
[2023-01-07 01:33:03] frigate.audio                  INFO    : Speech: 0.73828125
[2023-01-07 01:33:04] frigate.audio                  INFO    : Speech: 0.98046875
[2023-01-07 01:33:26] ws4py                          INFO    : Terminating websocket [Local => 127.0.0.1:5002 | Remote => 127.0.0.1:34166]
[2023-01-07 01:33:27] frigate.audio                  INFO    : Speech: 0.8515625
[2023-01-07 01:33:28] frigate.audio                  INFO    : Speech: 0.66796875
[2023-01-07 01:33:29] frigate.audio                  INFO    : Speech: 0.91796875
[2023-01-07 01:33:30] frigate.audio                  INFO    : Speech: 0.8515625
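Every score in the log above is an exact multiple of 1/256 (for example, 0.95703125 = 245/256). That is consistent with the model emitting uint8-quantized scores that are dequantized with TFLite's affine rule real = scale × (q − zero_point). The sketch below assumes scale = 1/256 and zero_point = 0; a real implementation should read these from the output tensor's quantization parameters rather than hard-code them.

```python
from typing import Optional

# Scores exactly as printed in the log above, in order.
LOGGED_SCORES = [
    0.95703125, 0.91796875, 0.80078125, 0.984375, 0.98046875, 0.96875, 0.66796875,
    0.73828125, 0.73828125, 0.98046875, 0.8515625, 0.66796875, 0.91796875, 0.8515625,
]


def dequantize(q: int, scale: float = 1 / 256, zero_point: int = 0) -> float:
    """Affine dequantization: real = scale * (q - zero_point).
    scale=1/256 and zero_point=0 are assumptions for this model's output."""
    return scale * (q - zero_point)


def quantized_level(score: float, scale: float = 1 / 256) -> Optional[int]:
    """Return the uint8 level that reproduces `score` exactly, or None."""
    q = round(score / scale)
    return q if 0 <= q <= 255 and dequantize(q, scale) == score else None


print([quantized_level(s) for s in LOGGED_SCORES])
```

With this granularity, a hypothetical threshold of 0.7 effectively means q ≥ 180, since 179/256 ≈ 0.699.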

hunterjm, Jan 7, 2023

Deploy Preview for frigate-docs canceled.

Latest commit: 73118fd620ccbae9c353b7364a7652cac9eac57d
Latest deploy log: https://app.netlify.com/sites/frigate-docs/deploys/63b913a2f5f2b20008f3d99a

netlify[bot], Jan 7, 2023

We should hold off on the remaining work until we implement the ability to trigger events externally. That will involve a refactor of events to be more generalized and should make this and any other future event types easier to add.

blakeblackshear, Jan 7, 2023

For sure. I pushed this draft up because it was already getting pretty large. I'm happy to discuss what a refactor could or should look like to make things more generic, not only for new features like audio but also for external systems. One other idea I had was transcription with OpenAI Whisper: showing the transcript as subtitles in the player and making it searchable. That is an entirely different ball game, though, and probably belongs outside Frigate's core.

hunterjm, Jan 8, 2023

@hunterjm Just wanted to let you know that, with the changes in https://github.com/blakeblackshear/frigate/pull/6194 and https://github.com/blakeblackshear/frigate/pull/6320, this should be easier to implement now.

NickM-27, May 1, 2023

Why is this closed?

EuPhobos, Feb 29, 2024

Because it has been implemented in a separate PR that used this one as a base.

The feature has since been released: https://docs.frigate.video/configuration/audio_detectors

NickM-27, Feb 29, 2024