
[bounty] impl STT streaming

Open: louis030195 opened this issue 1 year ago • 17 comments

previous context: #431 #306 #374

/bounty 200

@EzraEllette

ideally this solves:

  • new features: the ability to stream audio into integrations such as pipes, apps built on top of screenpipe, voice agents, live transcriptions, etc.
  • Windows audio issues for some users (not sure)
  • higher-quality transcription, by not losing small chunks of audio between recordings
  • macOS Display audio with Deepgram does not seem to work for me (Deepgram has a streaming feature too)
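The third goal (not losing audio between recordings) comes down to consuming the device stream continuously instead of stopping and restarting a fixed-length recording. A minimal sketch, assuming samples arrive in arbitrary-sized blocks from a capture callback; `ContinuousChunker` is a hypothetical name, not a screenpipe type:

```rust
/// Hypothetical helper: accumulates samples from an audio stream callback and
/// emits fixed-size chunks that overlap by `overlap` samples, so no audio is
/// dropped at the boundary between two consecutive chunks.
pub struct ContinuousChunker {
    chunk_len: usize,
    overlap: usize,
    buffer: Vec<f32>,
}

impl ContinuousChunker {
    pub fn new(chunk_len: usize, overlap: usize) -> Self {
        assert!(overlap < chunk_len, "overlap must be smaller than the chunk length");
        Self { chunk_len, overlap, buffer: Vec::with_capacity(chunk_len) }
    }

    /// Push a block of samples (any size, as delivered by the device callback)
    /// and return every complete chunk that became available.
    pub fn push(&mut self, samples: &[f32]) -> Vec<Vec<f32>> {
        self.buffer.extend_from_slice(samples);
        let mut chunks = Vec::new();
        while self.buffer.len() >= self.chunk_len {
            chunks.push(self.buffer[..self.chunk_len].to_vec());
            // Keep the last `overlap` samples so they start the next chunk.
            let advance = self.chunk_len - self.overlap;
            self.buffer.drain(..advance);
        }
        chunks
    }

    /// Samples still waiting for the next chunk (e.g. to flush on shutdown).
    pub fn pending(&self) -> &[f32] {
        &self.buffer
    }
}
```

For example, at 16 kHz, `ContinuousChunker::new(30 * 16_000, 16_000)` would yield 30-second chunks that share one second with their neighbor (both values are illustrative). The same chunks could also be forwarded to live consumers such as pipes or voice agents.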

louis030195, Oct 17 '24 16:10

💎 $200 bounty • Screenpi.pe

~~## 💎 $100 bounty • Screenpi.pe~~

Steps to solve:

  1. Start working: Comment /attempt #521 with your implementation plan
  2. Submit work: Create a pull request including /claim #521 in the PR body to claim the bounty
  3. Receive payment: 100% of the bounty is received 2-5 days post-reward. Make sure you are eligible for payouts

Thank you for contributing to mediar-ai/screenpipe!


| Attempt | Started (GMT+0) | Solution |
|---|---|---|
| 🟢 @EzraEllette | Oct 17, 2024, 6:05:47 PM | #521 |

algora-pbc[bot], Oct 17 '24 16:10

/attempt #521

👍🏼 I'm going to spend another hour on the screenpipe shortcut, then come to this. I spent some time planning this out yesterday, so we'll see how far it gets today.

| Algora profile | Completed bounties | Tech | Active attempts |
|---|---|---|---|
| @EzraEllette | 5 mediar-ai bounties | Rust, TypeScript, JavaScript & more | #451, #513 |

EzraEllette, Oct 17 '24 18:10

[Screenshot 2024-10-17 at 16 26 12]

weird stuff is happening on my side

a bunch of speech frames are detected on the audio output even though nothing was playing
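Whether false speech frames matter depends on the per-chunk ratio check that screenpipe logs (`speech ratio` vs. `min required ratio` in the `screenpipe_audio::stt` lines of the log posted later in this thread). A minimal sketch of such a gate, assuming one boolean VAD decision per frame and 100 ms frames (the logs report 409 speech frames as 40900 ms); `VadSummary`, `summarize`, `should_transcribe` and `speech_duration_ms` are hypothetical names:

```rust
/// Assumption: each VAD frame covers 100 ms (the logs report
/// 409 speech frames as 40900 ms).
pub const FRAME_MS: usize = 100;

/// Hypothetical per-chunk summary, mirroring the fields printed in the logs.
#[derive(Debug)]
pub struct VadSummary {
    pub total_frames: usize,
    pub speech_frames: usize,
    pub speech_ratio: f32,
}

/// Count speech frames and compute the speech ratio for one recorded chunk.
pub fn summarize(frame_is_speech: &[bool]) -> VadSummary {
    let total_frames = frame_is_speech.len();
    let speech_frames = frame_is_speech.iter().filter(|&&is_speech| is_speech).count();
    let speech_ratio = if total_frames == 0 {
        0.0
    } else {
        speech_frames as f32 / total_frames as f32
    };
    VadSummary { total_frames, speech_frames, speech_ratio }
}

/// Only chunks whose ratio clears the threshold are transcribed, so enough
/// false-positive frames on a silent output device still let a chunk through.
pub fn should_transcribe(summary: &VadSummary, min_ratio: f32) -> bool {
    summary.speech_ratio >= min_ratio
}

/// Speech duration as reported in the logs: speech frames times frame length.
pub fn speech_duration_ms(summary: &VadSummary) -> usize {
    summary.speech_frames * FRAME_MS
}
```

With the numbers from that log, 409 of 598 frames gives a ratio of about 0.68 and passes the 0.20 threshold, while 35 of 300 (about 0.12) does not, which is consistent with the microphone chunk that came back with an empty transcription.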

louis030195, Oct 17 '24 23:10

Can you send the audio clip from this screenshot?

EzraEllette, Oct 17 '24 23:10

I did not play any audio. I'm currently facing 3 issues:

  • transcription does not work at all (for me and another user)
  • file encoding stopped working (for me and another user); not sure about this one
  • I can't run screenpipe without dev mode in the app (a "Read-only file system" error, which seems to be only on my end)

[Screenshot 2024-10-17 at 16 39 42]

update: matt also has the read-only issue, not sure what's happening

louis030195, Oct 17 '24 23:10

on main:

(env) (base) louisbeaumont@louisbeaumontme-macbook:~/Documents/screen-pipe$ ./target/release/screenpipe
2024-10-18T00:07:47.117161Z  INFO screenpipe: logging initialized
2024-10-18T00:07:47.506784Z  INFO screenpipe:   MacBook Pro Microphone (input)
2024-10-18T00:07:47.506843Z  INFO screenpipe:   Display 1 (output)
2024-10-18T00:07:47.509740Z  INFO screenpipe_server::db: Migrations executed successfully.    
2024-10-18T00:07:47.509752Z  INFO screenpipe: database initialized, will store files in /Users/louisbeaumont/.screenpipe



                                            _          
   __________________  ___  ____     ____  (_____  ___ 
  / ___/ ___/ ___/ _ \/ _ \/ __ \   / __ \/ / __ \/ _ \
 (__  / /__/ /  /  __/  __/ / / /  / /_/ / / /_/ /  __/
/____/\___/_/   \___/\___/_/ /_/  / .___/_/ .___/\___/ 
                                 /_/     /_/           



build ai apps that have the full context
open source | runs locally | developer friendly


┌─────────────────────┬────────────────────────────────────┐
│ setting             │ value                              │
├─────────────────────┼────────────────────────────────────┤
│ fps                 │ 0.2                                │
│ audio chunk duration│ 30 seconds                         │
│ video chunk duration│ 60 seconds                         │
│ port                │ 3030                               │
│ audio disabled      │ false                              │
│ vision disabled     │ false                              │
│ save text files     │ false                              │
│ audio engine        │ WhisperLargeV3Turbo                │
│ ocr engine          │ AppleNative                        │
│ vad engine          │ Silero                             │
│ vad sensitivity     │ High                               │
│ data directory      │ /Users/louisbeaumont/.screenpipe   │
│ debug mode          │ false                              │
│ telemetry           │ true                               │
│ local llm           │ false                              │
│ use pii removal     │ false                              │
│ ignored windows     │ []                                 │
│ included windows    │ []                                 │
│ friend wearable uid │ not set                            │
├─────────────────────┼────────────────────────────────────┤
│ languages           │                                    │
│                     │ all languages                      │
├─────────────────────┼────────────────────────────────────┤
│ monitors            │                                    │
│                     │ id: 1                              │
├─────────────────────┼────────────────────────────────────┤
│ audio devices       │                                    │
│                     │ MacBook Pro Microphone (input)     │
│                     │ Display 1 (output)                 │
├─────────────────────┼────────────────────────────────────┤
│ pipes               │                                    │
│                     │ (disabled) pipe-llama32-comment... │
│                     │ (disabled) pipe-screen-time-sto... │
│                     │ (disabled) pipe-email-exa-search   │
│                     │ (disabled) pipe-phi3-5-engineer... │
│                     │ (disabled) pipe-meeting-summary... │
│                     │ ... and 3 more                     │
└─────────────────────┴────────────────────────────────────┘
you are using local processing. all your data stays on your computer.

warning: telemetry is enabled. only error-level data will be sent to highlight.io.
to disable, use the --disable-telemetry flag.
2024-10-18T00:07:47.513256Z  INFO screenpipe_server::server: Server starting on 127.0.0.1:3030    
2024-10-18T00:07:47.517022Z  INFO screenpipe_audio::whisper: device = Metal(MetalDevice(DeviceId(1)))    
2024-10-18T00:07:51.931752Z  INFO screenpipe_audio::vad_engine: Initializing SileroVad...
2024-10-18T00:07:51.931809Z  INFO screenpipe_audio::vad_engine: SileroVad Model downloaded to: "/Users/louisbeaumont/Library/Caches/screenpipe/vad/silero_vad.onnx"
2024-10-18T00:07:51.952655Z  INFO screenpipe_server::video: Starting new video capture    
2024-10-18T00:07:51.952686Z  INFO screenpipe_server::video: Started capture thread    
2024-10-18T00:07:53.048206Z  INFO screenpipe_server::video: Starting FFmpeg process for file: /Users/louisbeaumont/.screenpipe/data/monitor_1_2024-10-18_00-07-53.mp4    
2024-10-18T00:07:57.684565Z  INFO screenpipe_server::resource_monitor: Runtime: 10s, Total Memory: 1% (0 GB / 37 GB), Total CPU: 0%, NPU: N/A
2024-10-18T00:08:02.548318Z  INFO screenpipe_audio::core: device: "Display 1 (output)"    
2024-10-18T00:08:02.548319Z  INFO screenpipe_audio::core: device: "MacBook Pro Microphone (input)"    
2024-10-18T00:08:02.591900Z  INFO screenpipe_audio::core: Recording Display 1 (output) for 30 seconds    
2024-10-18T00:08:02.659273Z  INFO screenpipe_audio::core: Recording MacBook Pro Microphone (input) for 30 seconds    
2024-10-18T00:08:07.797626Z  INFO screenpipe_server::resource_monitor: Runtime: 20s, Total Memory: 1% (0 GB / 37 GB), Total CPU: 23%, NPU: N/A
2024-10-18T00:08:17.894315Z  INFO screenpipe_server::resource_monitor: Runtime: 30s, Total Memory: 1% (0 GB / 37 GB), Total CPU: 26%, NPU: N/A
2024-10-18T00:08:27.968788Z  INFO screenpipe_server::resource_monitor: Runtime: 40s, Total Memory: 1% (0 GB / 37 GB), Total CPU: 16%, NPU: N/A
2024-10-18T00:08:30.550787Z  INFO screenpipe_audio::core: device: "Display 1 (output)"    
2024-10-18T00:08:30.550788Z  INFO screenpipe_audio::core: device: "MacBook Pro Microphone (input)"    
2024-10-18T00:08:30.587664Z  INFO screenpipe_audio::core: Recording Display 1 (output) for 30 seconds    
2024-10-18T00:08:30.676379Z  INFO screenpipe_audio::core: Recording MacBook Pro Microphone (input) for 30 seconds    
2024-10-18T00:08:32.683475Z  INFO screenpipe_server::core: Finished record_and_transcribe for device Display 1 (output) (iteration 1)    
2024-10-18T00:08:32.683503Z  INFO screenpipe_server::core: Recording complete for device Display 1 (output) (iteration 1): ()    
2024-10-18T00:08:32.683509Z  INFO screenpipe_server::core: Finished iteration 1 for device Display 1 (output)    
2024-10-18T00:08:32.687905Z  INFO screenpipe_audio::stt: device: Display 1 (output), resampling from 48000 Hz to 16000 Hz    
2024-10-18T00:08:32.720367Z  INFO screenpipe_server::core: Finished record_and_transcribe for device MacBook Pro Microphone (input) (iteration 1)    
2024-10-18T00:08:32.720382Z  INFO screenpipe_server::core: Recording complete for device MacBook Pro Microphone (input) (iteration 1): ()    
2024-10-18T00:08:32.720385Z  INFO screenpipe_server::core: Finished iteration 1 for device MacBook Pro Microphone (input)    
2024-10-18T00:08:32.819238Z  INFO screenpipe_audio::stt: device: Display 1 (output), total audio frames processed: 598, frames that include speech: 409, speech duration: 40900ms, speech ratio: 0.68, min required ratio: 0.20    
2024-10-18T00:08:33.501830Z  INFO screenpipe_audio::multilingual: detected language: "en"    
2024-10-18T00:08:35.925761Z  INFO screenpipe_server::video: Starting FFmpeg process for file: /Users/louisbeaumont/.screenpipe/data/monitor_1_2024-10-18_00-08-35.mp4    
2024-10-18T00:08:38.025084Z  INFO screenpipe_server::resource_monitor: Runtime: 50s, Total Memory: 1% (0 GB / 37 GB), Total CPU: 42%, NPU: N/A
2024-10-18T00:08:41.759461Z  INFO screenpipe_audio::whisper: 0.0s -- 30.0s    
2024-10-18T00:08:41.759482Z  INFO screenpipe_audio::whisper:   0.0s-0.0s:     
2024-10-18T00:08:41.759512Z  INFO screenpipe_audio::whisper:   0.0s-...:  I am a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a    
2024-10-18T00:08:48.084594Z  INFO screenpipe_server::resource_monitor: Runtime: 60s, Total Memory: 1% (1 GB / 37 GB), Total CPU: 41%, NPU: N/A
2024-10-18T00:08:49.927016Z  INFO screenpipe_audio::whisper: 30.0s -- 60.0s    
2024-10-18T00:08:49.927035Z  INFO screenpipe_audio::whisper:   0.0s-0.0s:     
2024-10-18T00:08:49.927064Z  INFO screenpipe_audio::whisper:   0.0s-...:  I am a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a    
2024-10-18T00:08:50.282551Z  INFO screenpipe_audio::stt: device: MacBook Pro Microphone (input), resampling from 48000 Hz to 16000 Hz    
2024-10-18T00:08:50.340787Z  INFO screenpipe_audio::stt: device: MacBook Pro Microphone (input), total audio frames processed: 300, frames that include speech: 35, speech duration: 3500ms, speech ratio: 0.12, min required ratio: 0.20    
2024-10-18T00:08:50.355553Z  INFO screenpipe_server::core: device Display 1 (output) received transcription Some(" I am a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a\n")    
2024-10-18T00:08:50.355618Z  INFO screenpipe_server::core: device Display 1 (output) inserting audio chunk: "/Users/louisbeaumont/.screenpipe/data/Display 1 (output)_2024-10-18_00-08-49.mp4"    
2024-10-18T00:08:50.357890Z  INFO screenpipe_server::core: device MacBook Pro Microphone (input) received transcription Some("")    
2024-10-18T00:08:50.357907Z  INFO screenpipe_server::core: device MacBook Pro Microphone (input) inserting audio chunk: ""    
2024-10-18T00:08:58.157661Z  INFO screenpipe_server::resource_monitor: Runtime: 70s, Total Memory: 1% (0 GB / 37 GB), Total CPU: 26%, NPU: N/A
2024-10-18T00:08:58.551730Z  INFO screenpipe_audio::core: device: "Display 1 (output)"    
2024-10-18T00:08:58.551744Z  INFO screenpipe_audio::core: device: "MacBook Pro Microphone (input)"    
2024-10-18T00:08:58.584655Z  INFO screenpipe_audio::core: Recording Display 1 (output) for 30 seconds    
2024-10-18T00:08:58.669722Z  INFO screenpipe_audio::core: Recording MacBook Pro Microphone (input) for 30 seconds    
2024-10-18T00:09:00.637851Z  INFO screenpipe_server::core: Finished record_and_transcribe for device Display 1 (output) (iteration 2)    
2024-10-18T00:09:00.637873Z  INFO screenpipe_server::core: Recording complete for device Display 1 (output) (iteration 2): ()    
2024-10-18T00:09:00.637877Z  INFO screenpipe_server::core: Finished iteration 2 for device Display 1 (output)    
2024-10-18T00:09:00.640354Z  INFO screenpipe_audio::stt: device: Display 1 (output), resampling from 48000 Hz to 16000 Hz    
2024-10-18T00:09:00.683519Z  INFO screenpipe_server::core: Finished record_and_transcribe for device MacBook Pro Microphone (input) (iteration 2)    
2024-10-18T00:09:00.683537Z  INFO screenpipe_server::core: Recording complete for device MacBook Pro Microphone (input) (iteration 2): ()    
2024-10-18T00:09:00.683540Z  INFO screenpipe_server::core: Finished iteration 2 for device MacBook Pro Microphone (input)    
2024-10-18T00:09:00.765674Z  INFO screenpipe_audio::stt: device: Display 1 (output), total audio frames processed: 598, frames that include speech: 170, speech duration: 17000ms, speech ratio: 0.28, min required ratio: 0.20    
2024-10-18T00:09:01.435215Z  INFO screenpipe_audio::multilingual: detected language: "ru"    
2024-10-18T00:09:03.698518Z  INFO screenpipe_audio::whisper: 0.0s -- 30.0s    
2024-10-18T00:09:03.698542Z  INFO screenpipe_audio::whisper:   0.0s-0.0s:     
2024-10-18T00:09:03.698554Z  INFO screenpipe_audio::whisper:   0.0s-17.0s:  не забудьте, что вы не можете, что вы не можете, что вы не можете, что вы не можете, что вы не можете, что вы не можете, что вы не можете.
2024-10-18T00:09:04.474279Z  INFO screenpipe_audio::whisper: 30.0s -- 45.0s    
2024-10-18T00:09:04.474306Z  INFO screenpipe_audio::whisper:   0.0s-0.0s:     
2024-10-18T00:09:04.474312Z  INFO screenpipe_audio::whisper:   0.0s-30.0s:  Продолжение следует...
2024-10-18T00:09:04.762738Z  INFO screenpipe_audio::stt: device: MacBook Pro Microphone (input), resampling from 48000 Hz to 16000 Hz    
2024-10-18T00:09:04.782686Z  INFO screenpipe_server::core: device Display 1 (output) received transcription Some(" не забудьте, что вы не можете, что вы не можете, что вы не можете, что вы не можете, что вы не можете, что вы не можете, что вы не можете.\n")
2024-10-18T00:09:04.782742Z  INFO screenpipe_server::core: device Display 1 (output) inserting audio chunk: "/Users/louisbeaumont/.screenpipe/data/Display 1 (output)_2024-10-18_00-09-04.mp4"    
2024-10-18T00:09:04.824993Z  INFO screenpipe_audio::stt: device: MacBook Pro Microphone (input), total audio frames processed: 300, frames that include speech: 191, speech duration: 19100ms, speech ratio: 0.64, min required ratio: 0.20    
2024-10-18T00:09:05.481530Z  INFO screenpipe_audio::multilingual: detected language: "en"    
2024-10-18T00:09:07.914646Z  INFO screenpipe_audio::whisper: 0.0s -- 30.0s    
2024-10-18T00:09:07.914669Z  INFO screenpipe_audio::whisper:   0.0s-0.0s:     
2024-10-18T00:09:07.914678Z  INFO screenpipe_audio::whisper:   0.0s-7.0s:  long night of racial injustice. I accept this award on behalf of a civil rights movement    
2024-10-18T00:09:07.914684Z  INFO screenpipe_audio::whisper:   7.0s-16.0s:  which is moving with determination and a majestic scorn for risk and danger to establish a reign    
2024-10-18T00:09:07.914688Z  INFO screenpipe_audio::whisper:   16.0s-19.0s:  of freedom and a rule of justice.    
2024-10-18T00:09:08.214373Z  INFO screenpipe_server::resource_monitor: Runtime: 80s, Total Memory: 1% (0 GB / 37 GB), Total CPU: 50%, NPU: N/A
2024-10-18T00:09:08.667949Z  INFO screenpipe_audio::whisper: 30.0s -- 45.0s    
2024-10-18T00:09:08.667966Z  INFO screenpipe_audio::whisper:   0.0s-0.0s:     
2024-10-18T00:09:08.667971Z  INFO screenpipe_audio::whisper:   0.0s-30.0s:  So, let's go.    
2024-10-18T00:09:08.952238Z  INFO screenpipe_server::core: device MacBook Pro Microphone (input) received transcription Some(" long night of racial injustice. I accept this award on behalf of a civil rights movement which is moving with determination and a majestic scorn for risk and danger to establish a reign of freedom and a rule of justice.\n")    
2024-10-18T00:09:08.952291Z  INFO screenpipe_server::core: device MacBook Pro Microphone (input) inserting audio chunk: "/Users/louisbeaumont/.screenpipe/data/MacBook Pro Microphone (input)_2024-10-18_00-09-08.mp4"    
2024-10-18T00:09:18.286262Z  INFO screenpipe_server::resource_monitor: Runtime: 90s, Total Memory: 1% (0 GB / 37 GB), Total CPU: 17%, NPU: N/A
2024-10-18T00:09:20.324354Z  INFO screenpipe_server::video: Starting FFmpeg process for file: /Users/louisbeaumont/.screenpipe/data/monitor_1_2024-10-18_00-09-20.mp4    
2024-10-18T00:09:26.554054Z  INFO screenpipe_audio::core: device: "MacBook Pro Microphone (input)"    
2024-10-18T00:09:26.554068Z  INFO screenpipe_audio::core: device: "Display 1 (output)"    
2024-10-18T00:09:26.576186Z  INFO screenpipe_audio::core: Recording Display 1 (output) for 30 seconds    
2024-10-18T00:09:26.658205Z  INFO screenpipe_audio::core: Recording MacBook Pro Microphone (input) for 30 seconds    
2024-10-18T00:09:28.375013Z  INFO screenpipe_server::resource_monitor: Runtime: 100s, Total Memory: 1% (0 GB / 37 GB), Total CPU: 32%, NPU: N/A
2024-10-18T00:09:28.644341Z  INFO screenpipe_server::core: Finished record_and_transcribe for device Display 1 (output) (iteration 3)    
2024-10-18T00:09:28.645648Z  INFO screenpipe_server::core: Recording complete for device Display 1 (output) (iteration 3): ()    
2024-10-18T00:09:28.645698Z  INFO screenpipe_server::core: Finished iteration 3 for device Display 1 (output)    
2024-10-18T00:09:28.651222Z  INFO screenpipe_audio::stt: device: Display 1 (output), resampling from 48000 Hz to 16000 Hz    
2024-10-18T00:09:28.730917Z  INFO screenpipe_server::core: Finished record_and_transcribe for device MacBook Pro Microphone (input) (iteration 3)    
2024-10-18T00:09:28.730936Z  INFO screenpipe_server::core: Recording complete for device MacBook Pro Microphone (input) (iteration 3): ()    
2024-10-18T00:09:28.730939Z  INFO screenpipe_server::core: Finished iteration 3 for device MacBook Pro Microphone (input)    
2024-10-18T00:09:28.787250Z  INFO screenpipe_audio::stt: device: Display 1 (output), total audio frames processed: 599, frames that include speech: 262, speech duration: 26200ms, speech ratio: 0.44, min required ratio: 0.20    
2024-10-18T00:09:29.452026Z  INFO screenpipe_audio::multilingual: detected language: "fr"    
2024-10-18T00:09:37.575058Z  INFO screenpipe_audio::whisper: 0.0s -- 30.0s    
2024-10-18T00:09:37.575078Z  INFO screenpipe_audio::whisper:   0.0s-0.0s:     
2024-10-18T00:09:37.575108Z  INFO screenpipe_audio::whisper:   0.0s-...:  Je ne suis pas de la mort, mais je ne suis pas de la mort. Je ne suis pas de la mort. Je ne suis pas de la mort. Je ne suis pas de la mort. Je ne suis pas de la mort. Je ne suis pas de la mort. Je ne suis pas de la mort. Je ne suis pas de la mort. Je ne suis pas de la mort. Je ne suis pas de la mort. Je ne suis pas de la mort. Je ne suis pas de la mort. Je ne suis pas de la mort. Je ne suis pas de la mort. Je ne suis pas de la mort. Je ne suis pas de la mort. Je ne suis pas de la mort. Je ne suis pas de la mort. Je ne suis pas de la mort. Je ne suis pas de la mort. Je ne suis pas de la mort. Je ne suis pas de la mort. Je ne suis pas de la mort. Je ne suis pas de la mort. Je ne suis pas de la mort. Je ne suis pas de la mort. Je ne suis pas de la    
2024-10-18T00:09:38.268747Z  INFO screenpipe_audio::whisper: 30.0s -- 45.0s    
2024-10-18T00:09:38.268771Z  INFO screenpipe_audio::whisper:   0.0s-0.0s:     
2024-10-18T00:09:38.268775Z  INFO screenpipe_audio::whisper:   0.0s-29.0s:  ...    
2024-10-18T00:09:38.434700Z  INFO screenpipe_server::resource_monitor: Runtime: 110s, Total Memory: 2% (1 GB / 37 GB), Total CPU: 37%, NPU: N/A
2024-10-18T00:09:38.565082Z  INFO screenpipe_audio::stt: device: MacBook Pro Microphone (input), resampling from 48000 Hz to 16000 Hz    
2024-10-18T00:09:38.610329Z  INFO screenpipe_server::core: device Display 1 (output) received transcription Some(" Je ne suis pas de la mort, mais je ne suis pas de la mort. Je ne suis pas de la mort. Je ne suis pas de la mort. Je ne suis pas de la mort. Je ne suis pas de la mort. Je ne suis pas de la mort. Je ne suis pas de la mort. Je ne suis pas de la mort. Je ne suis pas de la mort. Je ne suis pas de la mort. Je ne suis pas de la mort. Je ne suis pas de la mort. Je ne suis pas de la mort. Je ne suis pas de la mort. Je ne suis pas de la mort. Je ne suis pas de la mort. Je ne suis pas de la mort. Je ne suis pas de la mort. Je ne suis pas de la mort. Je ne suis pas de la mort. Je ne suis pas de la mort. Je ne suis pas de la mort. Je ne suis pas de la mort. Je ne suis pas de la mort. Je ne suis pas de la mort. Je ne suis pas de la mort. Je ne suis pas de la\n")    
2024-10-18T00:09:38.610393Z  INFO screenpipe_server::core: device Display 1 (output) inserting audio chunk: "/Users/louisbeaumont/.screenpipe/data/Display 1 (output)_2024-10-18_00-09-38.mp4"    
2024-10-18T00:09:38.630196Z  INFO screenpipe_audio::stt: device: MacBook Pro Microphone (input), total audio frames processed: 300, frames that include speech: 191, speech duration: 19100ms, speech ratio: 0.64, min required ratio: 0.20    
2024-10-18T00:09:39.289017Z  INFO screenpipe_audio::multilingual: detected language: "en"    
2024-10-18T00:09:41.772842Z  INFO screenpipe_audio::whisper: 0.0s -- 30.0s    
2024-10-18T00:09:41.772863Z  INFO screenpipe_audio::whisper:   0.0s-0.0s:     
2024-10-18T00:09:41.772877Z  INFO screenpipe_audio::whisper:   0.0s-19.2s:  I am mindful that only yesterday in Birmingham, Alabama, our children crying out for brotherhood, answered with fire hoses, snarling dogs and even death. I am mindful that only yesterday in Philadelphia, Mississippi, young people seeking help.    
2024-10-18T00:09:42.525594Z  INFO screenpipe_audio::whisper: 30.0s -- 45.0s    
2024-10-18T00:09:42.525615Z  INFO screenpipe_audio::whisper:   0.0s-0.0s:     
2024-10-18T00:09:42.525620Z  INFO screenpipe_audio::whisper:   0.0s-30.0s:  So, let's go.    
2024-10-18T00:09:42.773644Z  INFO screenpipe_server::core: device MacBook Pro Microphone (input) received transcription Some(" I am mindful that only yesterday in Birmingham, Alabama, our children crying out for brotherhood, answered with fire hoses, snarling dogs and even death. I am mindful that only yesterday in Philadelphia, Mississippi, young people seeking help.\n")    
2024-10-18T00:09:42.773716Z  INFO screenpipe_server::core: device MacBook Pro Microphone (input) inserting audio chunk: "/Users/louisbeaumont/.screenpipe/data/MacBook Pro Microphone (input)_2024-10-18_00-09-42.mp4"    
2024-10-18T00:09:48.513890Z  INFO screenpipe_server::resource_monitor: Runtime: 120s, Total Memory: 1% (0 GB / 37 GB), Total CPU: 37%, NPU: N/A
2024-10-18T00:09:54.555273Z  INFO screenpipe_audio::core: device: "Display 1 (output)"    
2024-10-18T00:09:54.555278Z  INFO screenpipe_audio::core: device: "MacBook Pro Microphone (input)"    
2024-10-18T00:09:54.598362Z  INFO screenpipe_audio::core: Recording Display 1 (output) for 30 seconds    
2024-10-18T00:09:54.685662Z  INFO screenpipe_audio::core: Recording MacBook Pro Microphone (input) for 30 seconds    
2024-10-18T00:09:56.633074Z  INFO screenpipe_server::core: Finished record_and_transcribe for device Display 1 (output) (iteration 4)    
2024-10-18T00:09:56.633099Z  INFO screenpipe_server::core: Recording complete for device Display 1 (output) (iteration 4): ()    
2024-10-18T00:09:56.633103Z  INFO screenpipe_server::core: Finished iteration 4 for device Display 1 (output)    
2024-10-18T00:09:56.636789Z  INFO screenpipe_audio::stt: device: Display 1 (output), resampling from 48000 Hz to 16000 Hz    
2024-10-18T00:09:56.707473Z  INFO screenpipe_server::core: Finished record_and_transcribe for device MacBook Pro Microphone (input) (iteration 4)    
2024-10-18T00:09:56.707491Z  INFO screenpipe_server::core: Recording complete for device MacBook Pro Microphone (input) (iteration 4): ()    
2024-10-18T00:09:56.707494Z  INFO screenpipe_server::core: Finished iteration 4 for device MacBook Pro Microphone (input)    
2024-10-18T00:09:56.763168Z  INFO screenpipe_audio::stt: device: Display 1 (output), total audio frames processed: 599, frames that include speech: 202, speech duration: 20200ms, speech ratio: 0.34, min required ratio: 0.20    
2024-10-18T00:09:57.421515Z  INFO screenpipe_audio::multilingual: detected language: "en"    
2024-10-18T00:09:58.571087Z  INFO screenpipe_server::resource_monitor: Runtime: 131s, Total Memory: 1% (0 GB / 37 GB), Total CPU: 18%, NPU: N/A
2024-10-18T00:10:03.059536Z  INFO screenpipe_server::video: Starting FFmpeg process for file: /Users/louisbeaumont/.screenpipe/data/monitor_1_2024-10-18_00-10-03.mp4    
^C2024-10-18T00:10:03.707935Z  INFO screenpipe: received ctrl+c, initiating shutdown
2024-10-18T00:10:03.707967Z  INFO screenpipe: shutdown complete
2024-10-18T00:10:03.708017Z  INFO screenpipe: received shutdown signal for recording
thread 'tokio-runtime-worker' panicked at /Users/louisbeaumont/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.38.1/src/runtime/blocking/shutdown.rs:51:21:
Cannot drop a runtime in a context where blocking is not allowed. This happens when a runtime is dropped from within an asynchronous context.
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
2024-10-18T00:10:05.693295Z  INFO screenpipe_audio::whisper: 0.0s -- 30.0s    
2024-10-18T00:10:05.693315Z  INFO screenpipe_audio::whisper:   0.0s-0.0s:     
2024-10-18T00:10:05.693342Z  INFO screenpipe_audio::whisper:   0.0s-...:  I am a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a    
2024-10-18T00:10:06.450579Z  INFO screenpipe_audio::whisper: 30.0s -- 45.0s    
2024-10-18T00:10:06.450596Z  INFO screenpipe_audio::whisper:   0.0s-0.0s:     
2024-10-18T00:10:06.450601Z  INFO screenpipe_audio::whisper:   0.0s-30.0s:  So, let's go.    
(env) (base) louisbeaumont@louisbeaumontme-macbook:~/Documents/screen-pipe$ 

kinda works, but there's mostly a bunch of errors at the start and end, and Display audio is not working at all
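Several of the Display chunks in the log decode into loops ("I am a man who is a man ...", "Je ne suis pas de la mort ..."). One cheap way to catch such output before it is stored is to measure how many word n-grams repeat. A minimal heuristic sketch; the function names, the trigram size and the 0.5 threshold are assumptions to tune. For comparison, openai/whisper detects repetitive output with a gzip compression-ratio threshold (2.4 by default) and retries decoding when it is exceeded.

```rust
use std::collections::HashSet;

/// Hypothetical heuristic: fraction of word n-grams in `text` that repeat an
/// earlier n-gram. Looping output scores close to 1.0; ordinary speech near 0.0.
pub fn repeated_ngram_ratio(text: &str, n: usize) -> f32 {
    // Normalize: lowercase words with surrounding punctuation stripped.
    let words: Vec<String> = text
        .split_whitespace()
        .map(|w| w.trim_matches(|c: char| !c.is_alphanumeric()).to_lowercase())
        .filter(|w| !w.is_empty())
        .collect();
    if n == 0 || words.len() < n {
        return 0.0;
    }
    let total = words.len() - n + 1;
    let unique: HashSet<&[String]> = words.windows(n).collect();
    1.0 - unique.len() as f32 / total as f32
}

/// Flag a transcript as a likely decoding loop (threshold is an assumption).
pub fn looks_like_loop(text: &str) -> bool {
    repeated_ngram_ratio(text, 3) > 0.5
}
```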

https://www.youtube.com/watch?v=5r98tT0j1a0
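Every chunk in the log is resampled from 48000 Hz to 16000 Hz (Whisper's input rate) before transcription. The sketch below only illustrates the shape of that step, assuming interleaved `f32` samples: it downmixes to mono and then averages each group of three samples. Averaging is a crude low-pass filter, so a real pipeline would want a proper resampler (for example a windowed-sinc one such as the `rubato` crate); the function names are hypothetical and this is not screenpipe's code.

```rust
/// Average interleaved multi-channel frames down to mono.
/// Trailing samples that do not form a full frame are ignored.
pub fn downmix_to_mono(interleaved: &[f32], channels: usize) -> Vec<f32> {
    assert!(channels > 0, "channel count must be positive");
    interleaved
        .chunks_exact(channels)
        .map(|frame| frame.iter().sum::<f32>() / channels as f32)
        .collect()
}

/// Crude integer-factor downsampling: average each group of `factor` samples
/// (a box low-pass filter) and keep one value per group. 48000 / 16000 = 3.
/// A trailing partial group is dropped.
pub fn decimate_by_averaging(samples: &[f32], factor: usize) -> Vec<f32> {
    assert!(factor > 0, "factor must be positive");
    samples
        .chunks_exact(factor)
        .map(|group| group.iter().sum::<f32>() / factor as f32)
        .collect()
}
```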

louis030195, Oct 18 '24 00:10

@louis030195 Added a repeat penalty: no repetition after 15 minutes with a massive dehumidifier running near my mic. Ran a final test: [image]
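The thread does not show how the repeat penalty was implemented. For reference, here is a minimal sketch of the common CTRL-style formulation (the one Hugging Face Transformers exposes as `repetition_penalty`), applied to one decoding step's logits; `apply_repetition_penalty` is a hypothetical name and this is not necessarily what the PR does:

```rust
use std::collections::HashSet;

/// CTRL-style repetition penalty for one decoding step.
/// `previous_tokens` are token ids already emitted; `penalty` > 1.0 makes them
/// less likely to be chosen again. Each distinct token is penalized once.
pub fn apply_repetition_penalty(logits: &mut [f32], previous_tokens: &[u32], penalty: f32) {
    let seen: HashSet<u32> = previous_tokens.iter().copied().collect();
    for token in seen {
        // Ids outside the vocabulary are ignored.
        if let Some(logit) = logits.get_mut(token as usize) {
            // Dividing a positive logit or multiplying a negative one both lower it.
            if *logit >= 0.0 {
                *logit /= penalty;
            } else {
                *logit *= penalty;
            }
        }
    }
}
```

A penalty of 1.0 leaves the logits unchanged; larger values push harder against tokens already emitted in the current segment.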

EzraEllette, Oct 18 '24 04:10

@EzraEllette

2024-10-18T16:58:20.095975Z [you]  Thank you.

2024-10-18T17:00:06.000161Z [you]  Thank you.

2024-10-18T17:01:19.028590Z [you]  Thank you.

2024-10-18T17:01:46.969358Z [you]  Thank you.

2024-10-18T17:02:15.067304Z [you]  Thank you.

2024-10-18T17:02:42.902966Z [you]  Thank you.

2024-10-18T17:03:10.944181Z [you]  Thank you.

2024-10-18T17:03:38.921454Z [you]  Thank you.

2024-10-18T17:04:06.962985Z [you]  Thank you.

2024-10-18T17:04:35.053024Z [you]  Thank you.

2024-10-18T17:05:02.986029Z [you]  Thank you.

2024-10-18T17:05:31.155720Z [you]  Thank you.

2024-10-18T17:05:59.975981Z [you]  Thank you.

2024-10-18T17:06:27.002774Z [you]  Thank you.

2024-10-18T17:06:54.949682Z [you]  Thank you.

2024-10-18T17:07:22.994386Z [you]  Thank you.

2024-10-18T17:07:50.977568Z [you]  Thank you.

2024-10-18T17:08:18.920124Z [you] Thank you.

2024-10-18T17:08:46.966124Z [you]  Thank you.

2024-10-18T17:09:15.016875Z [you]  Thank you.

2024-10-18T17:09:43.062080Z [you]  Thank you.

2024-10-18T17:10:11.082950Z [you]  Thank you.

2024-10-18T17:10:38.936120Z [you]  Thank you.

2024-10-18T17:11:07.082037Z [you]  Thank you.

2024-10-18T17:11:35.627123Z [you]  Thank you.

2024-10-18T17:12:03.412783Z [you]  Thank you.

2024-10-18T17:12:31.595844Z [you]  Thank you.

2024-10-18T17:12:59.901008Z [you]  Thank you.

2024-10-18T17:13:27.467794Z [you] Thank you.

2024-10-18T17:13:55.410570Z [you]  Thank you.

2024-10-18T17:14:23.506021Z [you]  Thank you.

2024-10-18T17:14:51.247378Z [you]  Thank you.

2024-10-18T17:15:19.211681Z [you]  Thank you.

2024-10-18T17:15:47.252834Z [you]  Thank you.

2024-10-18T17:16:15.088426Z [you]  Thank you.

2024-10-18T17:16:42.931627Z [you] Thank you.
2024-10-18T17:17:11.113553Z [you]  Thank you.

2024-10-18T17:17:39.095140Z [you]  Thank you.

2024-10-18T17:18:07.087969Z [you]  Thank you.

2024-10-18T17:18:35.110100Z [you]  Thank you.

2024-10-18T17:19:03.088271Z [you]  Thank you.

2024-10-18T17:19:31.114233Z [you]  Thank you.

2024-10-18T17:19:59.164065Z [you]  Thank you.

2024-10-18T17:20:27.219669Z [you]  Thank you.

2024-10-18T17:20:55.073695Z [you]  Thank you.

2024-10-18T17:21:23.125236Z [you]  Thank you.

2024-10-18T17:21:51.140878Z [you]  Thank you.

2024-10-18T17:22:18.975286Z [you]  Thank you.

2024-10-18T17:22:47.072188Z [you]  Thank you.

2024-10-18T17:23:15.042848Z [you]  Thank you.

2024-10-18T17:23:43.074946Z [you]  Thank you.

2024-10-18T17:24:11.208082Z [you]  Thank you.

2024-10-18T17:24:39.140286Z [you]  Thank you.

2024-10-18T17:25:07.157022Z [you]  Thank you.

2024-10-18T17:25:35.131140Z [you]  Thank you.

2024-10-18T17:26:03.158740Z [you]  Thank you.

2024-10-18T17:26:31.105051Z [you]  Thank you.

2024-10-18T17:26:59.152049Z [you]  Thank you.

2024-10-18T17:27:27.086300Z [you]  Thank you.

2024-10-18T17:27:55.159696Z [you]  Thank you.

2024-10-18T17:28:23.172966Z [you]  Thank you.

2024-10-18T17:28:51.128113Z [you]  Thank you.

2024-10-18T17:29:19.122335Z [you]  Thank you.

2024-10-18T17:29:47.166743Z [you]  Thank you.

2024-10-18T17:30:15.098752Z [you]  Thank you.

2024-10-18T17:30:43.130111Z [you]  Thank you.

2024-10-18T17:31:11.141765Z [you]  Thank you.

2024-10-18T17:31:39.185230Z [you]  Thank you.

2024-10-18T17:32:07.152871Z [you]  Thank you.

2024-10-18T17:32:35.208828Z [you]  Thank you.

2024-10-18T17:33:03.137629Z [you]  Thank you.

2024-10-18T17:33:31.182693Z [you]  Thank you.

2024-10-18T17:33:59.232615Z [you]  Thank you.

2024-10-18T17:34:27.149406Z [you]  Thank you.

2024-10-18T17:34:55.119002Z [you]  Thank you.

2024-10-18T17:35:23.154382Z [you]  Thank you.

2024-10-18T17:35:51.175037Z [you]  Thank you.

2024-10-18T17:36:19.156118Z [you]  Thank you.

2024-10-18T17:36:47.159276Z [you]  Thank you.

2024-10-18T17:37:15.122607Z [you]  Thank you.

2024-10-18T17:37:43.057067Z [you]  Thank you.

2024-10-18T17:38:11.225979Z [you]  Thank you.

2024-10-18T17:38:39.058831Z [you]  Thank you.

2024-10-18T17:39:07.198580Z [you]  Thank you.

2024-10-18T17:39:35.057496Z [you]  Thank you.

2024-10-18T17:40:03.063776Z [you] Thank you.
2024-10-18T17:40:31.045272Z [you] Thank you.
2024-10-18T17:40:59.104256Z [you] Thank you.
2024-10-18T17:41:27.083779Z [you] Thank you.
2024-10-18T17:41:55.054041Z [you] Thank you.
2024-10-18T17:42:23.109249Z [you] Thank you.
2024-10-18T17:42:51.067050Z [you] Thank you.
2024-10-18T17:43:19.046349Z [you] Thank you.
2024-10-18T17:43:47.104567Z [you] Thank you.
2024-10-18T17:44:15.098849Z [you] Thank you.
2024-10-18T17:44:43.067492Z [you] Thank you.
2024-10-18T17:45:11.022362Z [you] Thank you.
2024-10-18T17:45:39.081078Z [you] Thank you.
2024-10-18T17:46:07.036418Z [you] Thank you.
2024-10-18T17:46:35.119050Z [you] Thank you.
2024-10-18T17:47:03.177475Z [you] Thank you.
2024-10-18T17:47:31.134804Z [you] Thank you.
2024-10-18T17:47:59.087799Z [you] Thank you.
2024-10-18T17:48:27.031133Z [you] Thank you.
2024-10-18T17:48:55.090640Z [you] Thank you.
2024-10-18T17:49:23.147789Z [you] Thank you.
2024-10-18T17:49:51.153614Z [you] Thank you.
2024-10-18T17:50:19.123019Z [you] Thank you.
2024-10-18T17:50:47.103815Z [you] Thank you.
2024-10-18T17:51:15.109341Z [you] Thank you.
2024-10-18T17:51:43.111662Z [you] Thank you.
2024-10-18T17:52:11.148144Z [you] Thank you.
2024-10-18T17:52:39.129620Z [you] Thank you.
2024-10-18T17:53:07.111379Z [you] Thank you.
2024-10-18T17:53:35.079087Z [you] Thank you.
2024-10-18T17:54:03.137069Z [you] Thank you.
2024-10-18T17:54:31.147487Z [you] Thank you.
2024-10-18T17:54:59.110127Z [you] Thank you.
2024-10-18T17:55:27.188364Z [you] Thank you.
2024-10-18T17:55:55.122745Z [you] Thank you.
2024-10-18T17:56:23.460820Z [you] Thank you.
2024-10-18T17:56:51.302627Z [you] Thank you.
2024-10-18T17:57:19.229304Z [you] Thank you.
2024-10-18T17:57:47.193409Z [you] Thank you.
2024-10-18T17:58:15.140614Z [you] Thank you.
2024-10-18T17:58:43.214382Z [you] Thank you.
2024-10-18T17:59:11.238589Z [you] Thank you.
2024-10-18T17:59:39.224145Z [you] Thank you.
2024-10-18T18:00:07.163133Z [you] Thank you.
2024-10-18T18:00:35.198086Z [you] Thank you.
2024-10-18T18:01:03.238787Z [you] Thank you.

This is my transcription state right now, while I was having a conversation with someone else (IRL).

Using the Mac mic and display output, with Whisper Turbo.

louis030195 avatar Oct 18 '24 18:10 louis030195

Just for reference, some interesting stuff: https://github.com/CapSoftware/Cap/blob/main/crates/ffmpeg/src/lib.rs

louis030195 avatar Oct 19 '24 18:10 louis030195

Note:

[image]

screenpipe with Deepgram used to use only 600-800 MB; now it uses 4 GB. This might be related to audio processing.

louis030195 avatar Oct 19 '24 22:10 louis030195

What command did you use to run the CLI?

EzraEllette avatar Oct 20 '24 01:10 EzraEllette

./target/release/screenpipe --audio-transcription-engine deepgram \
--ocr-engine apple-native --monitor-id 1 --audio-device "MacBook Pro Microphone (input)" \
--audio-device "Display 1 (output)" --ignored-windows "bit" \
--ignored-windows ".env" --ignored-windows "Item-0" \
--ignored-windows "App Icon Window" --ignored-windows "Battery" \
--ignored-windows "Shortcuts" --ignored-windows "WiFi" \
--ignored-windows "BentoBox" --ignored-windows "Clock" \
--ignored-windows "Dock" --ignored-windows "DeepL" \
--deepgram-api-key "abcd" --language english

louis030195 avatar Oct 20 '24 21:10 louis030195

[image] Transcribing with Deepgram wasn't using that much memory for me.

EzraEllette avatar Oct 24 '24 18:10 EzraEllette


The resource monitor is unreliable for measuring memory usage.

louis030195 avatar Oct 25 '24 16:10 louis030195

/tip $100 @EzraEllette

Thanks for the work on streaming; I'm finishing things up in #578.

  • [x] WebSocket transcription API
  • [x] refactor to use VAD in the audio device
  • [x] refactor to make the audio pipeline easier to benchmark (accuracy) end-to-end
  • [ ] a good benchmark
  • [ ] improve accuracy
  • [ ] test on Windows, Linux, and other macOS machines
  • [ ] release a new app version
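To make the "use VAD in the audio device" item concrete, here is a minimal sketch of gating audio frames before they reach the transcription engine. It is an assumption-laden stand-in: screenpipe's real VAD is not shown in this thread. This version uses a simple RMS energy threshold instead of a trained voice-activity model. The function names, the 16 kHz rate, the 30 ms frame size, and the 0.01 threshold are all hypothetical.

```rust
/// Root-mean-square energy of one mono f32 frame.
fn rms(frame: &[f32]) -> f32 {
    if frame.is_empty() {
        return 0.0;
    }
    let sum_sq: f32 = frame.iter().map(|s| s * s).sum();
    (sum_sq / frame.len() as f32).sqrt()
}

/// Hypothetical energy-based gate: keep only frames whose RMS is at or above
/// `threshold`, and concatenate them into the buffer handed to the transcriber.
fn gate_speech(samples: &[f32], frame_len: usize, threshold: f32) -> Vec<f32> {
    samples
        .chunks(frame_len)
        .filter(|frame| rms(frame) >= threshold)
        .flat_map(|frame| frame.iter().copied())
        .collect()
}

fn main() {
    // Assumed 16 kHz input: 30 ms of silence followed by 30 ms of a loud tone.
    let frame_len = 480; // 30 ms at 16 kHz
    let mut samples = vec![0.0_f32; frame_len];
    samples.extend((0..frame_len).map(|i| 0.5 * (i as f32 * 0.1).sin()));

    let speech = gate_speech(&samples, frame_len, 0.01);
    println!("kept {} of {} samples", speech.len(), samples.len()); // prints "kept 480 of 960 samples"
}
```

Running the gate inside the audio device loop means silent frames never reach the model at all. A production VAD would typically add a trained classifier plus some hangover padding so that word onsets and endings aren't clipped.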

louis030195 avatar Oct 25 '24 16:10 louis030195

🎉🎈 @EzraEllette has been awarded $100! 🎈🎊

algora-pbc[bot] avatar Oct 25 '24 16:10 algora-pbc[bot]