screenpipe [bounty] loom pipe with audio

#583

@tribhuwan-kumar

can you add audio?

/bounty 100

Nov 29 '24 20:11 louis030195

MED-357 [bounty] loom pipe with audio

Nov 29 '24 20:11 linear[bot]

💎 $100 bounty • Screenpi.pe

Steps to solve:

Start working: Comment /attempt #804 with your implementation plan
Submit work: Create a pull request including /claim #804 in the PR body to claim the bounty
Receive payment: 100% of the bounty is received 2-5 days post-reward. Make sure you are eligible for payouts

Thank you for contributing to mediar-ai/screenpipe!


Nov 29 '24 20:11 algora-pbc[bot]

also an idea if you want to charge people for this:

make it easy to share a link, like a loom-hosted video, and after x hosted videos you have to pay like $20/m or something (i can help)

Nov 29 '24 20:11 louis030195

working on it!

Nov 30 '24 15:11 tribhuwan-kumar

[screenshot: 2024-12-04 at 8:42:10 AM]

fyi we are charging $10/m for the loom pipe and we would give you 50%

so it'd be great if you can help make it more useful / intuitive, with audio and more!

Dec 04 '24 16:12 louis030195

so it's gonna be a paid pipe, got it!

there are a few ui improvements i have to make, since it's a paid pipe. also, i have a question: why does screenpipe fail to capture internal audio? it was working fine before. yesterday i merged the latest branch, and since then it isn't recording internal audio; it's only recording external audio coming from the mic.

[screenshot attached]

Dec 05 '24 05:12 tribhuwan-kumar

can you list the audio devices through the cli, run screenpipe from the cli with the internal audio devices, confirm it's not recording them, and share any errors? matt

Dec 05 '24 05:12 m13v

@m13v

[attached log: screenpipe.2024-12-05.log]

Dec 05 '24 06:12 tribhuwan-kumar

also an idea if you want to charge people for this:

make it easy to share a link, like loom hosted video, and after x hosting you have to pay like $20/m or something (i can help)

the avg size of a merged video would be ~100 mb (depending on the time period), and these videos are stored locally on the user's machine. if i implement this share functionality, i'd first have to upload the merged videos to some server. still, it's a great idea!

Dec 06 '24 09:12 tribhuwan-kumar

@louis030195 @m13v

it's not capturing internal audio

2024-12-06T09:41:42.397344Z  INFO screenpipe_audio::vad_engine: SileroVad Model downloaded to: "C:\\Users\\eirae\\AppData\\Local\\screenpipe\\vad\\silero_vad.onnx"
2024-12-06T09:41:44.416291Z ERROR xcap::platform::impl_window: Access is denied. (0x80070005)
2024-12-06T09:41:44.523252Z  INFO screenpipe_audio::core: starting continuous recording for Microphone Array (Intel® Smart Sound Technology for Digital Microphones) (input) (30s segments)
2024-12-06T09:41:44.523384Z  INFO screenpipe_audio::core: starting continuous recording for Speakers (Realtek(R) Audio) (output) (30s segments)
2024-12-06T09:41:46.016032Z  INFO screenpipe_server::resource_monitor: Runtime: 22s, Total Memory: 21% (3.38 GB / 16.09 GB), Total CPU: 110%
2024-12-06T09:31:46.502086Z ERROR screenpipe_audio::core: whisper channel disconnected, restarting recording process
2024-12-06T09:31:46.503230Z ERROR screenpipe_audio::core: record_and_transcribe error, restarting: Whisper channel disconnected
2024-12-06T09:31:46.510488Z ERROR screenpipe_audio::core: whisper channel disconnected, restarting recording process
2024-12-06T09:31:46.511562Z ERROR screenpipe_audio::core: record_and_transcribe error, restarting: Whisper channel disconnected
2024-12-06T09:31:47.513953Z  INFO screenpipe_audio::core: starting continuous recording for Microphone Array (Intel® Smart Sound Technology for Digital Microphones) (input) (30s segments)
2024-12-06T09:31:47.513961Z  INFO screenpipe_audio::core: starting continuous recording for Speakers (Realtek(R) Audio) (output) (30s segments)
2024-12-06T09:33:50.554980Z ERROR screenpipe_audio::core: whisper channel disconnected, restarting recording process
2024-12-06T09:33:50.555826Z ERROR screenpipe_audio::core: record_and_transcribe error, restarting: Whisper channel disconnected
2024-12-06T09:33:50.604802Z ERROR screenpipe_audio::core: whisper channel disconnected, restarting recording process
2024-12-06T09:33:50.605999Z ERROR screenpipe_audio::core: record_and_transcribe error, restarting: Whisper channel disconnected

i keep seeing these errors; full log: screenpipe.2024-12-06.log

Dec 06 '24 09:12 tribhuwan-kumar

@tribhuwan-kumar can you use --debug in the cli so we know what's happening with the whisper/stt crash

Dec 06 '24 17:12 louis030195

after merging with the latest branch that's no longer happening; now i get this error:

2024-12-06T21:27:57.569361Z ERROR screenpipe_server::core: Failed to insert audio transcription for device Speakers (Realtek(R) Audio) (output): error returned from database: (code: 1) table audio_transcriptions has no column named speaker_id

#791 will probably fix this
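The database error above says the local `audio_transcriptions` table has no `speaker_id` column, i.e. the schema predates the change. One quick way to confirm that before and after pulling #791 is to inspect the table's columns. A minimal sketch, assuming the screenpipe database is a SQLite file you can open; the helper name is hypothetical, and the demo uses an in-memory table whose columns (other than `speaker_id`, taken from the log) are made up for illustration:

```python
import sqlite3


def has_column(conn: sqlite3.Connection, table: str, column: str) -> bool:
    """Return True if `table` has a column named `column` (hypothetical helper)."""
    # PRAGMA table_info yields one row per column: (cid, name, type, notnull, dflt_value, pk).
    # Table names can't be bound as parameters, so only pass trusted names here.
    rows = conn.execute(f"PRAGMA table_info({table})").fetchall()
    return any(row[1] == column for row in rows)


if __name__ == "__main__":
    # Made-up "old" schema for illustration; point connect() at the real db file instead.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE audio_transcriptions (id INTEGER PRIMARY KEY, transcription TEXT)")
    print(has_column(conn, "audio_transcriptions", "speaker_id"))
    conn.execute("ALTER TABLE audio_transcriptions ADD COLUMN speaker_id INTEGER")
    print(has_column(conn, "audio_transcriptions", "speaker_id"))
```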

Dec 06 '24 21:12 tribhuwan-kumar

@louis030195

can you merge #791 so i can complete this?

Dec 10 '24 06:12 tribhuwan-kumar

@tribhuwan-kumar merged

excited for the audio feature, adding stripe integration today so you'll be able to connect your account

Dec 10 '24 19:12 louis030195

is it okay if i set the default audio chunk duration to 60 sec? currently it's 30 sec

Dec 11 '24 13:12 tribhuwan-kumar

@louis030195 i think merging video and audio chunks together isn't worth it; there are so many factors that can potentially ruin the video

fps is an issue. since the default video fps is 1, embedding audio makes the video completely unwatchable: there's no lip or frame sync between audio and video. please take a look at this video as an example.

the video should be at least 30 fps for audio and video to stay in sync.

https://github.com/user-attachments/assets/946afc02-3eef-464a-9c72-e6889cc3e6d3
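The sync problem comes down to frame duration: each frame stays on screen for 1/fps seconds, so the picture can lag the audio by up to that much. A small sketch of that arithmetic, assuming frames are captured at a perfectly regular rate (real capture intervals may vary); the function names are hypothetical, not part of screenpipe:

```python
def frame_index(t: float, fps: float) -> int:
    """Index of the video frame on screen at audio time t (seconds)."""
    return int(t * fps)


def max_av_offset(fps: float) -> float:
    """Worst-case gap (seconds) between an audio sample and the frame shown with it."""
    # A frame is held for 1/fps seconds, so the picture can be up to that stale.
    return 1.0 / fps


for fps in (1, 30):
    print(
        f"{fps} fps: frame at t=2.5s is #{frame_index(2.5, fps)}, "
        f"worst-case offset {max_av_offset(fps) * 1000:.0f} ms"
    )
```

At 1 fps the picture can be a full second behind the speech, which is why lip sync falls apart.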

the better implementation of this would be something like this:

in the cli there should be an option like --record-loom to capture audio and video together, like loom.com or obs studio do. this option would only be for recording loom-type videos, not for data capture.
users could use it to record high-fps videos with internal and external audio, and select which audio they want to capture
after some changes in screenpipe-server, users could trigger that option from the loom pipe ui
the video length would depend on how long the user wants to record

for #928 we can store ocr text and audio transcriptions in parallel, and later use that stored context to query the ai copilot

what do you think about this implementation?
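Storing OCR text and audio transcriptions in parallel means a copilot query would need them interleaved in time order when building context. A minimal sketch, assuming each record carries a timestamp and each stream is already sorted by it; the `Record` type and its fields are hypothetical, not screenpipe's schema:

```python
import heapq
from dataclasses import dataclass


@dataclass
class Record:
    timestamp: float  # seconds since recording start (hypothetical unit)
    source: str       # "ocr" or "audio"
    text: str


def merge_context(ocr: list[Record], audio: list[Record]) -> list[Record]:
    """Interleave two timestamp-sorted streams into one time-ordered context."""
    # heapq.merge combines already-sorted iterables without re-sorting everything.
    return list(heapq.merge(ocr, audio, key=lambda r: r.timestamp))


ocr = [Record(0.0, "ocr", "terminal: cargo build"), Record(10.0, "ocr", "browser: docs")]
audio = [Record(5.0, "audio", "let's check the build output")]
for r in merge_context(ocr, audio):
    print(f"[{r.timestamp:>5.1f}s] {r.source}: {r.text}")
```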

Dec 12 '24 11:12 tribhuwan-kumar

@louis030195, need your opinion on this!

Dec 13 '24 15:12 tribhuwan-kumar

@tribhuwan-kumar hmm no

loom won't be implemented in the core code / cli; it's more of a pipe

main features of screenpipe are:

24/7 screen and mic recording + metadata extraction and indexing

some idea about the UX, different paths:

merge video and then use AI to comment on the video based on audio transcriptions with overlapping timestamps (with optional captions), using elevenlabs, openai, gemini, a local model, whatever (v0 would likely be a cloud model)
merge video and then show a list of audio samples at those timestamps; while the video plays, the relevant audio samples are highlighted (not played automatically, but the user can click to play them)
merge video and then have a chat below to ask AI some tweaks - like chatbot with tools #928
merge video and merge audio are separate features
a mix of those
something else
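The timestamp-alignment part of the first path (pick the transcriptions that overlap the merged video, then turn them into optional captions) can be sketched without any AI. This assumes transcriptions have absolute start/end times in seconds; the `Transcription` type and helper names are hypothetical, and SRT is just one caption format picked for illustration:

```python
from dataclasses import dataclass


@dataclass
class Transcription:
    start: float  # absolute timestamp in seconds (hypothetical representation)
    end: float
    text: str


def overlapping(items: list[Transcription], v_start: float, v_end: float) -> list[Transcription]:
    """Keep transcriptions whose [start, end) interval overlaps the video window."""
    return [t for t in items if t.start < v_end and t.end > v_start]


def srt_time(seconds: float) -> str:
    """Format seconds as an SRT timestamp HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02}:{m:02}:{s:02},{ms:03}"


def to_srt(items: list[Transcription], v_start: float, v_end: float) -> str:
    """Clip each overlapping transcription to the video window and shift it
    so the video starts at 00:00:00,000."""
    blocks = []
    for i, t in enumerate(overlapping(items, v_start, v_end), start=1):
        start = max(t.start, v_start) - v_start
        end = min(t.end, v_end) - v_start
        blocks.append(f"{i}\n{srt_time(start)} --> {srt_time(end)}\n{t.text}\n")
    return "\n".join(blocks)


items = [
    Transcription(90, 100, "before the video"),
    Transcription(95, 130, "hello team"),
    Transcription(130, 150, "here's the fix"),
]
print(to_srt(items, 100, 140))
```

The same `overlapping` filter would also serve the second path (listing the audio samples that fall inside the video).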

wdyt @tribhuwan-kumar ?

Dec 13 '24 17:12 louis030195

i like option 1 btw

Dec 13 '24 17:12 louis030195

yeah, you're right. adding this --record-loom option to the cli would be bloat.

but don't you think there should be an option to record high-fps videos with better audio quality? from my personal experience, i use obs studio to record loom-type videos to share. if we add functionality to record videos of similar quality to loom.com, screenpipe users wouldn't need any other screen recording tool. this recording functionality could only be triggered from the loom pipe; we won't add the --record-loom option. it'd just be some functions that can be called via the api, manageable from the pipe ui, and for sharing these videos we could host something like https://screenpi.pe/loom/share/user_id_video_hash

this is just my personal experience. it's okay if you don't want to implement it, i'll skip this

some idea about the UX, different paths:

merge video and then use AI to comment on the video based on audio transcriptions that are on overlapping timestamps (with eventual captions), using elevenlabs, openai, gemini, local model, whatever (v0 would be cloud model likely)

merge video and then show a list of audio samples on these timestamps, while the video is playing it would highlight the relevant audio samples (not been played but the user can click and play these)

merge video and then have a chat below to ask AI some tweaks - like chatbot with tools

a mix of those

@louis030195 this sounds cool, working on it!

Dec 13 '24 19:12 tribhuwan-kumar

probably need a custom prompt for the video description:

have an input for the user to give a custom prompt
generate the video description with AI
generate voice

or

use a voice LLM
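The first path (user-supplied prompt, then an AI-generated description) mostly comes down to assembling the prompt from the user's input and the recording's transcriptions. A minimal sketch; the function name, default wording and 4000-character cap are assumptions, and the actual model call (cloud or local) is left out:

```python
def build_description_prompt(custom_prompt: str, transcripts: list[str], max_chars: int = 4000) -> str:
    """Combine the user's custom instructions with the recording's transcriptions."""
    # Hypothetical default used when the user leaves the prompt input empty.
    instructions = custom_prompt.strip() or "Write a short description of this screen recording."
    # Skip empty/whitespace-only transcription segments.
    context = "\n".join(f"- {t.strip()}" for t in transcripts if t.strip())
    # Naive truncation to stay within a model's context budget.
    context = context[:max_chars]
    return f"{instructions}\n\nAudio transcriptions from the recording:\n{context}"


print(build_description_prompt("", ["walked through the deploy script", "fixed the cache bug"]))
```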

some use cases i have in mind:

bob worked for 6h today and wants to share something interesting he learned with his team
john did an engineering procedure on infrastructure and wants to share a quick video about it
lisa had a call with a customer and wants to quickly share how it went
lee had a call with a user and wants to share the insights with the product management team
alice had a call with her boyfriend and wants to save a memory / get some AI couple coaching on the conversation with video

Dec 13 '24 20:12 louis030195

@louis030195 i'm working on ui, will show you prototype soon!

Dec 15 '24 12:12 tribhuwan-kumar

i'm facing a problem: creating looms over and over again ends up creating too many video files. there should be an option for the user to delete and manage their looms, something like a history

Dec 17 '24 14:12 tribhuwan-kumar

maybe we can add a history, yeah, and make it easy for the user to delete looms
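A minimal sketch of such a history, assuming the pipe keeps a JSON index file next to the generated videos; the file layout, the `LoomEntry` fields and the `LoomHistory` class are hypothetical, and the real pipe might store this differently:

```python
import json
from dataclasses import asdict, dataclass
from pathlib import Path


@dataclass
class LoomEntry:
    id: str
    video_path: str
    created_at: str  # ISO 8601, e.g. "2024-12-17T14:00:00Z"


class LoomHistory:
    """Hypothetical loom history: a JSON index of generated videos."""

    def __init__(self, index_path: Path) -> None:
        self.index_path = index_path

    def load(self) -> list[LoomEntry]:
        """Return all recorded looms (empty if no index file exists yet)."""
        if not self.index_path.exists():
            return []
        return [LoomEntry(**e) for e in json.loads(self.index_path.read_text())]

    def _save(self, entries: list[LoomEntry]) -> None:
        self.index_path.write_text(json.dumps([asdict(e) for e in entries], indent=2))

    def add(self, entry: LoomEntry) -> None:
        self._save(self.load() + [entry])

    def delete(self, loom_id: str) -> bool:
        """Delete a loom's video file and its index entry; False if the id is unknown."""
        entries = self.load()
        target = next((e for e in entries if e.id == loom_id), None)
        if target is None:
            return False
        # missing_ok: don't fail if the user already removed the file by hand.
        Path(target.video_path).unlink(missing_ok=True)
        self._save([e for e in entries if e.id != loom_id])
        return True
```

Keeping the index separate from the videos lets the ui list looms without scanning the disk, and a single `delete` call cleans up both the file and the entry.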

Dec 17 '24 16:12 louis030195

sorry for the insane delay, my windows laptop was kinda dead after the new microsoft update

Dec 26 '24 13:12 tribhuwan-kumar

ai description for videos is done. there's still some work to do:

[ ] timestamps for videos, like youtube has
[ ] slot-type llm chat ui, like chatgpt has
[ ] sharing across the internet. @louis030195 how are you planning to do this? the avg video size is about 100 mb depending on the time period. there are some services that provide an api for video streaming, like https://docs.mux.com/, but they're paid :(
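For the timestamps item, the formatting side is simple; where the chapter offsets come from (e.g. transcription segment boundaries or AI-picked moments) is left open here. A minimal sketch with hypothetical function names:

```python
def fmt_timestamp(seconds: int) -> str:
    """Format seconds YouTube-style: M:SS under an hour, H:MM:SS otherwise."""
    h, rem = divmod(int(seconds), 3600)
    m, s = divmod(rem, 60)
    return f"{h}:{m:02}:{s:02}" if h else f"{m}:{s:02}"


def chapters(marks: list[tuple[int, str]]) -> str:
    """Render (offset_seconds, title) pairs as a chapter list sorted by time."""
    return "\n".join(f"{fmt_timestamp(t)} {title}" for t, title in sorted(marks))


print(chapters([(75, "fixing the build"), (0, "intro"), (3725, "wrap-up")]))
```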

Dec 27 '24 13:12 tribhuwan-kumar

nice!

is there audio yet?
maybe you can use https://www.assistant-ui.com/
for sharing i can give you access to supabase storage or something like this, just tell me what you need for it

Dec 28 '24 19:12 louis030195

i'm adding a strip of frames, similar to the timeline pipe, with the relevant audio chunk right next to it
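Pairing each frame in the strip with its audio chunk is a sorted-interval lookup. A minimal sketch, assuming fixed-length chunks identified by their start timestamps (30 s is the default chunk duration mentioned earlier in this thread; real chunks may vary); the function name is hypothetical:

```python
from bisect import bisect_right
from typing import Optional


def chunk_for_frame(frame_ts: float, chunk_starts: list[float], chunk_len: float) -> Optional[int]:
    """Index of the audio chunk covering frame_ts, or None if it falls in a gap.

    chunk_starts must be sorted ascending; chunk i covers
    [chunk_starts[i], chunk_starts[i] + chunk_len).
    """
    # Last chunk that starts at or before the frame.
    i = bisect_right(chunk_starts, frame_ts) - 1
    if i >= 0 and frame_ts < chunk_starts[i] + chunk_len:
        return i
    return None


chunk_starts = [0.0, 30.0, 90.0]  # no audio recorded between 60 s and 90 s
frames = [5.0, 31.0, 65.0, 95.0]
print([chunk_for_frame(t, chunk_starts, 30.0) for t in frames])  # [0, 1, None, 2]
```

Frames that land in a gap get `None`, so the ui can show "no audio" next to them instead of a mismatched chunk.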

that'd be great, but wouldn't that be a privacy concern from the user's perspective? storing videos in some database... anyway, i'm happy to implement it!

Dec 29 '24 14:12 tribhuwan-kumar

yeah right, didn't think about privacy

can you look at the code here https://github.com/CapSoftware/Cap

they have really good UX for this and local features

maybe it's just copying the video to the clipboard or something

thinking of ngrok too... but probably not a good idea lol...

Dec 29 '24 21:12 louis030195