
[bounty] loom pipe with audio

Open louis030195 opened this issue 1 year ago • 4 comments

#583

@tribhuwan-kumar

can you add audio?

/bounty 100

louis030195 avatar Nov 29 '24 20:11 louis030195

💎 $100 bounty • Screenpi.pe

Steps to solve:

  1. Start working: Comment /attempt #804 with your implementation plan
  2. Submit work: Create a pull request including /claim #804 in the PR body to claim the bounty
  3. Receive payment: 100% of the bounty is received 2-5 days post-reward. Make sure you are eligible for payouts

Thank you for contributing to mediar-ai/screenpipe!


algora-pbc[bot] avatar Nov 29 '24 20:11 algora-pbc[bot]

also an idea if you want to charge people for this:

make it easy to share a link, like loom hosted video, and after x hosting you have to pay like $20/m or something (i can help)

louis030195 avatar Nov 29 '24 20:11 louis030195

working on it!

tribhuwan-kumar avatar Nov 30 '24 15:11 tribhuwan-kumar

Screenshot 2024-12-04 at 8 42 10 AM

fyi we are charging $10/m for the loom pipe and we would give you 50%

so if you can help making it more useful / intuitive, etc. with audio and more!

louis030195 avatar Dec 04 '24 16:12 louis030195

> fyi we are charging $10/m for the loom pipe and we would give you 50%

so it's gonna be a paid pipe, i get it!

> so if you can help making it more useful / intuitive, etc. with audio and more!

there are a few ui improvements i have to do, since it's a paid pipe. also, i have a question: why does screenpipe fail to capture internal audio? it was fine before. yesterday i merged it with the latest branch, and since then it isn't recording the internal audio. it's only recording external audio coming from the mic.


tribhuwan-kumar avatar Dec 05 '24 05:12 tribhuwan-kumar

can you list the audio devices through the cli, run screenpipe from the cli with the internal audio device, confirm it's not recording it, and share any errors?

matt
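A rough sketch of the two commands being asked for, built as argument lists in Python so they can be printed or passed to `subprocess.run`. The `--debug` flag is mentioned later in this thread; `--list-audio-devices` and `--audio-device` are assumed flag names, so check `screenpipe --help` on your build before relying on them. The device name is copied from the log output posted further down.

```python
import shlex

def list_devices_cmd(binary="screenpipe"):
    # Assumed flag name; verify with `screenpipe --help`.
    return [binary, "--list-audio-devices"]

def record_cmd(device, binary="screenpipe"):
    # Record only the given device, with debug logging enabled.
    # `--audio-device` is an assumed flag name; `--debug` comes from the thread.
    return [binary, "--audio-device", device, "--debug"]

cmd = record_cmd("Speakers (Realtek(R) Audio) (output)")

# shlex.join quotes for POSIX shells; on Windows, pass the list to
# subprocess.run directly instead of pasting the printed string.
print(shlex.join(list_devices_cmd()))
print(shlex.join(cmd))
```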


m13v avatar Dec 05 '24 05:12 m13v

> also an idea if you want to charge people for this:

> make it easy to share a link, like loom hosted video, and after x hosting you have to pay like $20/m or something (i can help)

the avg size of a merged video would be 100mb (depending on the time period), and these videos are stored locally on the user's machine. if i implement this share functionality, i'd first have to upload the merged videos to some server. still, it's a great idea!

tribhuwan-kumar avatar Dec 06 '24 09:12 tribhuwan-kumar

@louis030195 @m13v

it's not capturing the internal audio

2024-12-06T09:41:42.397344Z  INFO screenpipe_audio::vad_engine: SileroVad Model downloaded to: "C:\\Users\\eirae\\AppData\\Local\\screenpipe\\vad\\silero_vad.onnx"
2024-12-06T09:41:44.416291Z ERROR xcap::platform::impl_window: Access is denied. (0x80070005)
2024-12-06T09:41:44.523252Z  INFO screenpipe_audio::core: starting continuous recording for Microphone Array (Intel® Smart Sound Technology for Digital Microphones) (input) (30s segments)
2024-12-06T09:41:44.523384Z  INFO screenpipe_audio::core: starting continuous recording for Speakers (Realtek(R) Audio) (output) (30s segments)
2024-12-06T09:41:46.016032Z  INFO screenpipe_server::resource_monitor: Runtime: 22s, Total Memory: 21% (3.38 GB / 16.09 GB), Total CPU: 110%
2024-12-06T09:31:46.502086Z ERROR screenpipe_audio::core: whisper channel disconnected, restarting recording process
2024-12-06T09:31:46.503230Z ERROR screenpipe_audio::core: record_and_transcribe error, restarting: Whisper channel disconnected
2024-12-06T09:31:46.510488Z ERROR screenpipe_audio::core: whisper channel disconnected, restarting recording process
2024-12-06T09:31:46.511562Z ERROR screenpipe_audio::core: record_and_transcribe error, restarting: Whisper channel disconnected
2024-12-06T09:31:47.513953Z  INFO screenpipe_audio::core: starting continuous recording for Microphone Array (Intel® Smart Sound Technology for Digital Microphones) (input) (30s segments)
2024-12-06T09:31:47.513961Z  INFO screenpipe_audio::core: starting continuous recording for Speakers (Realtek(R) Audio) (output) (30s segments)
2024-12-06T09:33:50.554980Z ERROR screenpipe_audio::core: whisper channel disconnected, restarting recording process
2024-12-06T09:33:50.555826Z ERROR screenpipe_audio::core: record_and_transcribe error, restarting: Whisper channel disconnected
2024-12-06T09:33:50.604802Z ERROR screenpipe_audio::core: whisper channel disconnected, restarting recording process
2024-12-06T09:33:50.605999Z ERROR screenpipe_audio::core: record_and_transcribe error, restarting: Whisper channel disconnected

i keep seeing these errors: screenpipe.2024-12-06.log
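To see how often each error repeats in a log like this one, a small script can count ERROR lines per module and message. This is only a sketch: the line layout (timestamp, level, module, message) is inferred from the lines pasted in this comment, and the regex also tolerates leftover terminal color codes such as `[2m` and `[0m`.

```python
import re
from collections import Counter

# Matches color codes like "[2m" / "[0m", with or without the ESC byte.
ANSI = re.compile(r"\x1b?\[\d+m")
# timestamp, level, module path, message (layout inferred from the pasted log).
LINE = re.compile(r"^(\S+)\s+(INFO|WARN|ERROR|DEBUG)\s+([\w:]+):\s+(.*)$")

def summarize(lines):
    """Count ERROR lines, keyed by (module, message)."""
    counts = Counter()
    for raw in lines:
        match = LINE.match(ANSI.sub("", raw).strip())
        if match and match.group(2) == "ERROR":
            counts[(match.group(3), match.group(4))] += 1
    return counts

sample = [
    "[2m2024-12-06T09:31:46.502086Z[0m [31mERROR[0m [2mscreenpipe_audio::core[0m[2m:[0m whisper channel disconnected, restarting recording process    ",
    "2024-12-06T09:33:50.554980Z ERROR screenpipe_audio::core: whisper channel disconnected, restarting recording process",
    "2024-12-06T09:41:46.016032Z  INFO screenpipe_server::resource_monitor: Runtime: 22s",
]
counts = summarize(sample)
for (module, message), n in counts.most_common():
    print(n, module, message)
```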

tribhuwan-kumar avatar Dec 06 '24 09:12 tribhuwan-kumar

@tribhuwan-kumar can you use --debug in cli so we know what's happening with the whisper/stt crash

louis030195 avatar Dec 06 '24 17:12 louis030195

> @tribhuwan-kumar can you use --debug in cli so we know what's happening with the whisper/stt crash

after merging with the latest branch that's not happening anymore, but now i get this error:

2024-12-06T21:27:57.569361Z ERROR screenpipe_server::core: Failed to insert audio transcription for device Speakers (Realtek(R) Audio) (output): error returned from database: (code: 1) table audio_transcriptions has no column named speaker_id

#791 will probably fix this
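The error reads like the database schema is behind the code: the insert expects a `speaker_id` column that the local table doesn't have yet. A minimal sketch of how to check for that, assuming the store is SQLite (the error style suggests it, but that is an assumption). The table and column names come from the log line; the `INTEGER` type and the demo schema are guesses, and the real fix should come from the project's own migration rather than a manual `ALTER TABLE`.

```python
import sqlite3

def has_column(conn, table, column):
    """Return True if `table` has a column named `column`."""
    # PRAGMA table_info yields one row per column; index 1 is the column name.
    rows = conn.execute(f"PRAGMA table_info({table})").fetchall()
    return any(row[1] == column for row in rows)

def ensure_column(conn, table, column, decl):
    """Add the column only when it is missing."""
    if not has_column(conn, table, column):
        conn.execute(f"ALTER TABLE {table} ADD COLUMN {column} {decl}")
        conn.commit()

# Demo on an in-memory database that mimics an out-of-date schema (hypothetical).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE audio_transcriptions (id INTEGER PRIMARY KEY, transcription TEXT)")
before = has_column(conn, "audio_transcriptions", "speaker_id")
ensure_column(conn, "audio_transcriptions", "speaker_id", "INTEGER")  # type is a guess
after = has_column(conn, "audio_transcriptions", "speaker_id")
print(before, after)
```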

tribhuwan-kumar avatar Dec 06 '24 21:12 tribhuwan-kumar

@louis030195

can i get the merge of #791 so i can complete this?

tribhuwan-kumar avatar Dec 10 '24 06:12 tribhuwan-kumar

@tribhuwan-kumar merged

excited for the audio feature, adding stripe integration today so you'll be able to connect your account

louis030195 avatar Dec 10 '24 19:12 louis030195

> @tribhuwan-kumar merged

> excited for the audio feature, adding stripe integration today so you'll be able to connect your account

is it okay if i set the default audio chunk duration to 60 sec? currently it's 30 sec

tribhuwan-kumar avatar Dec 11 '24 13:12 tribhuwan-kumar

@louis030195 i think merging video and audio chunks together isn't worth it; there are many factors that can ruin the video

  • fps is an issue. the default video fps is 1, so after embedding audio in it the result is completely unwatchable: there's no lip or frame sync between audio and video. plz take a look at this video as an example.

the video fps should be at least 30; then audio and video will be in sync.

https://github.com/user-attachments/assets/946afc02-3eef-464a-9c72-e6889cc3e6d3
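The arithmetic behind the fps point, as a quick sketch: at 1 fps every frame stays on screen for a full second, so the picture can trail the sound by up to that long, while at 30 fps the gap is about 33 ms. The 30 second chunk length is taken from the "30s segments" in the logs above; the function names are made up for illustration.

```python
def frame_interval_ms(fps):
    """How long a single frame stays on screen, in milliseconds."""
    return 1000.0 / fps

def frames_per_audio_chunk(fps, chunk_seconds):
    """Number of video frames that play during one audio chunk."""
    return int(fps * chunk_seconds)

for fps in (1, 30):
    print(f"{fps} fps: {frame_interval_ms(fps):.1f} ms per frame, "
          f"{frames_per_audio_chunk(fps, 30)} frames per 30s audio chunk")
```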

the better implementation of this would be something like this:

  • in the cli there should be an option like --record-loom to capture audio and video together, like loom.com or obs studio does. this option would only be for recording loom-type videos, not for data capturing.
  • users can use this to record high fps videos with internal and external audio, and they can select which audio they want to capture
  • after some changes in screenpipe-server, users can trigger that option from the loom pipe ui
  • the video length would depend on how long the user wants to capture

for #928 we can store ocr text and audio transcriptions in parallel. later we can use this stored context to query for the ai copilot

what do you think about this implementation?

tribhuwan-kumar avatar Dec 12 '24 11:12 tribhuwan-kumar

@louis030195, need your opinion on this!

tribhuwan-kumar avatar Dec 13 '24 15:12 tribhuwan-kumar

@tribhuwan-kumar hmm no

loom will not be implemented in core code / cli it's more a pipe

main features of screenpipe are:

  • 24/7 screen and mic recording + metadata extraction and indexing

some idea about the UX, different paths:

  • merge video and then use AI to comment on the video based on audio transcriptions that are on overlapping timestamps (with eventual captions), using elevenlabs, openai, gemini, local model, whatever (v0 would be cloud model likely)
  • merge video and then show a list of audio samples on these timestamps; while the video is playing it would highlight the relevant audio samples (not played automatically, but the user can click and play them)
  • merge video and then have a chat below to ask AI some tweaks - like chatbot with tools #928
  • merge video and merge audio are separate features
  • a mix of those
  • something else
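Several of these paths depend on finding the audio transcriptions whose timestamps overlap the merged video. A minimal sketch of that selection, assuming each transcription carries a start and end time on the same clock as the video; the `AudioChunk` shape is hypothetical and not screenpipe's actual data model.

```python
from dataclasses import dataclass

@dataclass
class AudioChunk:
    start: float  # seconds, same clock as the video
    end: float
    text: str

def overlapping(chunks, video_start, video_end):
    """Chunks whose [start, end) interval intersects [video_start, video_end)."""
    return [c for c in chunks if c.start < video_end and c.end > video_start]

chunks = [
    AudioChunk(0, 30, "intro"),
    AudioChunk(30, 60, "demo"),
    AudioChunk(60, 90, "wrap up"),
]
picked = overlapping(chunks, 25, 60)
print([c.text for c in picked])
```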

wdyt @tribhuwan-kumar ?

louis030195 avatar Dec 13 '24 17:12 louis030195

i like option 1 btw

louis030195 avatar Dec 13 '24 17:12 louis030195

> @tribhuwan-kumar hmm no

> loom will not be implemented in core code / cli it's more a pipe

> main features of screenpipe are:

>   • 24/7 screen and mic recording + metadata extraction and indexing

yeah, you're right. adding this --record-loom option to the cli would be bloat

but don't you think there should be an option to record high fps videos with better audio quality? from my personal experience, i'm using obs studio to record loom-type videos to share. if we add functionality to record video of similar quality to loom.com, then screenpipe users won't have to use any other screen recording tool. this screen recording functionality would only be triggered from the loom pipe. we won't add the --record-loom option; it'll just be some functions that can be called from the api. it would be manageable from the pipe ui, and for sharing these videos we could host something like https://screenpi.pe/loom/share/user_id_video_hash

this is just my personal experience. it's okay if you don't want to implement it, i'll skip this

> some idea about the UX, different paths:

>   • merge video and then use AI to comment on the video based on audio transcriptions that are on overlapping timestamps (with eventual captions), using elevenlabs, openai, gemini, local model, whatever (v0 would be cloud model likely)

>   • merge video and then show a list of audio samples on these timestamps, while the video is playing it would highlight the relevant audio samples (not been played but the user can click and play these)

>   • merge video and then have a chat below to ask AI some tweaks - like chatbot with tools

>   • a mix of those

@louis030195 this sounds cool, working on it!

tribhuwan-kumar avatar Dec 13 '24 19:12 tribhuwan-kumar

probably need to have custom prompt for the video description:

  • have input for user to give custom prompt
  • generate the video description with AI
  • generate voice

or

  • use voice LLM
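A sketch of the first flow (custom prompt in, description prompt out), stopping short of any model call. The default instruction text, the function name, and the `(start, end, text)` tuple shape are all made up for illustration; whichever model is used would receive the returned string.

```python
DEFAULT_PROMPT = "Describe what happens in this screen recording for a teammate."

def build_description_prompt(transcripts, custom_prompt=None):
    """Combine the user's instruction with timestamped audio transcriptions.

    `transcripts` is a list of (start_seconds, end_seconds, text) tuples.
    Falls back to DEFAULT_PROMPT when the user gives no (or a blank) prompt.
    """
    instruction = (custom_prompt or "").strip() or DEFAULT_PROMPT
    lines = [f"[{start:.0f}s-{end:.0f}s] {text}" for start, end, text in transcripts]
    return instruction + "\n\nAudio transcriptions:\n" + "\n".join(lines)

prompt = build_description_prompt(
    [(0, 30, "hello team"), (30, 60, "here is the fix")],
    "Summarize in two sentences.",
)
print(prompt)
```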

i think use cases:

  • bob worked for 6h today and wants to share something interesting he learned with his team
  • john did some engineering procedure on infrastructure and wants to share a quick video about it
  • lisa had a call with a customer and wanted to share how it went quickly
  • lee had a call with a user and wanted to share the insights with the product management team
  • alice had a call with her boyfriend and wanted to save a memory / get some AI couple coaching on the conversation with video

louis030195 avatar Dec 13 '24 20:12 louis030195

@louis030195 i'm working on ui, will show you prototype soon!

tribhuwan-kumar avatar Dec 15 '24 12:12 tribhuwan-kumar

i'm facing a problem: creating looms over and over ends up creating too many video files. there should be an option for the user to delete & manage their looms, something like a history

tribhuwan-kumar avatar Dec 17 '24 14:12 tribhuwan-kumar

maybe can add a history yeah, and make it easy for user to delete
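One way the history could work, sketched under the assumption that merged videos are `.mp4` files in a single local folder (the layout and function names are hypothetical): list the files newest first with their sizes, and delete by file name while refusing anything that resolves outside that folder.

```python
from pathlib import Path
import tempfile

def list_looms(directory):
    """Newest-first list of (path, size_in_bytes) for merged videos."""
    files = sorted(directory.glob("*.mp4"), key=lambda p: p.stat().st_mtime, reverse=True)
    return [(p, p.stat().st_size) for p in files]

def delete_loom(directory, name):
    """Delete one video by file name; return True if a file was removed."""
    target = directory / name
    # Refuse names like "../x.mp4" that would escape the history folder.
    if target.resolve().parent != directory.resolve():
        return False
    if target.is_file():
        target.unlink()
        return True
    return False

# Demo in a throwaway folder.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "a.mp4").write_bytes(b"x" * 10)
    (root / "b.mp4").write_bytes(b"y" * 20)
    deleted = delete_loom(root, "a.mp4")
    escaped = delete_loom(root, "../a.mp4")
    remaining = [(p.name, size) for p, size in list_looms(root)]
    print(deleted, escaped, remaining)
```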

louis030195 avatar Dec 17 '24 16:12 louis030195


sorry for this insane delay, my windows laptop is kinda dead after the new microsoft update

tribhuwan-kumar avatar Dec 26 '24 13:12 tribhuwan-kumar

ai description for video is done. there's still some work to do:

  • [ ] timestamps for videos, like youtube does
  • [ ] slot-type llm chat ui, like chatgpt has
  • [ ] sharing across the internet: @louis030195 how are you planning to do this? the avg video size is about 100 mb, depending on the time period. there are some libraries that provide an api for video streaming, e.g. https://docs.mux.com/, but it's paid :(
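For the youtube-style timestamps, the usual layout is `M:SS` (or `H:MM:SS` past an hour) followed by a title. A small sketch of that formatting; the chapter list is made-up sample data, and how chapters get detected is left open.

```python
def format_timestamp(seconds):
    """Format an offset in seconds as M:SS, or H:MM:SS past one hour."""
    total = int(seconds)
    hours, rem = divmod(total, 3600)
    minutes, secs = divmod(rem, 60)
    if hours:
        return f"{hours}:{minutes:02d}:{secs:02d}"
    return f"{minutes}:{secs:02d}"

def chapter_lines(chapters):
    """Turn (offset_seconds, title) pairs into description lines."""
    return [f"{format_timestamp(offset)} {title}" for offset, title in chapters]

for line in chapter_lines([(0, "intro"), (95, "bug repro"), (3725, "fix")]):
    print(line)
```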

tribhuwan-kumar avatar Dec 27 '24 13:12 tribhuwan-kumar

nice!

  • is there audio yet?
  • maybe you can use https://www.assistant-ui.com/
  • for sharing i can give you access to supabase storage or something like this, just tell me what you need for it

louis030195 avatar Dec 28 '24 19:12 louis030195

>   • is there audio yet?

i'm adding a strip of frames similar to the timeline pipe, with the relevant audio chunk right next to it
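Pairing each frame in the strip with its audio chunk can be a binary search over chunk start times. A sketch assuming fixed-length chunks that share a clock with the frames; the 30 second length matches the current default mentioned earlier in the thread, and the function name is hypothetical.

```python
from bisect import bisect_right

def chunk_for_frame(frame_ts, chunk_starts, chunk_seconds):
    """Index of the audio chunk covering frame_ts, or None if no chunk does.

    `chunk_starts` must be sorted in ascending order.
    """
    i = bisect_right(chunk_starts, frame_ts) - 1
    if i >= 0 and frame_ts < chunk_starts[i] + chunk_seconds:
        return i
    return None

starts = [0.0, 30.0, 60.0]
pairs = [(ts, chunk_for_frame(ts, starts, 30.0)) for ts in (5.0, 30.0, 89.0, 95.0)]
print(pairs)
```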

> for sharing i can give you access to supabase storage or something like this, just tell me what you need for it

that'd be great, but isn't it a privacy concern from a user's perspective, storing videos in some database? anyway, i'm happy to implement it!

tribhuwan-kumar avatar Dec 29 '24 14:12 tribhuwan-kumar

yeah right, did not think about privacy

can you look at the code here https://github.com/CapSoftware/Cap

they have really good UX for this and local features

maybe it's just copying to clipboard the video or something

thinking of ngrok too ... but probably not a good idea lol ...

louis030195 avatar Dec 29 '24 21:12 louis030195