[bounty] loom pipe with audio
#583
@tribhuwan-kumar
can you add audio?
/bounty 100
💎 $100 bounty • Screenpi.pe
Steps to solve:
- Start working: Comment /attempt #804 with your implementation plan
- Submit work: Create a pull request including /claim #804 in the PR body to claim the bounty
- Receive payment: 100% of the bounty is received 2-5 days post-reward. Make sure you are eligible for payouts
Thank you for contributing to mediar-ai/screenpipe!
also an idea if you want to charge people for this:
make it easy to share a link, like loom hosted video, and after x hosting you have to pay like $20/m or something (i can help)
working on it!
fyi we are charging $10/m for the loom pipe and we would give you 50%
so if you can help making it more useful / intuitive, etc. with audio and more!
so it's gonna be a paid pipe, i get it!
there are a few ui improvements i have to do, since it's a paid pipe. also, i have a question: why does screenpipe fail to capture internal audio? it was fine before. yesterday i merged with the latest branch, and since then it isn't recording internal audio. it's only recording external audio coming from the mic.
can you list the audio devices through the cli, run screenpipe from the cli with the internal audio devices, confirm it's not recording them, and share any errors?
matt
the avg size of a merged video would be about 100 mb (depending on the time period), and these videos are stored locally on the user's machine. if i implement this share functionality, i first have to upload the merged videos to some server. still, it's a great idea!
@louis030195 @m13v
it's not capturing the internal audio
2024-12-06T09:41:42.397344Z  INFO screenpipe_audio::vad_engine: SileroVad Model downloaded to: "C:\\Users\\eirae\\AppData\\Local\\screenpipe\\vad\\silero_vad.onnx"
2024-12-06T09:41:44.416291Z ERROR xcap::platform::impl_window: Access is denied. (0x80070005)
2024-12-06T09:41:44.523252Z  INFO screenpipe_audio::core: starting continuous recording for Microphone Array (Intel® Smart Sound Technology for Digital Microphones) (input) (30s segments)
2024-12-06T09:41:44.523384Z  INFO screenpipe_audio::core: starting continuous recording for Speakers (Realtek(R) Audio) (output) (30s segments)
2024-12-06T09:41:46.016032Z  INFO screenpipe_server::resource_monitor: Runtime: 22s, Total Memory: 21% (3.38 GB / 16.09 GB), Total CPU: 110%
2024-12-06T09:31:46.502086Z ERROR screenpipe_audio::core: whisper channel disconnected, restarting recording process
2024-12-06T09:31:46.503230Z ERROR screenpipe_audio::core: record_and_transcribe error, restarting: Whisper channel disconnected
2024-12-06T09:31:46.510488Z ERROR screenpipe_audio::core: whisper channel disconnected, restarting recording process
2024-12-06T09:31:46.511562Z ERROR screenpipe_audio::core: record_and_transcribe error, restarting: Whisper channel disconnected
2024-12-06T09:31:47.513953Z  INFO screenpipe_audio::core: starting continuous recording for Microphone Array (Intel® Smart Sound Technology for Digital Microphones) (input) (30s segments)
2024-12-06T09:31:47.513961Z  INFO screenpipe_audio::core: starting continuous recording for Speakers (Realtek(R) Audio) (output) (30s segments)
2024-12-06T09:33:50.554980Z ERROR screenpipe_audio::core: whisper channel disconnected, restarting recording process
2024-12-06T09:33:50.555826Z ERROR screenpipe_audio::core: record_and_transcribe error, restarting: Whisper channel disconnected
2024-12-06T09:33:50.604802Z ERROR screenpipe_audio::core: whisper channel disconnected, restarting recording process
2024-12-06T09:33:50.605999Z ERROR screenpipe_audio::core: record_and_transcribe error, restarting: Whisper channel disconnected
continuously seeing these errors: screenpipe.2024-12-06.log
@tribhuwan-kumar can you use --debug in cli so we know what's happening with the whisper/stt crash
after merging with the latest branch that's not happening anymore, now i get this error:
2024-12-06T21:27:57.569361Z ERROR screenpipe_server::core: Failed to insert audio transcription for device Speakers (Realtek(R) Audio) (output): error returned from database: (code: 1) table audio_transcriptions has no column named speaker_id
#791 will probably fix this
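The error says the `audio_transcriptions` table has no `speaker_id` column, which suggests the local database schema is older than the code writing to it. Below is a minimal Python sketch of how one could check for a missing column in SQLite and add it. It is not what #791 does; the helper names `has_column` and `ensure_column` and the `INTEGER` column type are assumptions made for illustration.

```python
import sqlite3


def has_column(conn, table, column):
    """Return True if `table` already has a column named `column`."""
    # PRAGMA table_info yields one row per column: (cid, name, type, ...)
    rows = conn.execute(f"PRAGMA table_info({table})").fetchall()
    return any(row[1] == column for row in rows)


def ensure_column(conn, table, column, decl):
    """Add `column` to `table` if it is missing. Returns True if it was added."""
    if has_column(conn, table, column):
        return False
    conn.execute(f"ALTER TABLE {table} ADD COLUMN {column} {decl}")
    conn.commit()
    return True


if __name__ == "__main__":
    # Hypothetical, simplified schema; the real table has more columns.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE audio_transcriptions (id INTEGER PRIMARY KEY, transcription TEXT)")
    print(ensure_column(conn, "audio_transcriptions", "speaker_id", "INTEGER"))
```

Running a check like this against a copy of the local database is one way to confirm whether the schema is behind before waiting on a merge.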
@louis030195
can i get the merge of #791 so i can complete this?
@tribhuwan-kumar merged
excited for the audio feature, adding stripe integration today so you'll be able to connect your account
is it okay if i set the default audio chunk duration to 60 sec? currently it's 30 sec
@louis030195 i think merging video and audio chunks together isn't worth it, there are many factors that can potentially ruin the video
- fps is an issue. the default video fps is 1, so after embedding audio the video becomes completely unwatchable. there is no lip or frame sync between audio and video. plz take a look at this video as an example.
the video should be at least 30 fps, then audio and video will be in sync.
https://github.com/user-attachments/assets/946afc02-3eef-464a-9c72-e6889cc3e6d3
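To put numbers on the fps point, here is a small Python sketch of the arithmetic. The function names are made up for illustration; the only inputs taken from the thread are the 1 fps default, the suggested 30 fps, and the 30 s audio segment length.

```python
def frame_interval_ms(fps):
    """How long a single frame stays on screen, in milliseconds."""
    return 1000.0 / fps


def frames_per_segment(fps, segment_seconds):
    """Number of video frames that cover one audio segment."""
    return int(fps * segment_seconds)


if __name__ == "__main__":
    for fps in (1, 30):
        print(fps, "fps:", round(frame_interval_ms(fps), 1), "ms per frame,",
              frames_per_segment(fps, 30), "frames per 30 s audio segment")
```

At 1 fps each frame is held for a full second, so the picture can be up to a second stale relative to the audio being played; at 30 fps that gap shrinks to roughly 33 ms.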
the better implementation of this would be something like this:
- in the cli there should be an option like --record-loom to capture audio and video together, like loom.com or obs studio does. this option would only be for recording loom-type videos, not for data capturing.
- the user can use this to record high fps videos with internal and external audio. they can select which audio they want to capture
- after some changes in screenpipe-server, the user can trigger that option from the loom pipe ui
- the video length would be up to the user, depending on how long they want to capture
for #928 we can store ocr text and audio transcriptions in parallel. later we can use this stored context to query for the ai copilot
what do you think about this implementation?
@louis030195, need your opinion on this!
@tribhuwan-kumar hmm no
loom will not be implemented in core code / cli it's more a pipe
main features of screenpipe are:
- 24/7 screen and mic recording + metadata extraction and indexing
some idea about the UX, different paths:
- merge video and then use AI to comment on the video based on audio transcriptions with overlapping timestamps (with eventual captions), using elevenlabs, openai, gemini, local model, whatever (v0 would likely be a cloud model)
- merge video and then show a list of audio samples at these timestamps; while the video is playing it would highlight the relevant audio samples (they are not played automatically, but the user can click to play them)
- merge video and then have a chat below to ask AI some tweaks - like chatbot with tools #928
- merge video and merge audio are separate features
- a mix of those
- something else
wdyt @tribhuwan-kumar ?
i like option 1 btw
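Options 1 and 2 both depend on finding the audio transcriptions whose timestamps overlap the merged video. A minimal Python sketch of that selection step follows; the `AudioChunk` shape and the `overlapping_chunks` name are hypothetical and not screenpipe's actual data model.

```python
from dataclasses import dataclass


@dataclass
class AudioChunk:
    start: float  # seconds, on the same clock as the video timestamps
    end: float
    text: str     # transcription of this chunk


def overlapping_chunks(chunks, video_start, video_end):
    """Return (offset_in_video, chunk) pairs for chunks that overlap the video.

    offset_in_video is where the chunk begins on the video timeline, in
    seconds, clamped to 0 for chunks that started before the video did.
    """
    result = []
    for chunk in chunks:
        # Two intervals overlap when each one starts before the other ends.
        if chunk.start < video_end and chunk.end > video_start:
            offset = max(chunk.start, video_start) - video_start
            result.append((offset, chunk))
    return sorted(result, key=lambda pair: pair[0])


if __name__ == "__main__":
    chunks = [AudioChunk(0, 30, "intro"), AudioChunk(30, 60, "demo"), AudioChunk(60, 90, "wrap up")]
    for offset, chunk in overlapping_chunks(chunks, 45, 70):
        print(offset, chunk.text)
```

The offsets could drive either path: as caption or AI commentary positions for option 1, or as the moments at which to highlight an audio sample for option 2.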
yeah, you're right. adding this --record-loom option to the cli could bloat it
but don't you think there should be an option to record high fps videos with better audio quality?
from my personal experience, i use obs studio to record loom-type videos to share. if we add functionality to record video of similar quality to loom.com, screenpipe users won't need any other screen recording tool. this recording would only be triggered from the loom pipe. we wouldn't add the --record-loom option; it would just be some functions that can be called from the api, manageable from the pipe ui. for sharing these videos we could host something like https://screenpi.pe/loom/share/user_id_video_hash
this is just my personal experience. it's okay if you don't want to implement it, i'll skip this
@louis030195 this sounds cool, working on it!
probably need to have custom prompt for the video description:
- have input for user to give custom prompt
- generate the video description with AI
- generate voice
or
- use voice LLM
i think use cases:
- bob worked for 6h today and wants to share something interesting he learned with his team
- john did some engineering procedure on infrastructure and wants to share a quick video about it
- lisa had a call with a customer and wanted to share how it went quickly
- lee had a call with a user and wanted to share the insights with the product management team
- alice had a call with her boyfriend and wanted to save a memory / get some AI couple coaching on the conversation with video
@louis030195 i'm working on ui, will show you prototype soon!
i'm facing a problem: creating looms over and over ends up creating too many video files. there should be an option for the user to delete & manage their looms, something like a history
maybe can add a history yeah, and make it easy for user to delete
ai description for video is done. there is still some work to do:
- [ ] timestamp for videos, like youtube does
- [ ] slot type llm chat ui, like chatgpt has
- [ ] sharing across the internet, @louis030195 how are you planning to do this? the avg video size is about 100 mb, depending on the time period. there are some libraries that provide an api for video streaming, e.g. https://docs.mux.com/, but it's paid :(
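For the "timestamp for videos, like youtube does" item, a minimal Python sketch of formatting chapter markers might look like the following. The function names and the `(seconds, title)` input shape are assumptions for illustration only.

```python
def format_timestamp(seconds):
    """Format an offset in seconds as m:ss, or h:mm:ss once it reaches an hour."""
    total = int(seconds)
    hours, rem = divmod(total, 3600)
    minutes, secs = divmod(rem, 60)
    if hours:
        return f"{hours}:{minutes:02d}:{secs:02d}"
    return f"{minutes}:{secs:02d}"


def chapter_list(chapters):
    """Render (offset_seconds, title) pairs as one chapter line each, in time order."""
    ordered = sorted(chapters, key=lambda chapter: chapter[0])
    return "\n".join(f"{format_timestamp(t)} {title}" for t, title in ordered)


if __name__ == "__main__":
    print(chapter_list([(75, "customer call"), (0, "intro"), (3661, "wrap up")]))
```

The chapter titles could come from the AI-generated description or from the audio transcription nearest each offset.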
nice!
- is there audio yet?
- maybe you can use https://www.assistant-ui.com/
- for sharing i can give you access to supabase storage or something like this, just tell me what you need for it
i'm adding a strip of frames similar to the timeline pipe, with the relevant audio chunk right next to it
that'd be great, but isn't it a privacy concern from a user's perspective? storing videos on some database.... anyway i'm happy to implement it!
yeah right, did not think about privacy
can you look at the code here https://github.com/CapSoftware/Cap
they have really good UX for this and local features
maybe it's just copying to clipboard the video or something
thinking of ngrok too ... but probably not a good idea lol ...