[bounty] support for video and voice LLM in search, timeline, meeting

Open louis030195 opened this issue 11 months ago • 10 comments

this will likely need to be broken down into multiple bounties

/bounty 400

eg

  • meeting: using a voice LLM to transcribe or summarize audio would improve quality a lot - 10x better than granola etc
  • search: using a video LLM would be much more powerful, with different context windows
  • timeline: same

please suggest a rough design; I might create other issues
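
As a starting point for the meeting item, here is a minimal sketch of a transcribe-then-summarize flow with pluggable stages. It is not code from screenpipe: `process_meeting`, `MeetingNotes`, and the two fake stage functions are hypothetical names, and the fake stages stand in for a real speech model and LLM so the sketch runs on its own.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical stage signatures: a real speech model and a real LLM
# would be plugged in behind these two callables.
Transcriber = Callable[[bytes], str]
Summarizer = Callable[[str], str]


@dataclass
class MeetingNotes:
    transcript: str
    summary: str


def process_meeting(audio_chunks: List[bytes],
                    transcribe: Transcriber,
                    summarize: Summarizer) -> MeetingNotes:
    """Transcribe each audio chunk, join the text, then summarize it once."""
    parts = [transcribe(chunk) for chunk in audio_chunks]
    transcript = " ".join(p.strip() for p in parts if p.strip())
    # Skip the summarizer entirely when nothing was transcribed.
    summary = summarize(transcript) if transcript else ""
    return MeetingNotes(transcript=transcript, summary=summary)


# Stand-in stages (assumptions) so the sketch runs without any model installed.
def fake_transcribe(chunk: bytes) -> str:
    return chunk.decode("utf-8")


def fake_summarize(text: str) -> str:
    # Placeholder "summary": keep only the first sentence.
    return text.split(".")[0].strip() + "."


notes = process_meeting(
    [b"We agreed to ship on Friday. ", b"Alice owns the release notes."],
    fake_transcribe,
    fake_summarize,
)
print(notes.summary)
```

Keeping the two stages as injected callables would let a voice LLM replace either step, or both, without changing the caller.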

louis030195 avatar Jan 13 '25 21:01 louis030195

💎 $400 bounty • screenpi.pe

Steps to solve:

  1. Start working: Comment /attempt #1142 with your implementation plan
  2. Submit work: Create a pull request including /claim #1142 in the PR body to claim the bounty
  3. Receive payment: 100% of the bounty is received 2-5 days post-reward. Make sure you are eligible for payouts

❗ Important guidelines:

  • To claim a bounty, you need to provide a short demo video of your changes in your pull request
  • If anything is unclear, ask for clarification before starting as this will help avoid potential rework
  • Low quality AI PRs will not receive review and will be closed
  • Do not ask to be assigned unless you've contributed before

Thank you for contributing to mediar-ai/screenpipe!

| Attempt | Started (UTC) | Solution | Actions |
| --- | --- | --- | --- |
| 🟢 @BenraouaneSoufiane | Aug 05, 2025, 10:31:57 AM | WIP | |
| 🟢 @7908837174 | Oct 23, 2025, 04:49:57 AM | WIP | |

algora-pbc[bot] avatar Jan 13 '25 21:01 algora-pbc[bot]

I wanna work on it, how are you validating this? need more context.

kumarvivek1752 avatar Jan 22 '25 12:01 kumarvivek1752

/attempt #114

RaghavArora14 avatar Feb 20 '25 20:02 RaghavArora14

@RaghavArora14: We appreciate your enthusiasm but since you already have 3 active bounty attempts, we're going to keep this open for other contributors to attempt. 🫡

algora-pbc[bot] avatar Feb 20 '25 20:02 algora-pbc[bot]

/attempt https://github.com/mediar-ai/screenpipe/pull/114

ToSeven avatar May 23 '25 07:05 ToSeven

/attempt #1142

BenraouaneSoufiane avatar Aug 05 '25 10:08 BenraouaneSoufiane

@louis030195 Proposed breakdown, at ~$400 each:

  1. Voice LLM for meetings
    • Whisper → Transcription
    • LLM summarization
  2. Video LLM for search
    • Frame/audio analysis → LLM → Embeddings
    • FAISS-powered semantic search
  3. Timeline enhancement
    • Combine visual/audio tags
    • Auto-label segments (topic, scene, speaker)

Would start with #1 (meeting voice summary) and propose incremental PRs. Feedback welcome.
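
To make item 2 concrete, here is a rough sketch of embedding-based search over text segments. It is illustrative only and makes several assumptions: `embed` is a toy bag-of-words stand-in for a real embedding model, the brute-force cosine ranking stands in for a FAISS index, and `embed`, `cosine`, and `search` are hypothetical names, not screenpipe APIs.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Tuple


def embed(text: str) -> Dict[str, float]:
    """Toy embedding (assumption): an L2-normalized bag of words.
    A real system would call an embedding model here instead."""
    counts = Counter(re.findall(r"[a-z0-9]+", text.lower()))
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {w: c / norm for w, c in counts.items()} if norm else {}


def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    # Both vectors are unit length, so the dot product is the cosine similarity.
    return sum(v * b.get(w, 0.0) for w, v in a.items())


def search(query: str, segments: List[str], top_k: int = 2) -> List[Tuple[float, str]]:
    """Brute-force ranking of segments by similarity to the query.
    A vector index would replace this linear scan at scale."""
    q = embed(query)
    scored = [(cosine(q, embed(s)), s) for s in segments]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


# Hypothetical segment descriptions, e.g. produced by frame/audio analysis.
segments = [
    "discussed the release schedule and launch date",
    "screen shows a spreadsheet of quarterly revenue",
    "speaker reviews open pull requests",
]
results = search("when is the release date", segments, top_k=1)
print(results[0][1])
```

The linear scan is fine for a handful of segments; the point of an index such as FAISS would be to keep the same query shape while avoiding a full scan over a long recording history.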

BenraouaneSoufiane avatar Aug 05 '25 10:08 BenraouaneSoufiane

@louis030195 can you release the amount?

BenraouaneSoufiane avatar Aug 05 '25 11:08 BenraouaneSoufiane

Is the problem solved?

Deng-Xian-Sheng avatar Oct 09 '25 14:10 Deng-Xian-Sheng

/attempt https://github.com/mediar-ai/screenpipe/issues/1142

kallal79 avatar Oct 23 '25 04:10 kallal79