ai-clips-maker
AI-powered tool to turn long videos into short, viral-ready clips. Combines transcription, speaker diarization, scene detection & 9:16 resizing — perfect for creators & smart automation.
🎬 ai-clips-maker
Created by Alperen Sümeroğlu — An AI-native video engine that turns long-form content into short, viral-ready clips with surgical precision.
ai-clips-maker is a smart, modular Python tool built for creators, educators, and developers. It transcribes speech, detects speakers, analyzes scenes, and crops around the key moments — creating ready-to-share vertical clips for TikTok, Reels, and Shorts with zero manual editing.
📚 Contents
- 📦 Features
- 🛠 Installation
- 🚀 Quickstart
- 🔍 How It Works
- ⚙️ Tech Stack
- 🎯 Use Cases
- 🧪 Tests
- 🗺 Roadmap
- 🤝 Contribute
- 👤 Author
- 🎧 Weekly Rewind Podcast
- 📄 License
📦 Features
- 🎞️ Auto-segment videos based on speech & scene shifts
- 🧠 Word-level transcription using WhisperX
- 🗣️ Speaker diarization (who spoke when) via Pyannote
- 🪄 Face/body-aware cropping focused on active speaker
- 📐 Output formats: 9:16 (vertical), 1:1 (square), 16:9 (wide)
- 🔌 Modular and easily extensible pipeline
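The speaker-focused cropping and the three output ratios listed above can be illustrated with a little arithmetic. The sketch below is not taken from ai-clips-maker; `crop_window` and its parameters are hypothetical names, and it assumes the speaker position is already known as a horizontal pixel coordinate.

```python
def crop_window(src_w, src_h, aspect, center_x=None):
    """Return (x, y, w, h) for the largest crop of the given aspect ratio.

    center_x is the horizontal position to keep in frame (for example a
    detected speaker position); it defaults to the middle of the frame.
    """
    ratio_w, ratio_h = aspect
    if src_w * ratio_h > src_h * ratio_w:
        # Source is wider than the target: keep full height, trim the sides.
        crop_h = src_h
        crop_w = src_h * ratio_w // ratio_h
    else:
        # Source is as narrow as or narrower than the target: keep full width.
        crop_w = src_w
        crop_h = src_w * ratio_h // ratio_w
    # Many encoders expect even dimensions, so round down to an even size.
    crop_w -= crop_w % 2
    crop_h -= crop_h % 2
    if center_x is None:
        center_x = src_w / 2
    # Center on the requested x, then clamp so the window stays in frame.
    x = int(center_x - crop_w / 2)
    x = max(0, min(x, src_w - crop_w))
    y = (src_h - crop_h) // 2
    return x, y, crop_w, crop_h


print(crop_window(1920, 1080, (9, 16)))  # (657, 0, 606, 1080)
```

Clamping matters when the speaker stands near the edge of the frame: the window slides to the border instead of running past it.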
🛠 Installation
# Install main package
pip install ai-clips-maker
# Install WhisperX from source
pip install git+https://github.com/m-bain/whisperx.git
# Install system dependencies (libmagic, ffmpeg)
# macOS
brew install libmagic ffmpeg
# Ubuntu/Debian
sudo apt install libmagic1 ffmpeg
🚀 Quickstart
from ai_clips_maker import Transcriber, ClipFinder, resize
# Step 1: Transcription
transcriber = Transcriber()
transcription = transcriber.transcribe(audio_file_path="/path/to/video.mp4")
# Step 2: Clip detection
clip_finder = ClipFinder()
clips = clip_finder.find_clips(transcription=transcription)
print(clips[0].start_time, clips[0].end_time)
# Step 3: Cropping & resizing
crops = resize(
    video_file_path="/path/to/video.mp4",
    pyannote_auth_token="your_huggingface_token",
    aspect_ratio=(9, 16),
)
print(crops.segments)
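The Quickstart prints each clip's `start_time` and `end_time` but does not show how to cut the file. One possible approach, sketched here as an assumption rather than the library's own method, is to hand those timestamps to the `ffmpeg` CLI that is already a dependency. `build_trim_command` is a hypothetical helper and is not part of ai-clips-maker.

```python
import subprocess


def build_trim_command(src, dst, start, end):
    """Build an ffmpeg command that cuts [start, end] seconds out of src."""
    if end <= start:
        raise ValueError("end must be greater than start")
    duration = end - start
    return [
        "ffmpeg",
        "-y",                     # overwrite dst if it already exists
        "-ss", f"{start:.3f}",    # seek to the clip start (input option)
        "-i", src,
        "-t", f"{duration:.3f}",  # keep this many seconds
        "-c:v", "libx264",        # re-encode so the cut is frame-accurate
        "-c:a", "aac",
        dst,
    ]


def trim_clip(src, dst, start, end):
    """Run the trim; raises CalledProcessError if ffmpeg fails."""
    subprocess.run(build_trim_command(src, dst, start, end), check=True)
```

With the Quickstart objects, a call might look like `trim_clip("/path/to/video.mp4", "clip_0.mp4", clips[0].start_time, clips[0].end_time)`, assuming the times are expressed in seconds.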
🔍 How It Works
- 🎧 Extracts audio from video
- ✍️ Transcribes speech using WhisperX
- 🧍 Identifies speakers with Pyannote
- 🎬 Detects scene changes & speaker shifts
- 🎯 Crops video around active speaker’s position
- 📤 Exports clips in desired format
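The steps above combine two timelines: word timestamps from transcription and speaker turns from diarization. A minimal sketch of how they could be joined follows. It labels each word with the speaker whose turn overlaps it most, then reports where the speaker changes. The `Word` and `Turn` classes and both functions are illustrative names and do not reflect the project's internal data model.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Word:
    text: str
    start: float  # seconds
    end: float


@dataclass
class Turn:
    speaker: str
    start: float  # seconds
    end: float


def overlap(word: Word, turn: Turn) -> float:
    """Length in seconds of the intersection between a word and a turn."""
    return max(0.0, min(word.end, turn.end) - max(word.start, turn.start))


def assign_speakers(
    words: List[Word], turns: List[Turn]
) -> List[Tuple[Word, Optional[str]]]:
    """Pair each word with the speaker whose turn overlaps it the most."""
    labeled = []
    for word in words:
        best = max(turns, key=lambda t: overlap(word, t), default=None)
        has_overlap = best is not None and overlap(word, best) > 0
        labeled.append((word, best.speaker if has_overlap else None))
    return labeled


def speaker_change_times(labeled: List[Tuple[Word, Optional[str]]]) -> List[float]:
    """Start times of words where the speaker differs from the previous one."""
    changes = []
    previous = None
    for word, speaker in labeled:
        if speaker is None:
            continue  # skip words that no turn covers
        if previous is not None and speaker != previous:
            changes.append(word.start)
        previous = speaker
    return changes
```

The change times returned here could then be merged with shot boundaries from scene detection to pick cut points, which is one plausible reading of the "scene changes & speaker shifts" step.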
⚙️ Tech Stack
| 🔧 Module | 🧠 Technology | 💡 Purpose |
|---|---|---|
| Transcription | WhisperX | Word-level speech-to-text with timestamps |
| Diarization | Pyannote.audio | Speaker segmentation (who spoke when) |
| Video Processing | OpenCV, PyAV | Frame-by-frame video control |
| Scene Detection | Scenedetect | Detects shot boundaries |
| ML Inference | PyTorch | Powering WhisperX & Pyannote models |
| Data Handling | NumPy, Pandas | Transcription & clip structuring |
| Media Utilities | ffmpeg, libmagic | Media decoding + type detection |
| Testing Framework | pytest | End-to-end and unit testing support |
All tools were selected for speed, flexibility, and production-grade stability.
🎯 Use Cases
- 🎙 Podcasters clipping episodes into shareable highlights
- 📚 Teachers summarizing lecture content
- 📱 Social media teams repurposing YouTube for Reels
- 🧠 Developers automating video workflows
- 🚀 Startups building AI-based content tools
🧪 Tests
# Run test suite
pytest tests/
The suite covers every component: transcriber, diarizer, clip detector, and resizer.
🗺 Roadmap
| Status | Feature | Note |
|---|---|---|
| ✅ | Core pipeline: Transcribe → Diarize → Detect | Implemented in v1.0 |
| ✅ | Speaker-aware video cropping | Production ready |
| 🚧 | Multi-language subtitle generation | Planned for Q2 2025 |
| 📌 | Auto-caption overlay | In design phase |
| 🧪 | Web UI (upload + preview clips) | Prototype in progress |
| 🧠 | HuggingFace or Streamlit live demo | On backlog |
🤝 Contribute
We welcome pull requests, ideas, and feedback.
# Fork the repo on GitHub, then clone it (substitute your fork's URL to push changes)
git clone https://github.com/alperensumeroglu/ai-clips-maker.git
cd ai-clips-maker
# Create feature branch
git checkout -b feat/your-feature
# Make changes, commit, and push
git commit -am "Add feature"
git push origin feat/your-feature
Before contributing, please review the open issues and the coding style guide.
👤 Author
Alperen Sümeroğlu
Computer Engineer • Entrepreneur • World Explorer 🌍
15+ European countries explored ✈️
“Let your code tell your story — clean, powerful, and useful.”
🎧 Weekly Rewind Podcast
🎤 Weekly insights on AI, tech, and building globally — by Alperen Sümeroğlu.
🚀 What does it take to grow as a Computer Engineering student, build projects, and explore global innovation?
This project is part of a bigger journey I share in Weekly Rewind, my real-time documentary podcast series, where I reflect weekly on coding breakthroughs, innovation insights, startup stories, and lessons from around the world.
💡 What is Weekly Rewind?
A behind-the-scenes look at real-world experiences, global insights, and hands-on learning. Each episode includes:
- 🔹 Inside My Coding & Engineering Projects
- 🔹 Startup Ideas & Entrepreneurial Lessons
- 🔹 Trends in Tech & AI
- 🔹 Innovation from 15+ Countries
- 🔹 Guest Conversations with Builders & Engineers
- 🔹 Productivity, Learning & Growth Strategies
“True learning isn’t in tutorials — it’s in building, exploring, and reflecting.”
📄 License
MIT License — Free for commercial and personal use.
© 2024 Alperen Sümeroğlu