Add support for OpenAI Whisper, Faster-Whisper, or Faster-Whisper-XXL
Hello everybody! I have been using Aegisub for a long time to create subtitles; it is an essential tool for me. However, the software is missing an important feature: an audio-to-text function such as OpenAI Whisper, Faster-Whisper, or the advanced version, Faster-Whisper-XXL. Another open-source project, Tero Subtitler, already offers such a feature. I would be delighted and very grateful if a similar function could be integrated into Aegisub in the future. Thank you for helping to improve accessibility for people who are hard of hearing and for those who are deaf.
Seconded.
While this would no doubt be useful, can users not already use such tools externally and just load those output files into Aegisub? As I understand, Whisper outputs (or at least, can output) SRT, which can be opened normally in Aegisub.
Especially considering how you've already named three different versions of Whisper, integrating this directly into Aegisub would add another dependency that is likely to see significant updates in the near future, so I'm not sure going down that path is the best idea. Perhaps a reasonable compromise is improving the scripting API to allow for e.g. more direct Python interop, in which case such integrations would be handled by more easily-updated scripts, rather than requiring an entirely new Aegisub version. (this sounds especially appealing, considering how long upstream development was stalled out)
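To illustrate the external route: a minimal sketch of calling Whisper from outside Aegisub and getting an SRT back. The helper name and model choice here are my own, and it assumes the `whisper` CLI from the openai-whisper package is installed and on PATH:

```python
import subprocess
from pathlib import Path

def build_whisper_cmd(media_path, model="small"):
    # Hypothetical helper: assembles an openai-whisper CLI call that
    # writes an SRT file next to the input media.
    out_dir = str(Path(media_path).parent)
    return [
        "whisper", str(media_path),
        "--model", model,
        "--output_format", "srt",  # plain SRT, which Aegisub opens directly
        "--output_dir", out_dir,
    ]

# Uncomment to actually run (requires openai-whisper installed):
# subprocess.run(build_whisper_cmd("episode01.mkv"), check=True)
```

The resulting episode01.srt can then be opened in Aegisub like any other subtitle file.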
What about a mini app that can directly open the output in Aegisub or any other subtitle software?
Thank you for discussing subtitling! In today's digital age, AI tools such as Whisper offer efficient ways to generate subtitles automatically, saving a significant amount of time. However, final checks are still important to ensure accuracy, correct reading times and appropriate line lengths.
It would be fantastic if technologies like Whisper could be integrated directly into Aegisub, either as a plug-in or as a built-in feature. This would streamline the workflow and eliminate the need to switch between multiple programs such as Tero Subtitler and others, which can often be complex and error-prone. Offline solutions would also be more privacy friendly.
I hope this can be realised in the near future, and I wish the development team every success in implementing this feature! 🙂
I use Subtitle Edit more often precisely because it has Whisper support.
I know this software, but it is too complicated and has too many functions that not everyone can use. It also only runs on Windows, not on Mac or Ubuntu. Aegisub is more flexible and runs on all operating systems. We are staying with Aegisub, but you should also try out other free software, perhaps to find ideas for further development. I wish the development team good luck and success.
I use both, so it'd be good to have options for both. And I have Windows and Mac. Tried Linux, too complicated for me.
I agree with petzku, adding something like Whisper would be a lot of work for something that would also work fairly well as an external tool. I don't want to say it won't ever get added, but it's pretty far down on the list of priorities (and even then I'd much rather find a way to call it via an automation script than integrate it directly). For the moment, my main development focus (aside from fixes and maintenance) is improving the things that can only be done with Aegisub, rather than making it possible to do everything with Aegisub.
I can understand.
Whisper.cpp can generate .srt files:
```sh
for file in *.wav; do
  whisper.cpp-whisper-cli -m /usr/share/whisper.cpp/models/ggml-large-v3.bin \
    -l en --output-srt --output-file "${file%.wav}" "$file"
done
```
(You would first convert all of the mp4 files to wav using ffmpeg.) Then load the generated SRT files in Aegisub.
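The ffmpeg conversion step mentioned above can be sketched in Python like this. The function names are my own; it assumes ffmpeg is on PATH and mirrors the 16 kHz mono PCM input that whisper.cpp expects:

```python
import subprocess
from pathlib import Path

def wav_target(video_path):
    # e.g. clip.mp4 -> clip.wav, so the shell loop above picks it up
    return Path(video_path).with_suffix(".wav")

def build_ffmpeg_cmd(video_path):
    # 16 kHz mono PCM is the input format whisper.cpp expects
    return [
        "ffmpeg", "-y", "-i", str(video_path),
        "-ac", "1",      # mono
        "-ar", "16000",  # 16 kHz sample rate
        str(wav_target(video_path)),
    ]

# Convert every mp4 in the current directory (requires ffmpeg installed):
# for mp4 in Path(".").glob("*.mp4"):
#     subprocess.run(build_ffmpeg_cmd(mp4), check=True)
```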
Whisper.cpp is amazing, but the timing is not as great. Also, the invocation above is English-only (-l en).
That being said, maybe it can be done with a Lua script, right?
I had ChatGPT write this Python script last season; it also has a GUI. I hope something like it gets added to Aegisub, but if you need it now, you can take it. I don't know what is needed to integrate it into Aegisub; I think you can ask ChatGPT about that. Here is the code. I made it Japanese-only with the small model, but I think it would only need small tweaks to switch to another language or model.
```python
import os
import subprocess
import threading
import time
import warnings
import tkinter as tk
from tkinter import filedialog, messagebox

import torch
import whisper_timestamped as whisper

warnings.simplefilter("ignore", UserWarning)


def extract_audio(video_path, temp_audio_path):
    """Extracts audio from the video file using FFmpeg."""
    command = [
        "ffmpeg", "-y", "-i", video_path,
        "-ac", "1", "-ar", "16000", "-c:a", "pcm_s16le",
        temp_audio_path,
    ]
    subprocess.run(command, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)


def transcribe_audio(audio_path, model):
    """Transcribes the audio using Whisper Timestamped."""
    return model.transcribe(audio_path, language="ja", word_timestamps=True)


def format_timestamp(seconds):
    """Formats a timestamp for SRT files (HH:MM:SS,mmm)."""
    millisec = int((seconds % 1) * 1000)
    minutes, seconds = divmod(int(seconds), 60)
    hours, minutes = divmod(minutes, 60)
    return f"{hours:02}:{minutes:02}:{seconds:02},{millisec:03}"


def save_srt(subtitles, output_path):
    """Saves transcribed segments in SRT format."""
    with open(output_path, "w", encoding="utf-8") as f:
        for i, segment in enumerate(subtitles["segments"]):
            start_srt = format_timestamp(segment["start"])
            end_srt = format_timestamp(segment["end"])
            f.write(f"{i+1}\n{start_srt} --> {end_srt}\n{segment['text']}\n\n")


def format_total_time(seconds):
    """Formats total processing time in minutes and seconds."""
    minutes, seconds = divmod(int(seconds), 60)
    return f"{minutes} min {seconds} sec"


def animate_loading():
    """Animates the 'Loading...' text with a clipping effect."""
    text = "Loading..."
    length = len(text)
    animate_loading.counter = (animate_loading.counter + 1) % (length + 1)
    loading_label.config(
        text=text[:animate_loading.counter] + " " * (length - animate_loading.counter),
        foreground="black",
    )
    root.after(150, animate_loading)  # update every 150 ms

animate_loading.counter = 0


def process_video():
    """Handles video processing in a separate thread."""
    video_path = filedialog.askopenfilename(
        title="Select Video File",
        # tkinter expects space-separated glob patterns here
        filetypes=[("Video Files", "*.mp4 *.mkv *.avi *.mov")],
    )
    if not video_path:
        return

    output_srt = os.path.splitext(video_path)[0] + ".srt"
    temp_audio = "temp_audio.wav"

    loading_label.pack()  # show loading animation
    root.update_idletasks()

    start_time = time.time()
    extract_audio(video_path, temp_audio)
    model = whisper.load_model("small")
    result = transcribe_audio(temp_audio, model)
    save_srt(result, output_srt)
    os.remove(temp_audio)  # delete temporary audio file

    formatted_time = format_total_time(time.time() - start_time)
    loading_label.pack_forget()  # hide loading animation
    messagebox.showinfo("Done", f"Subtitles saved as {output_srt}\nProcessing Time: {formatted_time}")


def start_processing():
    threading.Thread(target=process_video, daemon=True).start()


root = tk.Tk()
root.title("Whisper SRT Generator")
root.geometry("400x200")

tk.Label(root, text="Select a video file to generate subtitles").pack(pady=10)
tk.Button(root, text="Select Video", command=start_processing).pack(pady=5)

loading_label = tk.Label(root, text="Loading...", font=("Arial", 12), foreground="black")
root.after(150, animate_loading)  # start animation

root.mainloop()
```
I would be delighted if this were integrated. It would spare other users the manual typing and save valuable time.