Suggestion: Time-based cutting utility

Open johnpyp opened this issue 3 years ago • 1 comments

For stuff like cleaning audio transcript datasets, it's necessary to cut out segments of the corresponding subtitles when cutting out bad parts of the training audio. This is partially doable by muxing the subtitles into an MKV container with the audio, cutting that with ffmpeg, and splitting the streams apart again, but that is far from ideal.

Having an easy way to operate on just the subtitles, with an API like subs.cut(start="30:30", end="40:20") that removes the offending section and then shifts everything after it earlier by the removed duration, would be really nice for this use case.

johnpyp, Oct 28 '20 14:10

That sounds like a useful feature! :) In terms of API, pysubs2 represents time in milliseconds. When a method (like SSAFile.shift()) takes just one time, it can be "sugared" to keyword arguments for hours, minutes, seconds, etc., so it looks pretty short: subs.shift(m=1, s=30). Unfortunately this would not work for multiple time inputs, so a cut method could instead look like this:

from pysubs2 import load, make_time

subs = load("subtitles.srt")
subs.cut(start=make_time(m=30, s=30), end=make_time(m=40, s=20))
subs.save("subtitles-cut.srt")

...which is a bit ugly/verbose, though pretty unambiguous and robust for scripted use.
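To make the intended behaviour concrete, here is a minimal sketch of what a single cut could do to event times. It is not pysubs2 code: `Event` and `cut_events` are hypothetical names, and the trimming policy for events that straddle the cut boundary is an assumption. Times are integer milliseconds, as in pysubs2, so the same logic should carry over to any objects with `start` and `end` attributes.

```python
from dataclasses import dataclass


@dataclass
class Event:
    """Hypothetical stand-in for a subtitle event; times are in milliseconds."""
    start: int
    end: int
    text: str = ""


def cut_events(events, start, end):
    """Remove the interval [start, end] from the timeline.

    Events fully inside the interval are dropped, events that straddle a
    boundary are trimmed, and later events are shifted earlier.
    """
    if end < start:
        raise ValueError("end must not be before start")
    removed = end - start

    def remap(t):
        if t <= start:
            return t            # before the cut: unchanged
        if t >= end:
            return t - removed  # after the cut: shifted earlier
        return start            # inside the cut: clamped to the cut point

    kept = []
    for ev in events:
        if start <= ev.start and ev.end <= end:
            continue  # the whole event lies in the removed section
        kept.append(Event(remap(ev.start), remap(ev.end), ev.text))
    return kept
```

Trimming straddling events rather than dropping them is only one possible policy; for dataset cleaning it may be preferable to drop any event that touches the removed section.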

I imagine you may have multiple segments to cut out, in which case it would be nice to be able to specify them all at once, so that all times have the same reference (otherwise you may have to compensate for time shift from previous cuts):

subs.cut([[make_time(m=1, s=30), make_time(m=2, s=0)],
          [make_time(m=15, s=45), make_time(m=16, s=10)]])
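The "same reference" point can be sketched as a mapping from original timestamps to post-cut timestamps. The helpers below (`merge_segments` and `remap_time`, both hypothetical names) take all segments in original-timeline milliseconds, so the caller never has to compensate for earlier cuts. Merging overlapping segments first is an assumption about how such input should be treated.

```python
def merge_segments(segments):
    """Sort (start, end) pairs and merge any that overlap or touch."""
    merged = []
    for start, end in sorted(segments):
        if end < start:
            raise ValueError("segment end must not be before its start")
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(seg) for seg in merged]


def remap_time(t, segments):
    """Map an original timestamp (ms) to its position after all cuts.

    Timestamps inside a removed segment collapse to that segment's
    (already shifted) start.
    """
    removed = 0
    for start, end in merge_segments(segments):
        if t <= start:
            break
        # Count only the part of this segment that lies before t.
        removed += min(t, end) - start
    return t - removed


# The two segments from the example above, in milliseconds.
cuts = [(90_000, 120_000), (945_000, 970_000)]
print(remap_time(130_000, cuts))    # 10 s after the first cut ends
print(remap_time(1_000_000, cuts))  # 30 s after the second cut ends
```

Applying `remap_time` to every event's start and end (and dropping events that end up with no duration) would give the multi-segment cut in a single pass.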

Finally, for quick-and-dirty use, this would be a nice addition to the command-line interface, e.g.:

$ pysubs2 --cut 30m30s 40m20s subtitles.srt >subtitles-cut.srt

Or perhaps even the more usual (though slightly more ambiguous):

$ pysubs2 --cut 0:30:30 0:40:20 subtitles.srt >subtitles-cut.srt
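Both notations could be accepted by one small parser. The sketch below is an illustration only: `parse_time` is a hypothetical helper, not part of pysubs2, and it resolves the ambiguity of a two-field value like `30:30` by assuming minutes and seconds.

```python
import re

_UNIT_MS = {"h": 3_600_000, "m": 60_000, "s": 1_000, "ms": 1}
# "ms" must be tried before "m" so that "500ms" is not read as minutes.
_UNIT_RE = re.compile(r"(\d+)(ms|h|m|s)")


def parse_time(text):
    """Parse '30m30s' or '[h:]mm:ss[.fff]' into integer milliseconds."""
    text = text.strip()
    if ":" in text:
        parts = text.split(":")
        if not 2 <= len(parts) <= 3:
            raise ValueError(f"cannot parse time: {text!r}")
        *head, sec = parts
        hours = int(head[0]) if len(head) == 2 else 0
        minutes = int(head[-1])
        return round((hours * 3600 + minutes * 60 + float(sec)) * 1000)

    # Unit-suffixed form: the matches must cover the whole string.
    total, pos = 0, 0
    for match in _UNIT_RE.finditer(text):
        if match.start() != pos:
            raise ValueError(f"cannot parse time: {text!r}")
        total += int(match.group(1)) * _UNIT_MS[match.group(2)]
        pos = match.end()
    if pos == 0 or pos != len(text):
        raise ValueError(f"cannot parse time: {text!r}")
    return total


# Both spellings from the command lines above give the same value.
assert parse_time("30m30s") == parse_time("0:30:30") == 1_830_000
```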

tkarabela, Oct 29 '20 15:10