Introduce smart `--max-segment-length` parameter to evenly split larger scenes

Open IncredibleLaser opened this issue 1 year ago • 2 comments

Problem/Use Case

I'm using PySceneDetect to encode videos with a fairly large GOP size (say 30 seconds; this isn't a good setting for every case, but it works pretty well for my home use). A possible improvement concerns scenes that are only slightly longer than the GOP size. Say a scene is 122 seconds long: the resulting segment exhibits the downsides of the large GOP size (slow seeking during the first 120 seconds, at least if no hardware decoding is available), and yet the scene has five keyframes, with the last GOP being only two seconds long. The issue is the same regardless of the TIMECODE unit.

Solutions

With a maximum scene length setting (or another name, since it's still the same scene), the scene could be split up further. In the example I described, PySceneDetect could determine that five segments are needed and split the scene evenly into clips with a duration of 24.4 seconds each (or the equivalent in frames).

Proposed Implementation:

Introduce the parameter `--max-segment-length` to set the maximum length of a segment. A scene that exceeds the maximum is split into the required number of segments, with the segments as equal in length as the frame count allows.
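The arithmetic behind this proposal can be sketched in a few lines of Python. This is only an illustration, not PySceneDetect code: the function name `split_evenly` and the use of plain frame numbers with an exclusive end frame are assumptions made for the example.

```python
def split_evenly(start_frame: int, end_frame: int, max_len: int) -> list[tuple[int, int]]:
    """Split [start_frame, end_frame) into the fewest segments of at most max_len frames.

    Hypothetical helper for illustration; segment lengths differ by at most one frame.
    """
    if max_len <= 0:
        raise ValueError("max_len must be positive")
    length = end_frame - start_frame
    if length <= 0:
        return []
    # Integer ceiling division: smallest segment count that respects max_len.
    count = -(-length // max_len)
    # Every segment gets `base` frames; the first `extra` segments get one more.
    base, extra = divmod(length, count)
    segments = []
    cursor = start_frame
    for i in range(count):
        seg_len = base + (1 if i < extra else 0)
        segments.append((cursor, cursor + seg_len))
        cursor += seg_len
    return segments


# The 122 second scene at an assumed 25 fps is 3050 frames; a 30 second limit is 750 frames.
for start, end in split_evenly(0, 3050, 750):
    print(start, end, (end - start) / 25)  # five segments of 610 frames (24.4 s) each
```

When the frame count does not divide evenly, the leftover frames are spread over the first segments, so no segment is more than one frame longer than another.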

Alternatives:

No idea.

Examples:

None

IncredibleLaser avatar Oct 14 '24 17:10 IncredibleLaser

Are you encoding the videos via the split-video command or via some other means? It should be possible to post-process the scene list to look for scenes that don't meet this constraint.

Are there no options that can be specified to ffmpeg to avoid slow seeking in these cases? And what do you mean by PySceneDetect could determine five segments? Sorry, I am not that familiar with these encoder params.

Breakthrough avatar Oct 16 '24 01:10 Breakthrough

> Are you encoding the videos via the split-video command or via some other means?

Via split-video.

> Are there no options that can be specified to ffmpeg to avoid slow seeking in these cases?

One could make the GOP smaller, but this reduces the efficiency of the encoded video. A GOP normally has one I-frame at the beginning, which can be decoded independently. All other frames in the GOP (B- and P-frames) rely on that I-frame, either directly or indirectly. Because of how video compression works, it makes sense to place a new I-frame at a scene change (which is why I'm posting this here): the bigger the change between frames, the less efficient B- and P-frames become.

I-frames are much larger than other frames because they need to contain the full picture (like a JPEG file), so you want as few of them as possible. The downside shows up when you seek to a position whose preceding I-frame is far away: the player has to go back to that I-frame and decode all following frames until it arrives at the desired seek position.

So what I was trying to explain is that such a parameter would let PySceneDetect place these splits evenly within a scene. The underlying video codec either can't detect a scene change itself or, if it can, it usually doesn't analyze the video as a whole before processing it. That means it would run into the issue I described initially: it would create four GOPs of 30 seconds, start the fifth one, process it for two seconds, detect that the scene is over, and then start a new GOP.
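The encoder behaviour described here can be reproduced with a tiny simulation. It is a simplified model, not a description of any particular codec: the hypothetical `fixed_gop_lengths` assumes the encoder starts a new GOP whenever the maximum GOP length is reached and again when the scene ends.

```python
def fixed_gop_lengths(scene_seconds: int, max_gop_seconds: int) -> list[int]:
    """GOP lengths for an encoder that only knows the maximum GOP size (simplified model)."""
    lengths = []
    remaining = scene_seconds
    # Emit full-size GOPs until what is left fits into a single GOP.
    while remaining > max_gop_seconds:
        lengths.append(max_gop_seconds)
        remaining -= max_gop_seconds
    # Whatever is left forms the final, possibly very short, GOP.
    if remaining > 0:
        lengths.append(remaining)
    return lengths


print(fixed_gop_lengths(122, 30))  # [30, 30, 30, 30, 2]
```

The scene ends up with five keyframes either way, but seeking cost is governed by the 30 second GOPs, whereas an even split would cap every GOP at 24.4 seconds for the same number of I-frames.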

> And what do you mean by PySceneDetect could determine five segments? Sorry I am not that familiar with these encoder params.

The "detection" part would be that PySceneDetect detects a scene of length 122 seconds and calculates, using the user-supplied parameter, that the scene needs to be split into five segments so that no segment exceeds the maximum segment length.

Sorry if this is explained in a convoluted way, English isn't my first language.

IncredibleLaser avatar Oct 18 '24 21:10 IncredibleLaser

Based on your input, I have come to the conclusion that the functionality doesn't make too much sense in PySceneDetect itself.

The reason is that GOPs would have to be closed if the functionality were to be implemented here. It's better to write some simple logic that checks whether the current group is larger than the maximum GOP size and, if so, recalculates it.
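One possible reading of that "simple logic", sketched outside PySceneDetect: walk a scene list, and for every scene longer than the maximum GOP size compute evenly spaced keyframe times. The function name `keyframe_times` and the scene list of `(start, end)` pairs in seconds are assumptions for the example. The output is formatted as a comma-separated list of times, the form ffmpeg's `-force_key_frames` option accepts; how it is handed to the encoder is left open.

```python
import math


def keyframe_times(scenes: list[tuple[float, float]], max_gop: float) -> list[float]:
    """Keyframe timestamps (seconds) so that no GOP inside a scene exceeds max_gop.

    Hypothetical helper: `scenes` holds (start, end) pairs in seconds.
    """
    times = []
    for start, end in scenes:
        duration = end - start
        if duration <= 0:
            continue
        # A scene that already fits gets one keyframe; longer ones are recalculated.
        count = max(1, math.ceil(duration / max_gop))
        step = duration / count
        times.extend(start + i * step for i in range(count))
    return times


scenes = [(0.0, 122.0), (122.0, 140.0)]
print(",".join(f"{t:.3f}" for t in keyframe_times(scenes, 30.0)))
# 0.000,24.400,48.800,73.200,97.600,122.000
```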

IncredibleLaser avatar Oct 30 '24 13:10 IncredibleLaser