Two-Phase Command Architecture
Problem/Use Case
With the addition of more detectors and filters, it would be ideal to improve algorithm reuse and interoperability. As identified in #402, it should be possible to remove the AdaptiveDetector and flash suppression filter options by allowing users to specify two commands when detecting scenes: a scoring phase (how to calculate a score indicating how "different" each frame is from the previous one), and a trigger phase (how to decide from that score whether the next frame starts a new scene).
Solutions
Add a new type of command, filter-*, which can be used as follows. First, the equivalent of today's default detect-content behavior becomes:
scenedetect -i video.mp4 detect-content filter-flash
detect-adaptive will also be replaced with a filter called filter-adapt, which must be combined with another fast-cut detector. The equivalent default for that becomes:
scenedetect -i video.mp4 detect-content filter-adapt
Proposed Implementation:
Remove:
- detect-adaptive command
- --filter-mode option from detect-content
Add:
- filter-adapt command to perform adaptive filtering on whatever fast cut detector is specified (e.g. it should work with both detect-content and detect-histogram)
- filter-flash command to perform --filter-mode=suppress with whatever fast cut detector is specified
Default values for the filters may need to be tuned depending on which detector is being used, but this is a tractable problem.
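For example, assuming filter-adapt composes with any fast cut detector as proposed (this command is illustrative and not yet implemented), pairing it with detect-histogram would look like:
scenedetect -i video.mp4 detect-histogram filter-adapt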
Open Questions
What API changes are required to support this?
Right now detectors provide the locations of cuts rather than scores directly, which makes filtering more difficult. In v0.6.4 a new filter type was added which can be integrated with detectors individually, but this approach does not scale. It can be used to ship something for the CLI earlier while working out how the API should reflect this change.
Today detectors produce the frame numbers where cuts are found. Instead, they should produce a type (fast cut, fade) and a score from 0.0 to 1.0 for each frame, indicating the confidence that the given frame is a cut. Filters could then operate on that result.
TODO: Make API examples.
Perhaps this should be an API-only change and not affect the CLI.
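As a rough illustration of this split (all names below are hypothetical and not part of the current API), a detector could emit a per-frame score, and a separate trigger step could turn those scores into cuts:

import typing as ty
from dataclasses import dataclass

@dataclass
class FrameScore:
    # Hypothetical output of the scoring phase for a single frame.
    frame_num: int
    kind: str     # e.g. "fast_cut" or "fade"
    score: float  # confidence from 0.0 to 1.0 that this frame is a cut

def threshold_trigger(scores: ty.Iterable[FrameScore], threshold: float = 0.5) -> ty.List[int]:
    # Hypothetical trigger phase: keep only frames whose score clears the threshold.
    return [s.frame_num for s in scores if s.score >= threshold]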
API Sketch
Ideally we could have a concept of data sources (detectors) and filters (what the SceneManager accomplishes today). The result of filter application would be a set of events, e.g.:
I'll try to get a PR up for this eventually that demonstrates it better.
from enum import Enum, auto
from scenedetect import FrameTimecode
import typing as ty
import numpy as np
class Source:
pass
##
## Sources
##
class Similarity(Source):
# Similarity of current frame from previous. Normalized between 0.0 and 1.0.
@property
def amount(self) -> float:
pass
# Confidence of measurement.
@property
def confidence(self) -> ty.Optional[float]:
return None
class Foreground(Source):
# Map of foreground and background pixels in source image.
#
# Should be usable as a mask by setting foreground to 255 and background to 0.
@property
def mask(self) -> np.ndarray:
pass
class Brightness(Source):
# Estimated brightness for the frame normalized from 0.0 to 1.0.
@property
def amount(self) -> float:
pass
##
## Events
##
class EventType(Enum):
MOTION_START = auto()
MOTION_END = auto()
FADE_IN = auto()
FADE_OUT = auto()
CUT = auto()
DISSOLVE = auto()
class Event:
@property
def type(self) -> EventType:
pass
@property
def timecode(self) -> FrameTimecode:
pass
##
## Filters
##
class Filter:
pass
class Motion(Filter):
    def filter(self, fg: Foreground) -> Event:
        pass
    def post_process(self) -> ty.Iterable[Event]:
        pass
class Cuts(Filter):
    def filter(self, similarity: Similarity) -> Event:
        pass
    def post_process(self) -> ty.Iterable[Event]:
        pass
class Fades(Filter):
    def filter(self, brightness: Brightness) -> Event:
        pass
    def post_process(self) -> ty.Iterable[Event]:
        pass
##
## Workflow Result
##
class Result:
    def __init__(self, events: ty.Iterable[Event]):
        self._events = list(events)
    @property
    def events(self) -> ty.Iterable[Event]:
        return self._events
    def to_scenes(self) -> ty.Iterable[FrameTimecode]:
        pass
##
## Dispatcher
##
class Dispatcher:
    def __init__(self, pipelines: ty.Iterable[ty.Tuple[Source, Filter]]):
        self._pipelines = pipelines
    def run(self, video) -> Result:
        events = []
        for frame in video:
            for source, filter in self._pipelines:
                # Each source would be updated from the current frame before its
                # filter is applied (omitted in this sketch).
                events.append(filter.filter(source))
        return Result(events)
##
## Stubs
##
class HSL(Similarity):
pass
##
## Usage
##
from scenedetect import VideoStream, open_video, split_video_ffmpeg
video = open_video("test.mp4")
dispatcher = Dispatcher([(HSL(), Cuts())])
result = dispatcher.run(video)
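# Hypothetical glue, not part of the sketch above: pair consecutive boundaries
# from to_scenes() into (start, end) ranges and pass them to the existing
# split_video_ffmpeg helper.
boundaries = list(result.to_scenes())
split_video_ffmpeg("test.mp4", list(zip(boundaries[:-1], boundaries[1:])))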
# Helper functions for commonly used combinations:
def detect_shot_boundaries(
video: VideoStream,
methods: ty.Iterable[ty.Tuple[Source, Filter]],
...)