Two-Phase Command Architecture
Problem/Use Case
With the addition of more detectors and filters, it would be ideal to improve algorithm reuse and interoperability. As identified in #402, it should be possible to remove the AdaptiveDetector and flash suppression filter options by allowing users to specify two commands when detecting scenes: a scoring phase (how to calculate a score indicating how "different" each frame is from the previous one), and a trigger phase (how to decide from that score whether the next frame starts a new scene).
Solutions
Add a new type of command, filter-*, which can be used as follows. First, the equivalent of today's default detect-content behavior becomes:
scenedetect -i video.mp4 detect-content filter-flash
detect-adaptive will also be replaced with a filter called filter-adapt, which must be combined with another fast-cut detector. The equivalent default for that becomes:
scenedetect -i video.mp4 detect-content filter-adapt
Proposed Implementation:
Remove:
- detect-adaptive command
- --filter-mode option from detect-content
Add:
- filter-adapt command to perform adaptive filtering on whatever fast cut detector is specified (e.g. it should work with both detect-content and detect-histogram)
- filter-flash command to perform --filter-mode=suppress with whatever fast cut detector is specified
Default values for the filters may need to be tuned depending on which detector is being used, but this is a tractable problem.
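For example, assuming filter-adapt composes with any fast cut detector as proposed (this command is illustrative and not yet implemented), pairing it with detect-histogram would look like:
scenedetect -i video.mp4 detect-histogram filter-adapt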
Open Questions
What API changes are required to support this?
Right now detectors provide the locations of cuts rather than scores directly, which makes filtering more difficult. In v0.6.4 a new filter type was added which can be integrated with detectors individually, but this approach does not scale. It can be used to ship something for the CLI earlier while working out how the API should reflect this change.
Today detectors produce the frame numbers where cuts are found. Instead, they should produce a type (fast cut, fade) and a score from 0.0 to 1.0 for each frame, indicating the confidence that the given frame is a cut. Filters could then operate on that result.
TODO: Make API examples.
Perhaps this should be an API-only change and not affect the CLI.
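As a rough illustration of this split (all names below are hypothetical and not part of the current API), a detector could emit a per-frame score, and a separate trigger step could turn those scores into cuts:

import typing as ty
from dataclasses import dataclass

@dataclass
class FrameScore:
    # Hypothetical output of the scoring phase for a single frame.
    frame_num: int
    kind: str     # e.g. "fast_cut" or "fade"
    score: float  # confidence from 0.0 to 1.0 that this frame is a cut

def threshold_trigger(scores: ty.Iterable[FrameScore], threshold: float = 0.5) -> ty.List[int]:
    # Hypothetical trigger phase: keep only frames whose score clears the threshold.
    return [s.frame_num for s in scores if s.score >= threshold]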
API Sketch
Ideally we could have a concept of data sources (detectors) and filters (what the SceneManager accomplishes today). The result of filter application would be a set of events, e.g.:
I'll try to get a PR up for this eventually that demonstrates it better.
from enum import Enum, auto
from scenedetect import FrameTimecode
import typing as ty
import numpy as np
class Source:
pass
##
## Sources
##
class Similarity(Source):
# Similarity of current frame from previous. Normalized between 0.0 and 1.0.
@property
def amount(self) -> float:
pass
# Confidence of measurement.
@property
def confidence(self) -> ty.Optional[float]:
return None
class Foreground(Source):
# Map of foreground and background pixels in source image.
#
# Should be usable as a mask by setting foreground to 255 and background to 0.
@property
def mask(self) -> np.ndarray:
pass
class Brightness(Source):
# Estimated brightness for the frame normalized from 0.0 to 1.0.
@property
def amount(self) -> float:
pass
##
## Events
##
class EventType(Enum):
MOTION_START = auto()
MOTION_END = auto()
FADE_IN = auto()
FADE_OUT = auto()
CUT = auto()
DISSOLVE = auto()
class Event:
@property
def type(self) -> EventType:
pass
@property
def timecode(self) -> FrameTimecode:
pass
##
## Filters
##
class Filter:
pass
class Motion(Filter):
    def filter(self, fg: Foreground) -> Event:
        pass
    def post_process(self) -> ty.Iterable[Event]:
        pass
class Cuts(Filter):
    def filter(self, similarity: Similarity) -> Event:
        pass
    def post_process(self) -> ty.Iterable[Event]:
        pass
class Fades(Filter):
    def filter(self, brightness: Brightness) -> Event:
        pass
    def post_process(self) -> ty.Iterable[Event]:
        pass
##
## Workflow Result
##
class Result:
    def __init__(self, events: ty.Iterable[Event]):
        self._events = list(events)
    @property
    def events(self) -> ty.Iterable[Event]:
        return self._events
    def to_scenes(self) -> ty.Iterable[FrameTimecode]:
        pass
##
## Dispatcher
##
class Dispatcher:
    def __init__(self, pipelines: ty.Iterable[ty.Tuple[Source, Filter]]):
        self._pipelines = pipelines
    def run(self, video) -> Result:
        events = []
        for frame in video:
            for source, filter in self._pipelines:
                # Each source would be updated from the current frame before its
                # filter is applied (omitted in this sketch).
                events.append(filter.filter(source))
        return Result(events)
##
## Stubs
##
class HSL(Similarity):
pass
##
## Usage
##
from scenedetect import VideoStream, open_video, split_video_ffmpeg
video = open_video("test.mp4")
dispatcher = Dispatcher([(HSL(), Cuts())])
result = dispatcher.run(video)
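# Hypothetical glue, not part of the sketch above: pair consecutive boundaries
# from to_scenes() into (start, end) ranges and pass them to the existing
# split_video_ffmpeg helper.
boundaries = list(result.to_scenes())
split_video_ffmpeg("test.mp4", list(zip(boundaries[:-1], boundaries[1:])))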
# Helper functions for commonly used combinations:
def detect_shot_boundaries(
video: VideoStream,
methods: ty.Iterable[ty.Tuple[Source, Filter]],
...)