
Light Flash/Strobe Suppression

Opened by annishaa88 · 20 comments

There are some videos that have camera flashes in them.

For example: http://assetsprod2-a.akamaihd.net/tag_reuters_com_2017_newsml_ov6kua1nj_reuters_ingest/tag_reuters_com_2017_newsml_ov6kua1nj_reuters_ingest_LOWRES.mp4. This video gives me the following result: ['00:00:09.766', '00:01:23.266', '00:01:35.066']

Can I do something about it?

annishaa88 avatar Jun 13 '17 14:06 annishaa88

Hello @annishaa88;

Which detector algorithm are you using in this case, threshold or content? Currently, there is only support for ignoring subsequent camera flashes within a certain window (using the minimum-scene-length argument); the initial flash in the window will still be detected due to the design of the detection algorithms.

That being said, I can see this being a relatively common issue, so after the next release of PySceneDetect (v0.5, where the focus is major changes to the Python API), I will look into adding support for ignoring camera flashes in certain videos. I have a few ideas that could solve this issue; they will require some modification to the existing detection algorithms (as well as some additional command-line parameters), but they are definitely possible and should take care of almost all instances of flicker/camera flash.

If you have any suggestions regarding how the implementation should work, your comments would be most welcome. I will keep you posted as to my progress in this regard, and should hopefully have something for you to test after the next major release. Lastly, thank you very much for providing a sample video - this will be quite handy when the time for testing finally rolls around.

Breakthrough avatar Sep 06 '17 18:09 Breakthrough

My apologies for the lack of updates regarding progress. Unfortunately my development efforts have been focused on the release of the new v0.5 API/CLI, and not so much on enhancements/features. I hope to have the new version released by the end of the month, at which time I can attempt to tackle this issue.

Just to confirm again @annishaa88, what command were you using to generate the results? The specific detection algorithm/thresholds being used would be very useful information.

Breakthrough avatar Jul 07 '18 23:07 Breakthrough

Also, one idea I just had to solve this: add the ability to specify an intensity value above which scene detection is disabled for that frame. This might be rather easy to implement, so I will look into squeezing it in for the upcoming release.
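For illustration, that gate could look something like this (a minimal sketch; the helper names and the 220 default are placeholders, not part of PySceneDetect):

import cv2
import numpy as np

def frame_intensity(frame_bgr: np.ndarray) -> float:
    """Average brightness (V channel of HSV) of a BGR frame, in the 0-255 range."""
    hsv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV)
    return float(np.mean(hsv[:, :, 2]))

def should_skip_detection(frame_bgr: np.ndarray, max_intensity: float = 220.0) -> bool:
    """Treat frames brighter than max_intensity as flashes and skip the cut decision."""
    return frame_intensity(frame_bgr) >= max_intensity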

Breakthrough avatar Jul 11 '18 02:07 Breakthrough

Hi Breakthrough,

I have an example video that I have analyzed with bright flashes that cause erroneous scene breaks to be detected. The music video for Growl by EXO is done in one continuous shot, but there are strobe lights that flash in the background. I analyzed the video using the content-aware detector with threshold=30 and min_scene_len=10 and ended up with a total of 83 scenes being detected (link). I would expect at least a couple due to transitions to and from title cards, but the strobes account for the vast majority.
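For reference, a rough sketch of this kind of analysis using the v0.5 Python API (the exact script isn't shown here, so aside from the quoted threshold/min_scene_len values the file name and other settings are placeholders):

from scenedetect import VideoManager, SceneManager
from scenedetect.detectors import ContentDetector

video_manager = VideoManager(['growl.mp4'])  # placeholder path
scene_manager = SceneManager()
# threshold=30 and min_scene_len=10 (in frames) are the settings quoted above.
scene_manager.add_detector(ContentDetector(threshold=30.0, min_scene_len=10))

video_manager.set_downscale_factor()  # auto-downscale for speed
video_manager.start()
scene_manager.detect_scenes(frame_source=video_manager)

scene_list = scene_manager.get_scene_list(video_manager.get_base_timecode())
print('Detected %d scenes' % len(scene_list))
video_manager.release()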

On a side note, I have updated to the newest version, and have been liking it so far. The new API has been working great.

wjs018 avatar Sep 02 '18 19:09 wjs018

Hi @wjs018;

Thank you very much for the extensive example - very well put together as well, might I add. Also thanks for your comments regarding the API, means a lot to me - if you have any improvements you want to suggest, feel free to bring them forwards.

I read through some of your work, and agree that edge detection is definitely a viable solution to the strobing issue. I'm looking into how I can create a new EdgeDetector class to detect scenes purely using edge detection, or possibly a more robust detector (RobustDetector?) that combines all features of the detectors (including slow fades and what not).

I also left a few performance suggestions in one of your repos for how you might be able to improve your runtime using PySceneDetect. Sorry the documentation is still a work in progress; I need to add more examples of different usage styles - the current api_test.py is geared towards making multiple calls to the function versus starting/stopping the program entirely.

Breakthrough avatar Sep 02 '18 21:09 Breakthrough

@Breakthrough I have been doing some work on this problem recently and implemented a working example of an EdgeDetector (code here). I tried to style it after the existing detectors in PySceneDetect and it seems to be plug and play with my existing programs. I basically took the same approach that skvideo does in their scenedet function (docs) (github). This function is just an implementation of this paper (pdf). It adds a dependency on skvideo for their motion estimation code, and a dependency on scipy for some binary image morphology operations.

Some notes about edge detection:

  • I tried a bunch of different methods to automatically generate the low and high thresholds for the Canny edge filter, but none of them performed as well as the method used by skvideo, so I ended up using the same method (see the sketch after this list).
  • The motion estimation parameter r_dist is the radius (in pixels) over which the detector will look for motion in the frame. Note, however, that this is measured in pixels of the scaled-down image if downscale_factor is not 1 in the VideoManager object.
  • The performance is much slower than the other detectors. For some numbers to compare, I get ~20 fps detecting scenes in a 1080p video with ContentDetector with no downscaling. Using EdgeDetector, I need to downscale by 4x in order to match that 20 fps.
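Below is a condensed sketch of the edge-change measure used here (median-based Canny thresholds plus the entering/leaving edge fractions from the referenced paper). It omits the global motion compensation step that the full detector performs with skvideo's globalEdgeMotion, and the helper names are illustrative:

import cv2
import numpy as np
from scipy.ndimage import binary_dilation

def canny_edges(gray: np.ndarray, sigma: float = 0.33) -> np.ndarray:
    """Boolean edge map with median-based auto thresholds (as in skvideo's scenedet)."""
    median = np.median(gray)
    low = int(max(0, (1.0 - sigma) * median))
    high = int(min(255, (1.0 + sigma) * median))
    return cv2.Canny(gray, low, high) > 0

def edge_change_fraction(prev_edges: np.ndarray, curr_edges: np.ndarray, r: int = 6) -> float:
    """Max fraction of edge pixels entering or leaving between two frames."""
    diamond = np.array([[0, 1, 0], [1, 1, 1], [0, 1, 0]])
    prev_dilated = binary_dilation(prev_edges, structure=diamond, iterations=r)
    curr_dilated = binary_dilation(curr_edges, structure=diamond, iterations=r)
    # p_in: new edges far from any old edge; p_out: old edges far from any new edge.
    p_in = 1.0 - np.sum(curr_edges & prev_dilated) / max(np.sum(curr_edges), 1)
    p_out = 1.0 - np.sum(prev_edges & curr_dilated) / max(np.sum(prev_edges), 1)
    return float(max(p_in, p_out))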

Results:

  • Music video for Growl by EXO (one continuous shot) analyzed by ContentDetector:
    • video
    • 62 detected scenes (threshold=30, min_scene_len=10)
  • Same video analyzed by EdgeDetector:
    • video
    • 10 detected scenes (threshold=0.4, min_scene_len=10, r_dist=6)

Overall, I am happy with it for my purposes, but different videos are going to require parameter tuning to get better accuracy. There are certain videos for which I have found the ContentDetector seems to perform better, while others perform better with the EdgeDetector. I am going to experiment a bit with using both in combination. Perhaps, to do something akin to the RobustDetector you mentioned, would it make sense to only add a cut to the list when all of the detectors detect a cut for that frame? Currently, I believe a cut is added to the list if any of the detectors is triggered (which works great in some cases).
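As a rough illustration of that "all detectors must agree" idea (hypothetical, not part of the PySceneDetect API; in practice a small tolerance window would be needed since detectors rarely report the exact same frame):

from typing import List

class AllAgreeDetector:
    """Wraps several detectors and reports a cut only when every one of them does."""

    def __init__(self, detectors: List):
        self.detectors = detectors

    def process_frame(self, frame_num, frame_img) -> List[int]:
        cut_sets = [set(d.process_frame(frame_num, frame_img)) for d in self.detectors]
        agreed = set.intersection(*cut_sets) if cut_sets else set()
        return sorted(agreed)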

wjs018 avatar Nov 24 '18 05:11 wjs018

Hey, is there any progress on dealing with camera flash, or does anyone know any libraries that are able to deal with this?

dave-epstein avatar May 04 '19 03:05 dave-epstein

Hey @dave-epstein;

Sorry, no progress yet on that front; I'd like to clean up the backlog before addressing any new features at the moment... My apologies, I haven't had much time to keep up with the project lately.

I definitely do want to integrate this with PySceneDetect though. In the meantime, any pull requests are still most welcome.

Thank you.

Breakthrough avatar May 20 '19 00:05 Breakthrough

Interestingly, it appears that a pretty novel solution using a lookahead buffer was implemented in the rav1e AV1 encoder (which itself was based on the detect-content algorithm!): https://github.com/xiph/rav1e/blob/master/src/scenechange/mod.rs

This indicates that an underlying design change will be required to support frame lookahead, but this seems like a viable (and awesome!) approach I never originally considered. Will definitely be looking more into how this can be integrated into PySceneDetect to allow for adding flash suppression to detect-content.
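Conceptually, the lookahead approach amounts to something like the following (a sketch only, not the rav1e implementation or PySceneDetect code; score() is a placeholder for whatever content metric is used):

def suppress_flashes(frames, score, threshold, lookahead=3):
    """frames: list of frame images; score(a, b): content difference between two frames."""
    cuts = []
    for i in range(1, len(frames)):
        if score(frames[i - 1], frames[i]) < threshold:
            continue
        # Candidate cut: check whether the content "comes back" within the lookahead window.
        window = frames[i + 1:i + 1 + lookahead]
        if any(score(frames[i - 1], later) < threshold for later in window):
            continue  # transient flash rather than a real cut
        cuts.append(i)
    return cuts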

Breakthrough avatar Jun 14 '20 16:06 Breakthrough

I'll take a crack at this one for the v0.6.x release, as it's a really nice-to-have feature. I think I managed to come up with a method that doesn't require a lookahead buffer and has minimal impact on performance. It doesn't use edge detection, but rather the same method as rav1e, just modified to use a state machine rather than frame lookahead.

Edit: I don't want to ignore your other method either, @wjs018 - would be awesome if that could be either integrated with ContentDetector or shipped as part of PySceneDetect as another detection method. Just think I have a way to solve the most pressing use cases with minimal impact on performance (and a more "tunable" max # frames per flash setting).

Breakthrough avatar Jul 01 '20 22:07 Breakthrough

I think my method may not perform as well as yours @wjs018, but will have minimal performance impact. Will likely be using yours as a source of test footage. It's probably worth shipping your edge detector with one of the next releases, as certain users will likely have the same use case as you did.

Breakthrough avatar Jul 19 '20 18:07 Breakthrough

I did a test of this in v0.5.x (link to download .zip), for users wishing to beta test this feature before its official release. It is turned on by default when running detect-content with a suppression amount of 2 frames.

The suppression amount (called flicker_frames) can be changed when calling detect-content via the -f / --flicker [N] argument, which specifies the flash suppression amount in frames, e.g.:

scenedetect -i video.mp4 detect-content -f 3

To turn flash suppression off, set -f to 0. Any feedback is most welcome!

Breakthrough avatar Aug 04 '20 03:08 Breakthrough

Looks like the v0.6.x branch and zip were deleted.

DrSammyD avatar Jul 18 '22 14:07 DrSammyD

Sorry about that, you can find an updated link here with the feature: https://github.com/Breakthrough/PySceneDetect/archive/c46469e2bcceb8b33885a5bc2826c454a0ecba11.zip

I'll try to schedule this in for v0.6.1 or v0.6.2, but leaving it off by default until it has more testing.

Edit: If anyone can share any other examples to use as test cases, that would also be greatly appreciated.

Breakthrough avatar Jul 19 '22 01:07 Breakthrough

I took a look at this again, and think there are a few main cases to deal with:

  1. Exposure and colour controlled
  2. Exposure controlled
  3. Uncontrolled exposure

The video you posted @wjs018 falls under category 1. If you look at the delta in the hue channel, it's very low throughout the video. I've been experimenting with combining the EdgeDetector implementation you provided with ContentDetector and have gotten promising results so far with adequate performance.

The idea is you can pass a set of weights for the deltas in hue, saturation, luma, and edges. This would allow both cases 1 and 2 to be dealt with by providing different sets of weights, or by always considering only hue/edge information to provide higher confidence. I also need to look more into filtering the edges based on video resolution, but this is all doable.
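A minimal sketch of that weighted-component score (names, defaults, and the simple per-pixel edge delta are illustrative, not the final API):

import cv2
import numpy as np

def weighted_content_delta(prev_bgr, curr_bgr, w_hue=1.0, w_sat=1.0, w_lum=1.0, w_edge=0.0):
    prev_hsv = cv2.cvtColor(prev_bgr, cv2.COLOR_BGR2HSV).astype(np.int32)
    curr_hsv = cv2.cvtColor(curr_bgr, cv2.COLOR_BGR2HSV).astype(np.int32)
    # Average per-pixel change in each HSV channel.
    d_hue, d_sat, d_lum = (np.mean(np.abs(curr_hsv[:, :, i] - prev_hsv[:, :, i])) for i in range(3))
    # Edge delta: fraction of pixels whose Canny edge state changed, roughly scaled to the HSV range.
    prev_edges = cv2.Canny(cv2.cvtColor(prev_bgr, cv2.COLOR_BGR2GRAY), 100, 200) > 0
    curr_edges = cv2.Canny(cv2.cvtColor(curr_bgr, cv2.COLOR_BGR2GRAY), 100, 200) > 0
    d_edge = np.mean(prev_edges ^ curr_edges) * 255.0
    weights = np.array([w_hue, w_sat, w_lum, w_edge])
    deltas = np.array([d_hue, d_sat, d_lum, d_edge])
    return float(np.dot(weights, deltas) / weights.sum())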

Case 3 will need a separate approach, something like a low-pass filter on the luma channel. Going forwards I would like to integrate all of this into ContentDetector and provide options to control the mode/channel weights and filtering, rather than separate detectors. This should make it easier to try different combinations.

There definitely needs to be more work done on determining default weights, so the initial release will probably keep them the same as today (i.e. equal weights for HSL and zero weight on edges for now). Essentially, this makes ContentDetector much more robust by considering edge information and sudden brightness changes, and provides a pathway for improved detection confidence by cross-validating different metrics.

Breakthrough avatar Aug 01 '22 19:08 Breakthrough

What do you think of auto-tuning the thresholds based on standard deviations across the video? Performance might be a bit worse, and it tends not to catch pan shots combined with flashes. I had the same thought about combining the two with my own version of the content-aware detector. It's not clean code, but the idea is to record diffs over the entire video; assuming that most frames aren't cuts, a diff greater than 2 standard deviations works pretty well.

""" Experimental edge_detector module for PySceneDetect.

This module implements the EdgeDetector, which compares the difference
in edges between adjacent frames against a set threshold/score, which if
exceeded, triggers a scene cut.
"""

# Third-Party Library Imports
from typing import Iterable, List, Tuple
import numpy
import cv2

# New dependencies
from skvideo.motion.gme import globalEdgeMotion
from scipy.ndimage.morphology import binary_dilation

# PySceneDetect Library Imports
from scenedetect.scene_detector import SceneDetector


def calculate_frame_score(current_frame_hsv: Iterable[numpy.ndarray],
                          last_frame_hsv: Iterable[numpy.ndarray]) -> Tuple[float]:
    """Calculates score between two adjacent frames in the HSV colourspace. Frames should be
    split, e.g. cv2.split(cv2.cvtColor(frame_data, cv2.COLOR_BGR2HSV)).

    Arguments:
        curr_frame_hsv: Current frame.
        last_frame_hsv: Previous frame.

    Returns:

        Tuple containing the average pixel change for each component as well as the average
        across all components, e.g. (avg_h, avg_s, avg_v, avg_all).

    """
    current_frame_hsv = [x.astype(numpy.int32) for x in current_frame_hsv]
    last_frame_hsv = [x.astype(numpy.int32) for x in last_frame_hsv]
    delta_hsv = [0, 0, 0, 0]
    for i in range(3):
        num_pixels = current_frame_hsv[i].shape[0] * \
            current_frame_hsv[i].shape[1]
        delta_hsv[i] = numpy.sum(
            numpy.abs(current_frame_hsv[i] - last_frame_hsv[i])) / float(num_pixels)

    delta_hsv[3] = sum(delta_hsv[0:3]) / 3.0
    return tuple(delta_hsv)


sigma = 0.33


def unsharp_mask(img, blur_size=(21, 21), imgWeight=1.5, gaussianWeight=-0.5, retries=3):
    if(retries == 0):
        return img
    gaussian = cv2.GaussianBlur(img, (5, 5), 0)
    return unsharp_mask(cv2.addWeighted(img, imgWeight, gaussian, gaussianWeight, 0), blur_size=blur_size, imgWeight=imgWeight, gaussianWeight=gaussianWeight, retries=retries-1)


def compute_frame_transforms(frame: numpy.ndarray) -> float:
    """Computes the edge metrics for a frame."""
    # Convert to grayscale
    _bw = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    _hsv = cv2.split(cv2.cvtColor(frame, cv2.COLOR_BGR2HSV))
    # Some calculation to determine canny thresholds
    _median = numpy.median(_bw)
    _low = int(max(0, (1.0 - sigma) * _median))
    _high = int(min(255, (1.0 + sigma) * _median))
    # Do our Canny edge detection
    img_invert = cv2.bitwise_not(_bw)
    img_smoothing = unsharp_mask(img_invert, (9, 9))
    final = cv2.divide(_bw, 255 - img_smoothing, scale=255)
    final = cv2.threshold(final, _low, _high, cv2.THRESH_BINARY_INV)[1]
    _edges = cv2.Canny(final, _low, _high, apertureSize=3, L2gradient=True)
    # cv2.imshow('final', final)
    # cv2.imshow('edges', _edges)
    # cv2.waitKey(1)
    _contrast = _bw.std()
    return (_edges, _contrast, _bw, _hsv)


class EdgeDetector(SceneDetector):
    """Detects cuts using changes in edges found using the Canny operator.

    This detector uses edge information to detect scene transitions. The
    threshold sets the fraction of detected edge pixels that can change from one
    frame to the next in order to trigger a detected scene break. Images are 
    converted to grayscale in this detector, so color changes won't trigger
    a scene break like with the ContentDetector.

    Paper reference: http://www.cs.cornell.edu/~rdz/Papers/ZMM-MM95.pdf
    """

    def __init__(self, similar=0.75, confirm=1.25, initial=0.3, r_dist=6, buffer_size=9):
        super(EdgeDetector, self).__init__()
        # first pass threshold
        self.initial = initial
        # similarity standard deviations threshold
        self.similar = similar
        # confirm standard deviations threshold
        self.confirm = confirm
        # distance over which motion is estimated (on scaled-down image)
        self.r_dist = r_dist
        self.last_frame = None
        self.last_scene_cut = -3
        self.contrasts = []
        self.frame_scores = []
        self.p_maxes = []
        self.buffer = []
        self.buffer_size = abs(buffer_size)
        self.saved_frames = []
        self.p_contrast_median = 0.0
        self.p_max_median = 0.0
        self.p_hue_median = 0.0
        self.p_sat_median = 0.0
        self.p_lum_median = 0.0
        self.p_sum_median = 0.0
        self.p_contrast_std = 0.0
        self.p_max_std = 0.0
        self.p_hue_std = 0.0
        self.p_sat_std = 0.0
        self.p_lum_std = 0.0
        self.p_sum_std = 0.0
        self._metric_keys = ['p_max', 'p_in', 'p_out', 'p_contrast', 'p_delta']
#         self.cli_name = 'detect-content'

    def _percentage_distance(self, frame_in, frame_out, r):
        diamond = numpy.array([[0, 1, 0], [1, 1, 1], [0, 1, 0]])

        E_1 = binary_dilation(frame_in, structure=diamond, iterations=r)
        E_2 = binary_dilation(frame_out, structure=diamond, iterations=r)

        combo = numpy.float32(numpy.sum(E_1 & E_2))
        total_1 = numpy.float32(numpy.sum(E_1))

        return 1.0 - combo/total_1

    def _compute_edges_p_max(self, last_edges, curr_edges):
        # Estimate the motion in the frame using skvideo
        r_dist = self.r_dist
        disp = globalEdgeMotion(numpy.array(last_edges, dtype=bool),
                                numpy.array(curr_edges, dtype=bool),
                                r=r_dist,
                                method='hamming')

        # Translate our current frame to line it up with previous frame
        comp_edges = numpy.roll(curr_edges, disp[0], axis=0)
        comp_edges = numpy.roll(comp_edges, disp[1], axis=1)

        # Calculate fraction of edge pixels changing using scipy
        r_iter = 6      # Number of morphological operations performed
        p_in = self._percentage_distance(last_edges, comp_edges, r_iter)
        p_out = self._percentage_distance(comp_edges, last_edges, r_iter)
        p_max = numpy.max((p_in, p_out))

        return p_max, p_in, p_out

    def process_frame(self, frame_num, frame_img):
        # type: (int, numpy.ndarray) -> List[int]
        """ Detects difference in edges between frames. Slow transitions or 
        transitions that happen in color space that won't show in grayscale
        won't trigger this detector.

        Arguments:
            frame_num (int): Frame number of frame that is being passed.

            frame_img (Optional[numpy.ndarray]): Decoded frame image (numpy.ndarray) to perform scene
                detection on. Can be None *only* if the self.is_processing_required() method
                (inherited from the base SceneDetector class) returns True.

        Returns:
            List[int]: List of frames where scene cuts have been detected. There may be 0
            or more frames in the list, and not necessarily the same as frame_num.
        """
        metric_keys = self._metric_keys
        _unused = ''

        # If we're on the first frame, insert dummy values for delta and return
        if(len(self.buffer) < 3):
            curr_edges, curr_contrast, curr_bw, _hsv = compute_frame_transforms(
                frame_img)
            self.buffer.append(
                (frame_num, curr_edges, curr_contrast, curr_bw, _hsv, {}))
            p_contrast_pct = 0.0
            return []

        # Fraction of edge pixels changing in new frame, max, entering, and leaving
        p_max, p_in, p_out = 0.0, 0.0, 0.0
        # Contrast change percentage (only recomputed when metrics aren't cached).
        p_contrast_pct = 0.0

        if (self.stats_manager is not None and
                self.stats_manager.metrics_exist(frame_num, metric_keys)):
            p_max, p_in, p_out, p_contrast, p_delta = self.stats_manager.get_metrics(
                frame_num, metric_keys)

        else:
            # Get last element of buffer
            (last_frame_num, last_edges, last_contrast, last_bw,
                last_hsv, last_diff) = self.buffer[-2]
            # Get current frame transforms
            (curr_edges, curr_contrast, curr_bw,
                curr_hsv) = compute_frame_transforms(frame_img)
            # Compute difference in frames
            p_max, p_in, p_out = self._compute_edges_p_max(
                last_edges, curr_edges)
            p_delta = calculate_frame_score(curr_hsv, last_hsv)
            p_contrast = abs(curr_contrast - last_contrast)
            p_contrast_pct = 1
            if (curr_contrast > 0 and last_contrast > 0):
                p_contrast_pct = p_contrast/(numpy.min([curr_contrast, last_contrast]) /
                                             numpy.max([curr_contrast, last_contrast]))
            # record metrics
            if self.stats_manager is not None:
                self.stats_manager.set_metrics(frame_num, {
                    metric_keys[0]: p_max,
                    metric_keys[1]: p_in,
                    metric_keys[2]: p_out,
                    metric_keys[4]: p_delta,
                    metric_keys[3]: p_contrast
                })
            # save metrics for standard deviation calculations
            self.contrasts.append(p_contrast)
            self.frame_scores.append(p_delta)
            self.p_maxes.append(p_max)
            # save diffs between frames to avoid recalculating
            curr_diff = {}
            last_diff[last_frame_num] = curr_diff[frame_num] = (
                p_max, p_delta, p_contrast)
            self.buffer.append(
                (frame_num, curr_edges, curr_contrast, curr_bw,  curr_hsv, curr_diff))

        # cv2.imshow("curr_bw", frame_img)
        # cv2.imshow("last_bw", curr_edges)
        # cv2.waitKey(1)
        # if threshold is met mark for cut calculation
        if p_max >= self.initial or p_contrast_pct >= self.initial:
            # get last saved frame if there is one
            last_buffer = None
            if len(self.saved_frames) > 0:
                last_saved_frame = self.saved_frames[-1]
                last_buffer = last_saved_frame[1]
            # if buffer is not the same as the last buffer we have our first potential cut in this buffer
            # save the buffer
            if last_buffer is not self.buffer:
                self.saved_frames.append(
                    (frame_num, self.buffer))
            # if buffer is the same as the last buffer, add this potential cut to the buffer and extend the life of this buffer
            else:
                # update saved frames with new frame_num and shift buffer
                self.buffer = self.buffer[:]
                self.saved_frames[-1] = (frame_num, self.buffer)

        # if the buffer size is reached and we're not within the buffer size of the last saved frame, drop frames from the buffer
        if len(self.saved_frames) == 0 or frame_num > self.saved_frames[-1][0] + self.buffer_size:
            self.buffer = self.buffer[-self.buffer_size:]

        return []

    def get_or_create_diff_std(self, buffer_element, target_buffer_element):
        (frame_num, _edges, _contrast, _bw, _hsv, diffs) = buffer_element
        (frame_num_t, _edges_t, _contrast_t, _bw_t,
         _hsv_t, diffs_t) = target_buffer_element
        linked_diff = diffs_t.get(frame_num, None)
        if linked_diff is None:
            p_max, _p_in, _p_out = self._compute_edges_p_max(_edges_t, _edges)
            p_delta = calculate_frame_score(_hsv_t, _hsv)
            p_contrast = abs(_contrast-_contrast_t)
            linked_diff = (p_max, p_delta, p_contrast)
            diffs[frame_num_t] = linked_diff
            diffs_t[frame_num] = linked_diff
        (p_max, p_delta, p_contrast) = linked_diff
        (p_hue, p_sat, p_lum, p_sum) = p_delta
        (p_hue_std, p_sat_std, p_lum_std, p_sum_std) = (
            abs(p_hue-self.p_hue_median)/self.p_hue_std,
            abs(p_sat-self.p_sat_median)/self.p_sat_std,
            abs(p_lum-self.p_lum_median)/self.p_lum_std,
            abs(p_sum-self.p_sum_median)/self.p_sum_std
        )
        p_max_std = abs(p_max-self.p_max_median)/self.p_max_std
        p_contrast_std = abs(
            p_contrast-self.p_contrast_median) / self.p_contrast_std
        return (p_max_std, p_contrast_std, p_hue_std, p_sat_std, p_lum_std, p_sum_std)

    def confirm_cut(self, diff):
        (p_max_std, p_contrast_std, p_hue_std,
         p_sat_std, p_lum_std, p_sum_std) = diff
        count = 0
        if p_lum_std > self.confirm:
            count += 2/3
        if p_hue_std > self.confirm:
            count += 1/3
        if p_sat_std > self.confirm:
            count += 1/3
        if p_max_std > self.confirm:
            count += 1
        if p_contrast_std > self.confirm:
            count += 1

        if count >= 2:
            return True

    def confirm_similar(self, diff):
        (p_max_std, p_contrast_std, p_hue_std,
         p_sat_std, p_lum_std, p_sum_std) = diff
        count = 0
        if p_lum_std < self.similar:
            count += 2/3
        if p_hue_std < self.similar:
            count += 1/3
        if p_sat_std < self.similar:
            count += 1/3
        if p_max_std < self.similar:
            count += 1
        if p_contrast_std < self.similar:
            count += 1

        if count >= 2:
            return True

    def get_weighted_delta(self, diff_std):
        p_max_std, p_contrast_std, p_hue_std, p_sat_std, p_lum_std, p_sum_std = diff_std
        weights = [3, 3, 1, 1, 1, 1]
        distributions = [p_max_std, p_contrast_std,
                         p_hue_std, p_sat_std, p_lum_std, p_sum_std]
        weighted_sum = []
        for std, weight in zip(distributions, weights):
            weighted_sum.append(std*weight)
        return numpy.sum(weighted_sum)/numpy.sum(weights)

    def post_process(self, frame_num):
        cut_list = []
        (p_hues, p_sats, p_lums, p_sums) = zip(*self.frame_scores)
        self.p_hue_median = numpy.nanmedian(p_hues)
        self.p_sat_median = numpy.nanmedian(p_sats)
        self.p_lum_median = numpy.nanmedian(p_lums)
        self.p_sum_median = numpy.nanmedian(p_sums)
        self.p_hue_std = numpy.nanstd(p_hues)
        self.p_sat_std = numpy.nanstd(p_sats)
        self.p_lum_std = numpy.nanstd(p_lums)
        self.p_sum_std = numpy.nanstd(p_sums)
        self.p_contrast_median = numpy.nanmedian(self.contrasts)
        self.p_max_median = numpy.nanmedian(self.p_maxes)
        self.p_contrast_std = numpy.nanstd(self.contrasts)
        self.p_max_std = numpy.nanstd(self.p_maxes)
        stats_manager = self.stats_manager
        frame_updates = []
        delta_rate = []
        for saved_frames in self.saved_frames:
            _frame_num, buffer = saved_frames
            start_buffer = buffer[0]
            end_buffer = buffer[-1]
            first_frame_index = 0
            second_frame_index = 1
            while second_frame_index < len(buffer):
                diff = self.get_or_create_diff_std(
                    buffer[first_frame_index], buffer[second_frame_index])
                delta = self.get_weighted_delta(diff)
                delta_rate.append((buffer[second_frame_index][0], delta))
                if self.confirm_cut(diff):
                    frame_updates.append(
                        buffer[second_frame_index][0])
                first_frame_index += 1
                second_frame_index += 1

            if len(frame_updates) == 0:
                continue

            last_frame_update = frame_updates[0]
            grouped_frame_updates = [[last_frame_update]]
            for frame_update in frame_updates[1:]:
                if frame_update > last_frame_update + self.buffer_size:
                    # Initialize a new group
                    grouped_frame_updates.append([frame_update])
                else:
                    grouped_frame_updates[-1].append(frame_update)
                last_frame_update = frame_update

            cut_list += self.get_cuts_from_buffer_group(
                grouped_frame_updates, buffer)
        # skip first cut because it will be tied to start delta
        return cut_list[1:]

    def get_cuts_from_buffer_group(self, grouped_frame_updates, buffer):
        cut_list = []
        for frame_updates in grouped_frame_updates:
            first_frame_index = 0
            last_frame_index = len(frame_updates) - 1
            # find index of first frame in buffer
            for frame_index in range(len(buffer)):
                if buffer[frame_index][0] == frame_updates[0]:
                    first_frame_index = frame_index
                if buffer[frame_index][0] == frame_updates[-1]:
                    last_frame_index = frame_index
            second_frame_index = 0
            first_frame_index -= 1
            ##cv2.imshow("curr_bw", buffer[first_frame_index][3])
            ##cv2.imshow("last_bw", buffer[last_frame_index][3])
            # #cv2.waitKey(1000)
            second_frame_index = 0
            # for all frames between first and last frame_updates
            frame_updates_passed = []
            while second_frame_index < len(buffer)-1:
                if 0 not in [len(frame_updates_passed), len(frame_updates)] and frame_updates[0] - frame_updates_passed[-1] <= 3:
                    frame_updates = frame_updates[1:]
                second_frame_index += 1
                if(buffer[second_frame_index][0] not in frame_updates):
                    continue
                first_frame_index = second_frame_index - 1
                second_frame_index += 1

                found_similar = False
                similarity = 0

                start_second_index = second_frame_index
                while buffer[first_frame_index][0] > frame_updates[0]-3 and first_frame_index >= 0:
                    while not found_similar and second_frame_index < len(buffer) and (second_frame_index - start_second_index) < 4:
                        if self.confirm_similar(
                            self.get_or_create_diff_std(
                                buffer[first_frame_index], buffer[second_frame_index]
                            )
                        ):
                            found_similar = True
                            break
                        second_frame_index += 1
                    if found_similar:
                        break
                    first_frame_index -= 1
                    second_frame_index = start_second_index
                if found_similar:
                    # filter out the frame_updates that between the start_index and forward_index
                    frame_updates_to_remove = []
                    for frame_update in frame_updates:
                        if frame_update >= buffer[first_frame_index][0] and frame_update <= buffer[second_frame_index][0]:
                            frame_updates_to_remove.append(frame_update)
                    # remove the frame_updates in frame_updates_to_remove
                    for frame_update in frame_updates_to_remove:
                        frame_updates.remove(frame_update)
                else:
                    frame_updates_passed += [frame_updates[0]]
                    frame_updates = frame_updates[1:]
                second_frame_index = 0
            # add the first and last frame_updates not removed to the cut_list
            if len(frame_updates_passed) > 0:
                step = len(frame_updates_passed) - 1
                if step == 0:
                    step = 1
                cut_list += frame_updates_passed[::step]
        return cut_list

DrSammyD avatar Aug 08 '22 20:08 DrSammyD

@DrSammyD that should probably be a separate issue/feature request. Ideally I'd like to try and use some kind of online algorithm for estimating the threshold (with a reasonable buffer size), but using a sliding window might be sufficient for that purpose. That being said, it's a very good idea and should definitely be pursued in the right forum. Feel free to file a new bug report or create a discussion for that.
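For illustration, a sliding-window version of that auto-tuning might look roughly like this (a sketch only, not PySceneDetect code):

from collections import deque
import numpy as np

class RollingThreshold:
    """Flags a cut when a frame score exceeds the rolling mean by num_std deviations."""

    def __init__(self, window_size=120, num_std=2.0, min_samples=30):
        self.scores = deque(maxlen=window_size)
        self.num_std = num_std
        self.min_samples = min_samples

    def is_cut(self, score: float) -> bool:
        cut = False
        if len(self.scores) >= self.min_samples:
            mean, std = np.mean(self.scores), np.std(self.scores)
            cut = std > 0 and score > mean + self.num_std * std
        self.scores.append(score)
        return cut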

Breakthrough avatar Aug 27 '22 00:08 Breakthrough

Does anyone have any links to videos they can share exhibiting this behavior? Youtube links are fine. I would like to start compiling a list of test cases to use for validating this, or at least better categorize the classes of issues that need to be solved for this.

I want to add back the strobe suppression by calculating the delta between the frame before the strobe/flash event, and up to N frames after. This means adding the following options:

  • strobe length: max # of frames to look ahead after a strobe is detected
  • strobe weights
  • strobe threshold

However, I wonder if a simpler approach could be achieved by some kind of filter that rejects cuts if there is a sudden increase in average frame luma values (or if it deviates from a rolling average). Having a good library of test cases is crucial for implementing this, so I'm hoping folks can provide some examples.
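Something like the following could serve as a starting point for that luma-based rejection filter (parameter names and defaults are placeholders):

from collections import deque
import cv2
import numpy as np

class LumaSpikeFilter:
    """Flags frames whose average luma jumps well above a rolling average of recent frames."""

    def __init__(self, window_size=30, max_jump=40.0):
        self.history = deque(maxlen=window_size)
        self.max_jump = max_jump

    def is_flash(self, frame_bgr) -> bool:
        luma = float(np.mean(cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)))
        spike = bool(self.history) and (luma - np.mean(self.history)) > self.max_jump
        self.history.append(luma)
        return spike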

Ideally we could then create a single video with a bunch of different forms of flashes/strobes to help validate this.

To start things off, one interesting sequence is in the movie Kick-Ass (2010): https://www.youtube.com/watch?v=-SbnqIIkXQc

This has several different types of flashes/strobes throughout, and seems to be quite a challenging case.

Breakthrough avatar Aug 27 '22 00:08 Breakthrough

I've made progress on this, and there will be a new flash filter that can be enabled/disabled in v0.7. When a rapid set of cuts is detected (i.e. several consecutive scenes less than the min-scene-len option), subsequent cuts will be suppressed until min-scene-len frames pass without a cut.

This effectively merges consecutive sequences of scenes shorter than min-scene-len, grouping the areas where flashing occurs into a contiguous scene. On the above video with min scene length as the default (0.6 seconds):

Detector        | # Scenes | # Scenes w/ Filter
----------------|----------|-------------------
detect-content  | 123      | 74
detect-adaptive | 98*      | 89**
  • [X] * After implementation, # of scenes changed from 104 to 98. still investigating differences.
    • This was an actual bug in detect-adaptive not checking the minimum scene length correctly, updated result above.
  • [x] ** This result was unexpected, I thought this would have been less than detect-content. This might be due to conflicting approaches in flash suppression.
  • [ ] In some cases detect-adaptive performs worse with the filter, suggesting this might not be a good approach given it already has a similar filter in place. Investigate whether AdaptiveDetector can be converted into a filter as well, which would alleviate a lot of the issues with duplicated config options between AdaptiveDetector and ContentDetector.

Here's an example of how the flash grouping looks in action:

https://github.com/Breakthrough/PySceneDetect/assets/125316/28d4a036-6b57-4a04-a152-4fecd783208e

Without the filter, detect-content emits 10 different clips for this segment, and misses a cut that occurs just after the above video ends. Without the filter, detect-adaptive performs better, emitting only 8 clips for this segment, and doesn't miss the subsequent cut. Just looking at the outputted thumbnails, there are lots of similar/duplicate images without the filter, and almost none with it. Flashes can be reduced further by adjusting min-scene-len accordingly with the filter enabled.

This doesn't affect any material without flashes, so I will enable this by default in the next release for both the API and the command line program. This won't resolve the flashing issue entirely, but will greatly reduce the impact it has on the output when it does occur.
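For illustration, the merging behaviour described above amounts to something like the following when applied to a list of cut frame numbers (a sketch, not the actual filter implementation):

def merge_rapid_cuts(cut_frames, min_scene_len):
    """Collapse runs of cuts spaced closer than min_scene_len into a single cut."""
    merged = []
    suppress_until = -1
    prev_cut = None
    for cut in sorted(cut_frames):
        if cut < suppress_until:
            # Still inside a flash group: extend the suppression window instead of emitting a cut.
            suppress_until = cut + min_scene_len
            continue
        if prev_cut is not None and (cut - prev_cut) < min_scene_len:
            # A rapid pair of cuts starts a flash group; keep the earlier cut and suppress this one.
            suppress_until = cut + min_scene_len
        else:
            merged.append(cut)
        prev_cut = cut
    return merged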

Breakthrough avatar Apr 20 '24 02:04 Breakthrough