
Color Histogram Detector


As discussed in #53 by @r1b, I have taken inspiration from his notebook as well as this paper (pdf) and implemented a color-histogram-based detector. The command syntax is as follows:

    Command: 
        detect-hist
    
    Arguments:
        --threshold, -t
            Threshold (float) that must be exceeded to trigger a cut.
        
        --bits, -b
            The number of most significant bits to keep when quantizing the color
            channels of each frame.
        
        --min-scene-len, -m
            Same as the other detectors (minimum allowed scene length).
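
For example, a run against the goldeneye test clip might look something like this (the threshold and bits values here are purely illustrative, not tuned defaults; see the note on threshold sensitivity further down):

    scenedetect -i goldeneye.mp4 detect-hist --threshold 0.05 --bits 4 list-scenes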

For some detail on the detection algorithm, it works in a few steps (a code sketch follows the list):

  1. Each frame of the video is split into its color channels, and each channel is quantized by retaining only the --bits/-b most significant bits of each pixel. The purpose of this step is to shrink the histogram we will generate in a later step, keeping it computationally tractable.
  2. After quantizing each color channel, the bits for each pixel are combined across channels into a single array. For example, with --bits 2 the resulting array has elements with bit values of 0bRRGGBB, where RR are the two most significant bits from the red channel, GG from the green channel, and BB from the blue channel. With --bits 4 this becomes 0bRRRRGGGGBBBB, and so on. This is done by bit-shifting the different color channels and then joining them with bitwise OR operations.
  3. A histogram is calculated over this single array. The number of bins is the number of possible colors after quantization, i.e. 2^(3 * bits). So --bits 4 gives 2^12 = 4096 bins (12 comes from the 4 bits for each of the three color channels), and --bits 2 gives 2^6 = 64 bins.
  4. This histogram is subtracted element-wise from the previous frame's histogram, and the absolute values of the result are summed to give the total difference. This is the value that is checked against the threshold to trigger cuts, and it is what gets recorded in the stats file as hist_diff.
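
To make the steps concrete, here is a minimal NumPy sketch of the above, assuming an 8-bit BGR frame as delivered by OpenCV (the function names are illustrative, not the actual implementation):

    import numpy as np

    def frame_histogram(frame: np.ndarray, bits: int = 4) -> np.ndarray:
        """Quantize an 8-bit BGR frame and histogram its packed color codes."""
        # Step 1: keep only the `bits` most significant bits of each channel.
        quantized = frame >> (8 - bits)           # values now in [0, 2**bits)

        # Step 2: pack the three channels into one code per pixel
        # (0bRRGGBB for --bits 2) using shifts and bitwise ORs.
        b = quantized[:, :, 0].astype(np.uint32)  # OpenCV frames are BGR
        g = quantized[:, :, 1].astype(np.uint32)
        r = quantized[:, :, 2].astype(np.uint32)
        codes = (r << (2 * bits)) | (g << bits) | b

        # Step 3: one bin per possible packed code, 2**(3 * bits) in total.
        n_bins = 2 ** (3 * bits)
        hist, _ = np.histogram(codes, bins=n_bins, range=(0, n_bins))
        return hist

    def hist_diff(prev_hist: np.ndarray, curr_hist: np.ndarray) -> int:
        # Step 4: element-wise difference, then the sum of absolute values.
        return int(np.abs(curr_hist.astype(np.int64) - prev_hist.astype(np.int64)).sum())

A cut is then triggered whenever this difference between a pair of consecutive frames exceeds the --threshold value.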

There are a couple of things I should mention about the current implementation. First and foremost, the input frames need to be 8-bit color images. This means that grayscale inputs without three color channels will not work, nor will inputs with a bit depth greater than 8 bits, such as image sequences of 16-bit images. I have included checks (sketched below) to make sure the input is of the right shape and dtype.
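
Those checks amount to something like the following (illustrative, not the exact code):

    import numpy as np

    def validate_frame(frame: np.ndarray) -> None:
        # detect-hist requires 8-bit color input: three channels of uint8.
        if frame.ndim != 3 or frame.shape[2] != 3:
            raise ValueError("detect-hist requires a 3-channel color frame")
        if frame.dtype != np.uint8:
            raise ValueError("detect-hist requires 8-bit (uint8) input frames")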

The threshold is very sensitive to changes in the analysis parameters. Changing options like the downscale factor or the --bits value will have a large impact on what a good threshold is, which is worth noting since the other detectors are not as sensitive to these kinds of changes. For the defaults, I have chosen values that work well on the goldeneye clip with default downscaling.

Computationally, this detector is not as efficient as the others. I have done some testing on my machine and included below the average fps for each detection algorithm, with both default downscaling and -d 1.

| detector | fps with -d default | fps with -d 1 | Other notes |
| --- | --- | --- | --- |
| detect-hist | 637.78 | 49.31 | -b 4 |
| detect-hist | 1450.72 | 53.43 | -b 2 |
| detect-content | 1422.79 | 300.61 | |
| detect-adaptive | 1480.52 | 293.96 | |
| detect-threshold | 1425.77 | 1228.24 | |

If this is something worth cleaning up/improving, I can work on tests and docs.

wjs018 · Oct 27, 2022