PySceneDetect
v1.0 Planned API Changes & Feedback
v1.0 Upcoming Changes & Migration Plan
The next major release of PySceneDetect will introduce significant breaking changes. Furthermore, the minimum supported version of Python will be increased to 3.5, and OpenCV 2.x support will be deprecated. Support for the v0.5.x branch will still be provided with occasional bugfixes if required; however, all new development will proceed on v1.0 once it is released.
These API changes are intended to support further development, simplify integration, and cover many more use cases by emphasizing modularity. Targeting a newer version of Python can also simplify parts of the existing codebase. Most proposed changes are to the internal PySceneDetect API (i.e. there will be only minor modification of those that appear at a high level in the quickstart example). Most expected breaking changes will occur in the SceneDetector base class, the SceneManager class, and the high-level usages thereof.
Users of the high-level API should have a relatively smooth transition; developers using the internal APIs to implement detection algorithms will require more significant changes, although a migration guide will be provided for both use cases.
All users of PySceneDetect, especially those who use the Python API, are encouraged to provide feedback on the items listed below, especially those marked as TODO. Backwards-compatibility wrappers will be provided wherever possible to ensure minimal disruption to existing programs utilizing the Python API.
New Quickstart Example
```python
from scenedetect import SceneManager
from scenedetect.detectors import ContentDetector  # Content-aware detection (detect-content via CLI)

def find_scenes(video_path, threshold=30.0):
    # Create our scene manager, then add the detector.
    scene_manager = SceneManager(video_path)
    scene_manager.add_detector(
        ContentDetector, threshold=threshold)
    # Improve processing speed by downscaling before processing.
    scene_manager.get_video_input().set_downscale_factor()
    scene_manager.detect_scenes()  # No longer required to start & reset the video manager manually.
    # Each returned scene is a tuple of the (start, end) timecode.
    return scene_manager.get_scene_list()
```
Breaking API Changes
Important new enumeration used by the changes below:
EventType
- `EventType.IN`: Fade In / Start of scene
- `EventType.OUT`: Fade Out / End of scene
- `EventType.CUT`: Change of Scene / Shot / Event
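A minimal sketch of how this enumeration could look in Python (the member values are illustrative only, not part of the proposal):

```python
from enum import Enum

class EventType(Enum):
    """Type of event produced by a detector for a given frame."""
    IN = 1    # Fade In / Start of scene
    OUT = 2   # Fade Out / End of scene
    CUT = 3   # Change of Scene / Shot / Event
```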
SceneDetector
- The `SceneDetector` method `process_frame()` shall return a list of events in the form `[(frame #, EventType), ...]` (previously it was a list of cuts in the form `[frame #, ...]`)
- Should the `SceneDetector` base class `post_process()` function be moved to a separate post-processing filter-type object? Should it remain in both?
- `SceneManager` will call the new `attach_scene_manager` method when a detector is added, to allow access to the `VideoStream` and `StatsManager` being used
- A detector may now assume, as an invariant, that both a `VideoStream` and a `StatsManager` are available in the parent/owning `SceneManager` class, if any
- `stats_manager_required()` will be removed as it is no longer needed
- TODO: How should pure-online versus offline algorithms be distinguished? By the lack of a `post_process()` function? If so, the base detector may have to be split again using multiple inheritance.
- TODO: Does `post_process()` need the final frame number/timecode?
- TODO: Write a short migration guide for existing `SceneDetector`s showing how to obtain the previous arguments.
- `process_frame()` and `post_process()` shall now return a list of tuples of the form `(frame_num (int), event_type (EventType), confidence (float))`
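A rough illustration of the new return contract. Only the tuple shape `(frame_num, EventType, confidence)` follows the proposal; the detector class, its scoring logic, and the threshold below are hypothetical:

```python
from enum import Enum

class EventType(Enum):
    IN = 1
    OUT = 2
    CUT = 3

class ExampleDetector:
    """Toy detector emitting CUT events when a per-frame score jumps."""

    def __init__(self, threshold=30.0):
        self._threshold = threshold
        self._last_score = None

    def process_frame(self, frame_num, frame_score):
        """Return a list of (frame_num, EventType, confidence) tuples."""
        events = []
        if self._last_score is not None:
            delta = abs(frame_score - self._last_score)
            if delta >= self._threshold:
                # Scale the raw delta into a [0.0, 1.0] confidence score.
                confidence = min(delta / 255.0, 1.0)
                events.append((frame_num, EventType.CUT, confidence))
        self._last_score = frame_score
        return events
```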
SparseSceneDetector
- The `SparseSceneDetector` class will be removed from the `scenedetect.scene_detector` module in favor of having the existing `SceneDetector` return an `EventType` (rather than just cuts) along with its frame number. There appears to be no use of this class outside of the currently undocumented `MotionDetector` algorithm, so this is not expected to affect any users.
MetricProvider
- Will provide one or more frame metrics stored in the StatsManager for either online or offline processing
- Detectors should instantiate the required metrics through the parent SceneManager, which will ensure no duplication of any metric providers across multiple detector instances (this also removes the requirement for any kind of global metric registry, instead allowing better code reuse within each SceneDetector)
- If only online algorithms are used, there does not need to be any cache of metrics, reducing memory consumption
- TODO: Should the metrics be retrieved through the MetricProvider instead of the StatsManager? See the previous point. This is how the design worked before offline algorithms were added in v0.5.x.
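One way the de-duplication could work is for the SceneManager to key providers by type, so each concrete provider is instantiated at most once. The class and method names below are hypothetical sketches of the proposal, not the final API:

```python
class MetricProvider:
    """Base class for objects that compute per-frame metrics."""

    def metric_keys(self):
        raise NotImplementedError

class SceneManagerSketch:
    """Sketch of provider de-duplication inside a SceneManager."""

    def __init__(self):
        self._providers = {}

    def get_metric_provider(self, provider_type):
        # Detectors request providers here; each concrete type is
        # instantiated at most once, so no global registry is needed.
        if provider_type not in self._providers:
            self._providers[provider_type] = provider_type()
        return self._providers[provider_type]
```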
SceneManager
- The event list shall now include all types of events (in/out/cut), and the call to get the scene list will turn the events into a sequence of scenes
- `get_cut_list()` will be removed, as the information it provides can now be retrieved from `get_event_list()` (by looking only at `EventType.CUT` events)
- `get_event_list()` will return a list of tuples of an `EventType` and a `FrameTimecode`, rather than a pair of IN/OUT events, to allow for greater flexibility
- `get_scene_list()` may require additional arguments to allow some kind of post-processing/filtering when generating the output scene list based on the list of detected events; this is distinct from the detection algorithm `post_process()` function, but highly related, so any feedback in that regard would be helpful
- `get_scene_list()` should no longer require passing an explicit base timecode (i.e. the argument is now optional)
- `SceneManager` will now require a `VideoManager` or other frame source upon construction, rather than delegating to `detect_scenes()`, so that detectors can access information from the `VideoManager` itself
- New `get_video_manager()` method to return a reference to the `VideoManager` the object was created with
- The constructor will now create a `StatsManager` automatically (this can be overridden with an explicit named parameter for backwards compatibility)
- New `get_stats_manager()` method to return the implicitly created `StatsManager` object
- TODO: Add example usage to documentation. Update API documentation accordingly.
StatsManager
- Add public `get_metric()` / `set_metric()` methods to allow more idiomatic calls to set/retrieve frame statistics
- Consider refactoring `get_metrics()` to return all metrics for the frame as a dict, as this object already exists in memory (i.e. make the metrics argument optional, and just return all available metrics for the frame)
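A sketch of how the proposed accessors might behave; the internal storage layout shown here is an assumption, not part of the proposal:

```python
class StatsManagerSketch:
    """Stores per-frame metrics keyed by frame number, then metric name."""

    def __init__(self):
        self._frames = {}

    def set_metric(self, frame_num, key, value):
        self._frames.setdefault(frame_num, {})[key] = value

    def get_metric(self, frame_num, key):
        return self._frames.get(frame_num, {}).get(key)

    def get_metrics(self, frame_num, keys=None):
        # With `keys` omitted, return the whole dict for the frame,
        # as suggested above.
        metrics = self._frames.get(frame_num, {})
        if keys is None:
            return dict(metrics)
        return {k: metrics[k] for k in keys if k in metrics}
```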
FrameTimecode
- ~~No changes are planned for v1.0 at this time~~ The timecode representation will be reworked so that frames start from 1 but times start from 0 (i.e. frame 1 has presentation time 0.0 seconds), which also helps with supporting variable frame rate videos (#168)
- It may be worth adding a method to VideoStream to get the current time as a float, rather than just the frame number, to support this future effort
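Under the reworked representation (frame 1 at presentation time 0.0), the conversions for constant frame rate video would look like the following sketch:

```python
def frame_to_seconds(frame_num, framerate):
    """Frame 1 has presentation time 0.0 under the new scheme."""
    return (frame_num - 1) / framerate

def seconds_to_frame(seconds, framerate):
    """Inverse conversion: presentation time 0.0 maps to frame 1."""
    return int(round(seconds * framerate)) + 1
```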
Hi all, my feedback comes from a newbie, so take it for what it's worth...
In big terms, I wouldn't change the behavior of the software, but add a complementary way of detecting stuff. Some people may want to use the timecodes as-is, and some may want to use the in/out events.
So my 1-cent answer is: if you ask either/or, aim for both (easy to say, harder to code).
> SceneManager: Add new callback argument to detect_scenes() which will be invoked whenever a new scene has been detected (#5)
> TODO: Add example usage to documentation. Update API documentation accordingly.

The more people use it, the more help you'll get. The easier it is to use, the more people will use it.

> TODO: Determine required changes to support event-based detectors. Instead of events representing a pair of timecodes, they shall instead be represented as in and out events.

Both.

> TODO: Should get_event_list() be changed to return a list of sorted in/out events, or should its return type be kept consistent as pairs of frames (thus dropping the last in event until a corresponding out event is available)?

Both.
> SparseSceneDetector shall be renamed to EventDetector to better reflect functionality

What if you split it into two functions so people choose the one they prefer to use? One as-is, one with the change you propose.
> Instead of returning a pair of frames, EventDetectors shall instead return a pair of (frame number, event type), where event type is either begin or end (these shall be made integer constants, e.g. scenedetect.event_type.begin). These changes are planned to better support live mode as well as "generator" mode, where invoking detect_scenes() on a SceneManager will return as soon as a new scene cut, or any other type of event, is detected.
> If multiple EventDetectors are combined, their in events must be AND-ed (i.e. they must both detect an in-event), and their out events should be configurable as AND or OR (default OR). TODO: Should the callback be invoked on each type of event, or only on a pair of begin/end events? My thought is on each event type to allow for greater flexibility; however, if a way to accomplish both cleanly can be developed, that would be preferred.

Yes, the more options you give users, the more likely they'll think of ways to build on top of it.
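The AND/OR combination rule being discussed could be sketched as follows (a hypothetical helper, not part of the API; each detector's events for one frame are given as a set):

```python
from enum import Enum

class EventType(Enum):
    IN = 1
    OUT = 2

def combine_frame_events(per_detector_events, out_mode="or"):
    """Combine one frame's events from several detectors.

    IN events are always AND-ed; OUT events are AND-ed or OR-ed
    depending on `out_mode` (default "or", as proposed above).
    """
    ins = [EventType.IN in events for events in per_detector_events]
    outs = [EventType.OUT in events for events in per_detector_events]
    combined = []
    if per_detector_events and all(ins):
        combined.append(EventType.IN)
    out_hit = all(outs) if out_mode == "and" else any(outs)
    if per_detector_events and out_hit:
        combined.append(EventType.OUT)
    return combined
```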
> ThresholdDetector shall be modified to be an EventDetector rather than a regular SceneDetector. Combined with the above changes, this would allow a callback to be invoked when the threshold is crossed above/below (rather than the current design, which only triggers the callback after a transition from below -> above -> below the threshold, i.e. latch on rising edge, trigger on falling edge).

Awesome! More ways of defining what the change was: region-based, color-based, sound-volume-based, ...
I hope I haven't wasted your time... I did want to give my 2 cents.
Hi @santiagodemierre;
In big terms, I wouldn't change the behavior of the software, but add a complementary way of detecting stuff. Some people may want to use the timecodes as and some may want to use the in, out events.
The end goal of these changes is definitely to support both of those use cases. Just to clarify, when you say some may want to use the timecodes and some may want to use in/out events, are you referring to the calls to get_event_list() and get_scene_list() in SceneManager?
Currently, scenes and events are represented differently internally. The goal of these changes is to represent everything as an event, and to move the logic for actually creating scenes out of cuts from the detectors to the SceneManager's get_scene_list() method (rather than having individual detectors do that logic).
There won't be any removal of functionality from the SceneManager - the existing API will be modified to support both use cases. These breaking changes just move some of the post-processing stages from the individual detection algorithms to the SceneManager, applied when you obtain the actual events/scene list.
The end goal is that SceneManager will return you a list of scenes (pairs of FrameTimecodes or frame numbers) or a list of events, as it does today - the only changes will be to some of the arguments of the existing methods.
> What if you split it into two functions so people choose the one they prefer to use? One as-is, one with the change you propose.
Sorry, could you expand a bit on this point? My idea was that the arguments you pass would dictate how the scenes get generated based on events - does this align with your thoughts?
Thanks for the feedback!
Sorry @santiagodemierre;
I also realize the changes I wrote above don't reflect the actual direction - I've revised them accordingly, my apologies! Any new feedback would be so useful, thank you!
In essence, I want to provide both - I want to give a list of events (in, out, and cut), as well as a list of timecodes. Hopefully this aligns with what you were proposing (since now the callback will be invoked in all cases - on rising edge, falling edge, and on fast cut, and there is a single base class that all detectors can now share).
Thank you!
As somebody that uses the Python API, I have a couple questions about these changes.
First, if I understand your proposed changes correctly, the change to detect_scenes won't currently have any impact on analyzing standalone videos. It will just return the total number of frames processed. The callback function, however, could be invoked even in a non-livestream use case. Similarly, will the get_scene_list function still return a list of (start_time, end_time) tuples?
I think removing the SparseSceneDetector class and post_process function makes sense. One question about how this functionality will be brought into the SceneManager class: will a post_process-like function be added and called whenever the get_scene_list, get_cut_list, or get_event_list functions are invoked? Or will it be called at the end of detect_scenes? I have never really used the ThresholdDetector much, so I am not too familiar with the need for post-processing, but I think it makes more sense to invoke the new post_process at the end of detect_scenes so that it would be possible to access the internal class variables like _cutting_list or _event_list and have them be complete even if the corresponding get function is never invoked.
One final question about these changes is the impact they will have on the StatsManager. I am guessing that the detected events for each frame will be included as a metric in the StatsManager. This would enable fast re-analysis using the new API, and means making sure that events are output in the stats file CSV. However, would it be possible for multiple events to occur on the same frame, and how would that be handled? For example, if you were using more than one detector, say a threshold and a content detector, and a fade-in was detected on the same frame as an HSV jump triggering the content detector, would two different events get written to the stats manager? Off-topic, but this is a scenario I have been thinking about for a while, because min_scene_len is defined on a per-detector basis. So, if you want to use multiple detectors, they can detect scenes independently inside that specified min_scene_len parameter, because they don't see what the other detector is doing. EDIT: I see this has already been added to the v0.6 milestone in #131.
Thanks for the feedback @wjs018 - as per our discussion in #153, I definitely need to revisit my plans for post_process. See my recent comment there; looking forward to any feedback you might have on the matter. I don't think this blocks getting in PR #198, but it definitely makes me want to revisit these API changes to support that best.
As for outputting the events to the statsfile, I don't think this is actually necessary since the determination of an event should be from the given metrics stored in the file, plus the parameters passed to the detector (i.e. the event type for a given frame should be able to be inferred from the metrics in the statsfile). If you think this isn't a good assumption to make though, then I'm definitely open to considering adding it. As you mentioned though, there are some edge cases to think through with that option, so I'd like to try and avoid it unless absolutely necessary.
Is the confidence score of the split available somewhere when using detect_scenes()?
Hi @segalinc;
While not provided by the API directly, assuming you're using the ContentDetector algorithm, you could derive this information by using a StatsManager to obtain and normalize the content_val metric (delta HSV from the previous frame). The value is a floating point difference from 0.0-255.0, so you can divide it by 255.0 to obtain a normalized score (this corresponds with the --threshold argument on the command line).
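For example, given a content_val reading already pulled from a StatsManager, the normalization described above is just a clamped division (a sketch of the arithmetic only):

```python
def normalized_confidence(content_val):
    """Normalize ContentDetector's delta-HSV metric (0.0-255.0) to [0, 1]."""
    return min(max(content_val / 255.0, 0.0), 1.0)
```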
Are you just looking for a confidence score of the scene cut itself, or the confidence of there being a split for every frame in the video? As mentioned, the latter can be obtained using a StatsManager; however, I can definitely see the use case for returning it with each split (thus accessible through SceneManager after calling detect_scenes()). This would require that process_frame() for each SceneDetector returns a list of events in the form [(frame #, EventType, confidence score), ...], but I'm open to including this change in the upcoming v0.6 release.
This implies that a kind of "detection result" class should be created, with named fields to encapsulate all of the data associated with events that detection algorithms produce, rather than just returning a tuple. Then get_event_list() could return these objects directly, and get_scene_list() would just return a pair of confidence scores for the beginning and end of each scene (resolving the ambiguity of calculating a confidence score for the scene as a whole, leaving that to the end user). Does that sound reasonable at least?
Thanks for the question/suggestion, any feedback is most welcome.
Hi,
Thanks for the detailed answer. I think a confidence score similar to what you get using ffprobe should work, so either a score for both the start and end of the cut, as you mentioned, or of the full shot. As a possible return list you could do (start, end, score, fps) so that it's also easy to convert to the timecoded type.
Keep me posted!! Thank you!
Cristina
Closing this issue out, as it's preferable to slowly push towards a more stable API rather than make a rapid breaking change like this. Each subsequent release should bring us closer to what we desire here.