Support dictionary protocol in `StatsManager` for easier Pandas DataFrame conversion
Hi, thank you for your work on this project. I have been playing around with it at work and it has been very useful and fun to learn from.
I was working on an implementation where I wanted to select the best frame from a scene based on the content value that comes from the stats manager. The way I went about it was calling the save_to_csv function, reading the result into a dataframe, and then filtering the dataframe between timestamps.
Eventually, I wanted to move away from saving a csv and wanted to get to the dataframe directly. I got this working by essentially recreating the save_to_csv function by appending rows to a dataframe rather than writing to a csv. However in that process I noticed in lines 191 to 193 in stats_manager.py that when get_frames is called for the frame_key, a +1 is added.
I assume this is for readability purposes? However, when I print the df that I create from the save_to_csv output, the Frame_Number that is printed begins from 2 and not 1. When I remove the +1 in my function, the Frame_Number begins from 1.
I am wondering if there is a reason for this, as it affects downstream applications based on frame numbers.
I can provide some screenshots if you need them.
> However in that process I noticed in lines 191 to 193 in stats_manager.py that when get_frames is called for the frame_key, a +1 is added.
> I assume this is for readability purposes?
This is to align with other commands on the CLI. 1-based frame indices were chosen to align with the default formatting ffmpeg uses when extracting frames as images.
> However when I print the df that I create from the save_to_csv, the Frame_Number that is printed begins from 2 and not 1. When I remove the +1 in my function I get the Frame_Number to begin from 1.
Detectors all produce metrics based on the difference from the previous frame. We don't emit any metrics for frame 0 (which would be displayed as frame 1), so the first row being frame 2 is expected.
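To illustrate why the first reported frame number is 2, here is a minimal sketch using made-up per-frame values. The helper names `diff_metrics` and `to_display_rows` are hypothetical and are not part of the library; they only mimic the idea of "difference from the previous frame" plus a 1-based display offset.

```python
# Hypothetical toy data: one value per frame (not real detector output).
frame_values = [10.0, 12.0, 30.0, 31.0]


def diff_metrics(values):
    """Return {zero_based_frame_index: abs difference from the previous frame}."""
    metrics = {}
    # Index 0 has no previous frame, so no metric is produced for it.
    for index in range(1, len(values)):
        metrics[index] = abs(values[index] - values[index - 1])
    return metrics


def to_display_rows(metrics):
    """Convert 0-based indices to 1-based frame numbers for display."""
    return [(index + 1, value) for index, value in sorted(metrics.items())]


rows = to_display_rows(diff_metrics(frame_values))
print(rows[0][0])  # first frame number that has a metric
```

With the +1 removed, the same rows would start at 1 instead, which matches what was observed above.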
Ideally I would like to change it so StatsManager acts like a regular Python dict (or can be converted to one), so that you can use pandas.DataFrame.from_dict on it. Internally the stats are already stored this way, but note that those variable names are subject to change.
There's a long standing TODO to fix this, so we should probably consider it for the next major API release.
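As a rough sketch of what the dict-based conversion could look like, the example below uses a plain nested dict as a stand-in for the stats. The layout (`{frame_index: {metric_name: value}}`), the metric names, and the values are assumptions for illustration only; this does not use or describe the actual `StatsManager` internals.

```python
import pandas as pd

# Hypothetical stand-in for the stats: {zero_based_frame_index: {metric: value}}.
# Metric names and values are made up for illustration.
frame_stats = {
    1: {"content_val": 2.5, "delta_hue": 1.0},
    2: {"content_val": 40.0, "delta_hue": 22.0},
    3: {"content_val": 3.0, "delta_hue": 0.5},
}

# orient="index" turns each outer key into a row and each metric into a column.
df = pd.DataFrame.from_dict(frame_stats, orient="index")
df.index.name = "frame_index"

# Optional: add a 1-based frame number to match the CSV convention.
df["frame_number"] = df.index + 1

# Pick the frame with the highest content value within a frame range.
in_range = df[(df["frame_number"] >= 2) & (df["frame_number"] <= 4)]
best_row = in_range.loc[in_range["content_val"].idxmax()]
best_frame_number = int(best_row["frame_number"])
print(best_frame_number)
```

Keeping the 0-based index and the 1-based frame number as separate columns makes it explicit which convention downstream code relies on.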
Hey! Thank you for the explanation, that makes a lot of sense. I appreciate you taking the time to respond. I will close this for now and look forward to the next release. Thank you for all the work! I'm happy to share the small snippet of code I used to access the stats as a dataframe directly if needed.
Let's leave this open until resolved; I think this is a good thing to include.