pliers
Incorrect conversion to df for Google Video FRAME_MODE
In FRAME_MODE, GoogleVideoIntelligence returns results sampled at 1 Hz (or so it seems; their docs don't say).
However, pliers attempts to assign durations to these events based on the gaps between consecutive offsets.
For example, for 'chair', the raw results look something like:
```python
{'entity': {'entityId': '/m/01mzpv',
            'description': 'chair',
            'languageCode': 'en-US'},
 'categoryEntities': [{'entityId': '/m/0c_jw',
                       'description': 'furniture',
                       'languageCode': 'en-US'}],
 'frames': [{'timeOffset': '55s', 'confidence': 0.4019864},
            {'timeOffset': '96s', 'confidence': 0.42338032},
            {'timeOffset': '97s', 'confidence': 0.6609389},
            {'timeOffset': '129s', 'confidence': 0.4277751},
            {'timeOffset': '156s', 'confidence': 0.6204254}]}
```
But the df looks like:
| onset | duration | chair |
|---|---|---|
| 55.0 | 41.00 | 0.401986 |
| 96.0 | 1.00 | 0.423380 |
| 97.0 | 32.00 | 0.660939 |
| 129.0 | 27.00 | 0.427775 |
| 156.0 | 1.00 | 0.620425 |
The durations should all be 1 in this case.
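To make the issue concrete, here is a minimal sketch (not pliers' actual code; the parsing helper is hypothetical) contrasting the current gap-based durations with fixed 1 s durations matching the sampling rate:

```python
# Sketch of the bug: durations derived from gaps between consecutive
# frame offsets, vs. a fixed duration matching the 1 Hz sampling rate.
# parse_offsets is a hypothetical helper, not part of pliers.

def parse_offsets(frames):
    """Convert 'timeOffset' strings like '55s' to floats (seconds)."""
    return [float(f['timeOffset'].rstrip('s')) for f in frames]

frames = [
    {'timeOffset': '55s', 'confidence': 0.4019864},
    {'timeOffset': '96s', 'confidence': 0.42338032},
    {'timeOffset': '97s', 'confidence': 0.6609389},
]

onsets = parse_offsets(frames)

# Current (buggy) behavior: duration = distance to the next onset,
# so sparse detections get inflated durations (41 s here).
buggy_durations = [b - a for a, b in zip(onsets, onsets[1:])] + [1.0]

# Proposed behavior: each detection spans one sampling interval (1 s).
fixed_durations = [1.0] * len(onsets)
```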
Makes sense 👍, should be a semi-straightforward fix.
On this note, I compared the results of FRAME_MODE to running FrameSamplingFilter at 1 Hz and feeding the frames to the Vision API. The results are essentially the same, except VideoIntelligence returns more features overall (probably a different threshold).
So the main advantage is that VideoIntelligence can be much faster (if you feed it a video file in a manageable codec/size).
Actually, another advantage is that VideoIntelligence returns category entities for each tag. This could be really useful: many labels are very specific, but we might want to analyze at a slightly broader level (e.g. furniture instead of chair). We don't currently seem to extract that information.
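Pulling the category entities out of a raw annotation would be straightforward; a hedged sketch (the `entity_categories` helper is hypothetical, not an existing pliers function) using the response shape shown above:

```python
# Hypothetical helper: map an annotation's entity description to the
# broader categoryEntities descriptions Google returns alongside it.

def entity_categories(annotation):
    """Return (label, [category labels]) for one raw annotation dict."""
    label = annotation['entity']['description']
    categories = [c['description']
                  for c in annotation.get('categoryEntities', [])]
    return label, categories

annotation = {
    'entity': {'entityId': '/m/01mzpv', 'description': 'chair',
               'languageCode': 'en-US'},
    'categoryEntities': [{'entityId': '/m/0c_jw',
                          'description': 'furniture',
                          'languageCode': 'en-US'}],
}

label, cats = entity_categories(annotation)
# e.g. 'chair' -> ['furniture'], which could be exposed as extra
# columns (or a separate feature set) in the resulting df.
```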