
Feature request: saving analyzed video data at checkpoints

Open · jonahpearl opened this issue · 1 comment

Is your feature request related to a problem? Please describe. I run dlc.analyze_videos on HPCs, and the videos seem to have highly variable analysis times, so some jobs run for 5 hours and then fail, with nothing to show for it. E.g., for a batch of 1-hour videos, some finish in 2-3 hours and some take 10. (This might be because camera angles differ slightly between sessions, and sessions with more unusual angles are harder for the model to evaluate? Not sure; separate issue.)

Describe the solution you'd like I would like to be able to say: save a checkpoint every 100,000 frames. Then a failed job could restart from the middle without redoing all of the previous computation.

Describe alternatives you've considered It looks like use_shelf is an option you've considered for this, but 1) it is only implemented for multi-animal projects, and 2) it's memory-inefficient because it holds the entire dataset in memory (if I'm skimming the docs right). Why not save a pickle with a suffix like PARTIAL or CHECKPOINT, check whether that pickle exists before starting, and load it in? My (simplistic and maybe wrong) understanding is that DLC operates frame by frame, so restarting from the middle shouldn't affect outcomes.
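To make this concrete, here's a rough sketch of the resume logic I'm imagining. Everything here is illustrative: predict_frame and the _CHECKPOINT.pickle naming are made up, and the real analyze_videos internals batch frames differently.

```python
# Hypothetical checkpoint/resume loop -- a sketch of the idea, not DLC's actual API.
import os
import pickle

CHECKPOINT_EVERY = 100_000  # frames between checkpoint saves, per the request


def analyze_with_checkpoints(video_frames, out_path, predict_frame):
    ckpt_path = out_path + "_CHECKPOINT.pickle"
    results, start = {}, 0
    if os.path.exists(ckpt_path):
        # Resume: load previously computed per-frame predictions and
        # continue from the first frame that hasn't been analyzed yet.
        with open(ckpt_path, "rb") as f:
            results = pickle.load(f)
        start = len(results)
    for i in range(start, len(video_frames)):
        results[i] = predict_frame(video_frames[i])  # hypothetical per-frame inference
        if (i + 1) % CHECKPOINT_EVERY == 0:
            with open(ckpt_path, "wb") as f:
                pickle.dump(results, f)
    # Full run finished: write the final output and drop the checkpoint.
    with open(out_path, "wb") as f:
        pickle.dump(results, f)
    os.remove(ckpt_path)
    return results
```

Since inference is frame-by-frame, the resumed run should produce the same predictions as an uninterrupted one, and only the (at most) CHECKPOINT_EVERY frames since the last save are recomputed.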

Thanks! Not urgent but maybe for the roadmap.

jonahpearl, Feb 27 '23

> and the videos seem to have highly variable analysis times, so some jobs run for 5 hours and then fail, with nothing to show for it. E.g., for a batch of 1-hour videos, some finish in 2-3 hours and some take 10.

This actually seems like an HPC issue; I never see different analysis runtimes for "more complex" data. We even benchmarked this across keypoints, models, and a lot of hardware and saw no such effect; see Warren & Mathis 2018 and Mathis 2021, also Kane et al. 2020.

Note: these benchmarks were run on older GPUs, so things have only gotten faster:

[benchmark figure: inference speed on older GPUs]

MMathisLab, Mar 03 '23