video2numpy
Frame decoding cluster
Outlining a plan for turning this into a super nice video dataloader. The main missing functionality is letting users exploit the imbalance between cheap CPU compute and expensive GPU compute: launch a frame decoding cluster on the CPU nodes, connect to it from the GPU cluster, and simply send requests for decoded video shards.
Steps in the process:
- Launch a cluster of N frame workers, which begin decoding videos into a shared memory structure.
- The dataloader requests data from the cluster.
- The cluster sends metadata about the shards over some link (sending only metadata is fine since it is small).
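A minimal single-machine sketch of these steps, assuming Python's `multiprocessing.shared_memory` as the shared memory structure and a `queue.Queue` standing in for the link between the clusters. `ShardMeta`, `fake_decode`, `worker_decode_shard` and `loader_read_shard` are hypothetical names, and the fake decoder fills frames with a constant instead of decoding a real video. Only the small `ShardMeta` message crosses the link; the loader attaches to the block by name to get the pixels.

```python
import queue
from dataclasses import dataclass
from multiprocessing import shared_memory

import numpy as np


@dataclass
class ShardMeta:
    """Small message sent over the link: enough to locate and interpret the pixels."""
    shard_id: int
    shm_name: str   # name of the shared memory block holding the frames
    shape: tuple    # (num_frames, height, width, channels)
    dtype: str


def fake_decode(num_frames, height=4, width=4):
    # Hypothetical stand-in for the real frame decoder: constant uint8 RGB frames.
    return np.full((num_frames, height, width, 3), 7, dtype=np.uint8)


def worker_decode_shard(shard_id, num_frames, link):
    """Frame worker: decode into a new shared memory block, then send only metadata."""
    frames = fake_decode(num_frames)
    shm = shared_memory.SharedMemory(create=True, size=frames.nbytes)
    view = np.ndarray(frames.shape, dtype=frames.dtype, buffer=shm.buf)
    view[:] = frames
    del view  # release the buffer export so the handle can be closed
    link.put(ShardMeta(shard_id, shm.name, frames.shape, str(frames.dtype)))
    # Assumes POSIX semantics: the block outlives this handle until unlink().
    shm.close()


def loader_read_shard(meta):
    """Loader: attach to the block by name, copy the frames out, then free it."""
    shm = shared_memory.SharedMemory(name=meta.shm_name)
    try:
        frames = np.ndarray(meta.shape, dtype=meta.dtype, buffer=shm.buf).copy()
    finally:
        shm.close()
        shm.unlink()  # one possible ownership policy: the consumer frees the shard
    return frames


if __name__ == "__main__":
    link = queue.Queue()  # stands in for the CPU-cluster -> GPU-cluster link
    worker_decode_shard(0, num_frames=5, link=link)
    meta = link.get()
    frames = loader_read_shard(meta)
    print(meta.shard_id, frames.shape)
```

Across real machines the link would be a network channel and the shared memory a backend both sides can reach; the sketch only shows the split between a small metadata message and a large pixel buffer.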
What concepts do we need:
- Video Manager: which videos get decoded and how they are allocated in memory
- Frame decoder: how each individual video is decoded
- Shared memory data structure: should support a variety of backends (S3, /fsx, etc.)
- Data layout schema: how is the data organized in memory? A shared frame queue? Or just shards with IDs?
- Communication: what do we send to the loader?
- Loader: how does the loader extract pixels from memory using the metadata it receives over the communication link?
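One possible answer to the "just shards with IDs?" question, sketched under the assumption that every frame in a shard has the same (H, W, C) shape and dtype: frames from several videos are packed back to back into one flat buffer, and the metadata is a small index mapping each video ID to its frame range. `ShardIndex`, `pack_shard` and `read_video` are hypothetical, and a `bytes` object stands in for the shared memory region.

```python
from dataclasses import dataclass, field

import numpy as np


@dataclass
class ShardIndex:
    """Hypothetical metadata for a 'shard with ids' layout.

    All frames share one (H, W, C) shape and dtype and are stored back to back;
    `spans` maps each video id to its [start, end) frame range in the shard.
    """
    frame_shape: tuple
    dtype: str
    spans: dict = field(default_factory=dict)


def pack_shard(videos):
    """Concatenate per-video frame arrays into one flat buffer plus an index."""
    first = next(iter(videos.values()))
    index = ShardIndex(frame_shape=first.shape[1:], dtype=str(first.dtype))
    start = 0
    for vid, frames in videos.items():
        if frames.shape[1:] != index.frame_shape:
            raise ValueError(f"video {vid!r} has a different frame shape")
        index.spans[vid] = (start, start + len(frames))
        start += len(frames)
    buf = np.concatenate(list(videos.values()), axis=0).tobytes()
    return buf, index


def read_video(buf, index, vid):
    """Loader side: view the raw buffer as frames and slice out one video."""
    all_frames = np.frombuffer(buf, dtype=index.dtype).reshape(-1, *index.frame_shape)
    start, end = index.spans[vid]
    return all_frames[start:end]
```

The index is what would travel over the communication link; the loader never needs to copy the whole shard to read one video.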
Video Manager:
- Should support things like shuffle buffers
- Dispatches videos to the frame workers for decoding
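A sketch of the two Video Manager duties above, assuming a streaming shuffle buffer (hold `buffer_size` items and, as each new item arrives, emit a randomly chosen held item in its place) and simple round-robin dispatch to N workers. `shuffle_buffer` and `assign_to_workers` are hypothetical helpers, not part of video2numpy.

```python
import random


def shuffle_buffer(items, buffer_size, seed=None):
    """Approximately shuffle a stream using a fixed-size buffer."""
    if buffer_size < 1:
        raise ValueError("buffer_size must be at least 1")
    rng = random.Random(seed)
    buf = []
    for item in items:
        if len(buf) < buffer_size:
            buf.append(item)  # still filling the buffer
            continue
        i = rng.randrange(buffer_size)
        yield buf[i]          # emit a random held item...
        buf[i] = item         # ...and replace it with the incoming one
    rng.shuffle(buf)          # drain what is left in random order
    yield from buf


def assign_to_workers(videos, num_workers):
    """Round-robin the (shuffled) video stream across N frame workers."""
    assignments = [[] for _ in range(num_workers)]
    for k, video in enumerate(videos):
        assignments[k % num_workers].append(video)
    return assignments
```

Round-robin is only one policy; a shared work queue that idle workers pull from would balance load better when video lengths vary a lot.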
Frame decoder: (already solved in this repo)