video2numpy

Frame decoding cluster

Open iejMac opened this issue 1 year ago • 2 comments

Outlining a plan for turning this into a really nice video dataloader. The main missing functionality is a way to exploit the imbalance between cheap CPU compute and expensive GPU compute: launch a frame decoding cluster on the CPU cluster, connect to it from the GPU cluster, and simply send requests for decoded video shards.

Steps in the process

  1. Launch a cluster of N frame workers. The workers begin decoding videos into some shared memory structure.
  2. The dataloader requests data from the cluster.
  3. The cluster sends metadata about the shards over some link (sending only metadata is fine since it is small).
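
A minimal single-machine sketch of the three steps above, to make the data flow concrete. Everything here is an assumption for illustration and not part of video2numpy: threads stand in for the N frame workers, a plain dict stands in for the shared memory structure, a `queue.Queue` stands in for the link, and `fake_decode`, `ShardMeta`, `run_cluster` and `load_shards` are hypothetical names. In this sketch the workers publish metadata as soon as a shard is ready and the loader pulls it, which is only one possible way to order steps 2 and 3.

```python
import queue
import threading
from dataclasses import dataclass


@dataclass
class ShardMeta:
    """Small record sent over the link; the pixels stay in the store."""
    shard_id: int
    key: str         # where the decoded frames live in the shared store
    num_frames: int


def fake_decode(video_id: int) -> bytes:
    """Stand-in for a real frame decoder: four one-byte 'frames' per video."""
    return bytes([video_id] * 4)


def frame_worker(tasks: queue.Queue, store: dict, link: queue.Queue) -> None:
    """Step 1: decode into the shared store. Step 3: publish metadata only."""
    while True:
        video_id = tasks.get()
        if video_id is None:  # sentinel: no more work for this worker
            break
        key = f"shard-{video_id}"
        store[key] = fake_decode(video_id)
        link.put(ShardMeta(shard_id=video_id, key=key, num_frames=4))


def run_cluster(video_ids, num_workers: int = 2):
    """Launch the workers, feed them video ids, and wait for them to finish."""
    tasks, link, store = queue.Queue(), queue.Queue(), {}
    workers = [
        threading.Thread(target=frame_worker, args=(tasks, store, link))
        for _ in range(num_workers)
    ]
    for w in workers:
        w.start()
    for vid in video_ids:
        tasks.put(vid)
    for _ in workers:
        tasks.put(None)  # one sentinel per worker
    for w in workers:
        w.join()
    return store, link


def load_shards(store: dict, link: queue.Queue, count: int) -> dict:
    """Step 2: the loader pulls metadata, then reads the frames it points to."""
    shards = {}
    for _ in range(count):
        meta = link.get()
        shards[meta.shard_id] = store[meta.key]
    return shards
```

Usage would look like `store, link = run_cluster([1, 2, 3])` followed by `load_shards(store, link, 3)`. In a real deployment the workers would be separate processes or machines and the link a network connection, but the split between bulky pixels in the store and small metadata on the link stays the same.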

What concepts do we need:

  • Video Manager: which videos get decoded and how they get allocated in memory
  • Frame decoder: how do you decode each video specifically
  • Shared memory data structure: should support a variety of options (S3, /fsx, etc.)
  • Some schema for how the data is organized in memory: shared frame queue? just shard with ids?
  • Communication: what do we send to the loader?
  • Loader: how does the loader extract pixels from memory + metadata from communication
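
One way the last three concepts could fit together, sketched under assumptions: the shared structure is a local `multiprocessing.shared_memory` block (not S3 or /fsx), the schema is "one block per shard", and the communication payload is a small `ShardMeta` record with the block name, shape and dtype. `ShardMeta`, `publish_shard` and `read_shard` are hypothetical names, not existing video2numpy functions.

```python
from dataclasses import dataclass
from multiprocessing import shared_memory

import numpy as np


@dataclass(frozen=True)
class ShardMeta:
    """What gets sent to the loader: enough to find and interpret the pixels."""
    shm_name: str
    shape: tuple  # (frames, height, width, channels)
    dtype: str


def publish_shard(frames: np.ndarray):
    """Decoder side: copy decoded frames into a new shared-memory block."""
    shm = shared_memory.SharedMemory(create=True, size=frames.nbytes)
    view = np.ndarray(frames.shape, dtype=frames.dtype, buffer=shm.buf)
    view[:] = frames
    del view  # drop the exported buffer so the owner can close shm later
    return shm, ShardMeta(shm.name, tuple(frames.shape), str(frames.dtype))


def read_shard(meta: ShardMeta) -> np.ndarray:
    """Loader side: attach to the block by name and copy the pixels out."""
    shm = shared_memory.SharedMemory(name=meta.shm_name)
    try:
        view = np.ndarray(meta.shape, dtype=np.dtype(meta.dtype), buffer=shm.buf)
        frames = view.copy()  # independent copy, safe after the block is closed
        del view
    finally:
        shm.close()
    return frames
```

The process that called `publish_shard` owns the block and is responsible for calling `shm.close()` and `shm.unlink()` once the loader has consumed the shard. Copying in `read_shard` keeps the sketch simple; a zero-copy loader would have to keep the block attached for as long as the array is in use.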

iejMac, Dec 11 '23 23:12

Video Manager:

  • Needs to support things like shuffle buffers
  • Calls the workers with the data they should decode
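
For reference, a shuffle buffer in the usual streaming sense could look like the sketch below. This is a generic illustration of the technique, assuming the manager sees videos as a stream it cannot shuffle globally; `shuffle_buffer` is a hypothetical helper, not an existing video2numpy function.

```python
import random
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def shuffle_buffer(items: Iterable[T], buffer_size: int, seed: int = 0) -> Iterator[T]:
    """Yield items in a locally shuffled order using a bounded buffer."""
    rng = random.Random(seed)
    buf = []
    for item in items:
        buf.append(item)
        if len(buf) >= buffer_size:
            # Swap a random element to the end and emit it.
            idx = rng.randrange(len(buf))
            buf[idx], buf[-1] = buf[-1], buf[idx]
            yield buf.pop()
    # Input exhausted: flush the remainder in random order.
    rng.shuffle(buf)
    yield from buf
```

A larger `buffer_size` gives better mixing at the cost of holding more items (here, video ids or paths) in memory before any of them is handed to a worker.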

iejMac, Dec 11 '23 23:12

Frame decoder: (already solved here)

iejMac, Dec 11 '23 23:12