vision-camera-resize-plugin

Question about return of resize

Open SamuelAlv3s opened this issue 1 year ago • 3 comments

Hello!

Is it possible to resize/crop a frame and return it as the same type, "Frame", instead of a "Uint8Array" or "Float32Array"? 😬

SamuelAlv3s avatar Feb 06 '24 16:02 SamuelAlv3s

Hey!

I would love to return a Frame instead of a byte array, but this isn't as easy as it sounds.

  • iOS: We would need to create a new CMSampleBuffer, wrapping a new CVPixelBuffer, wrapping our in-memory array (or doing 1x memcopy if needed). This needs to be pooled, so we can retain the buffer longer than just for one Frame Processor call, otherwise there could be invalid buffers floating around. Would take me around 20 hours to implement.
  • Android: Unfortunately, this does not seem possible at all with Android's current media APIs! ☹️ We need to find some way to create an android.media.Image object which contains a backing HardwareBuffer. From then on, I can just fill the HardwareBuffer with my data (1x copy) and return it to JS. But unfortunately there is no way to just create Image instances.
    • There is ImageReader, which gives us a Surface we can stream Images into using OpenGL, but this is then always RGB and never in our custom formats (ARGB, BGRA, BGR, ...) or data types (float32). Also, it is asynchronous, i.e. the Image is not immediately available after we write to the Surface. So this is not possible.
    • There is ImageWriter, which can give us Image instances, but it needs to point to a valid Surface and will only create Images in the target Surface's format, so we are back to the same problem again: not in a custom format (ARGB, BGRA, BGR, ...) or data type (float32). Also, this is only available in API 26 afaik.

Overall, it would definitely be a cooler API to return Frames, but Android's Media APIs don't allow this; as of now, they are simply not as flexible as iOS' APIs.


For now, you can still pass the returned type (Uint8Array or Float32Array) to other Frame Processor Plugins without issues:

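import { useFrameProcessor, VisionCameraProxy } from 'react-native-vision-camera'
import { useResizePlugin } from 'vision-camera-resize-plugin'

const { resize } = useResizePlugin()
const examplePlugin = VisionCameraProxy.initFrameProcessorPlugin('example_plugin')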

const frameProcessor = useFrameProcessor((frame) => {
  'worklet'
  const resized = resize(frame, { ... })
  const arrayBuffer = resized.buffer

  // pass original `frame` because it is required for FPs, but we can ignore it later
  examplePlugin.call(frame, {
    // pass resized frame as an ArrayBuffer argument
    resizedFrame: arrayBuffer
  })
}, [resize])
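
One thing to watch when passing `resized.buffer`: in JavaScript, a typed array's `.buffer` is the whole underlying ArrayBuffer, which can be larger than the view itself. Whether the resize plugin ever returns such a view is not stated in this thread, so treat the following as a defensive sketch only. `tightCopy` is a hypothetical helper name, not part of VisionCamera or the resize plugin.

```typescript
// Hypothetical helper: copy exactly the bytes a typed array covers into a
// fresh, tightly sized Uint8Array (this costs one extra memory copy).
function tightCopy(view: ArrayBufferView): Uint8Array {
  // Re-view the same memory as bytes, honoring the view's offset and length.
  const src = new Uint8Array(view.buffer, view.byteOffset, view.byteLength);
  const copy = new Uint8Array(view.byteLength);
  copy.set(src);
  return copy;
}

// Demo: a Float32Array view that starts 4 bytes into a 16-byte buffer.
const backing = new ArrayBuffer(16);
const view = new Float32Array(backing, 4, 2); // 2 floats = 8 bytes
view[0] = 1.5;
view[1] = -2;

// `view.buffer` is still the full 16-byte buffer, while the copy is 8 bytes.
const tight = tightCopy(view);
const roundTrip = new Float32Array(tight.buffer);
```

To use a helper like this inside a Frame Processor, it would presumably need its own 'worklet' directive, and the extra copy is only worth paying if the view does not already span its whole buffer.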

Then just read the array from the arguments and ignore the Frame type:

import VisionCamera

@objc(ExampleFrameProcessorPlugin)
public class ExampleFrameProcessorPlugin: FrameProcessorPlugin {
  public override func callback(_ frame: Frame, withArguments arguments: [AnyHashable: Any]?) -> Any? {
    // we ignore `frame` as this is the full sized Frame

    // SharedArray is from VisionCamera; `arguments` is optional, so unwrap and cast safely
    guard let smallFrame = arguments?["resizedFrame"] as? SharedArray else {
      return nil
    }
    _ = smallFrame.data // <-- access the resized pixel data here
    return nil
  }
}
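
Because the receiving plugin only sees a flat array, it also needs the width, height and channel count to locate a pixel. Here is a sketch of that indexing math in TypeScript, assuming a row-major, interleaved layout (e.g. RGBRGB...) with no row padding. That layout is an assumption and should be checked against the options you passed to `resize`. `Layout`, `pixelOffset` and `readPixel` are hypothetical names used for illustration.

```typescript
// Hypothetical description of the resized buffer (not a library type).
interface Layout {
  width: number;
  height: number;
  channels: number; // e.g. 3 for RGB, 4 for ARGB/BGRA
}

// Index of the first channel of pixel (x, y) in a row-major, interleaved array.
function pixelOffset(layout: Layout, x: number, y: number): number {
  if (x < 0 || x >= layout.width || y < 0 || y >= layout.height) {
    throw new RangeError(`pixel (${x}, ${y}) is outside the image`);
  }
  return (y * layout.width + x) * layout.channels;
}

// Read all channel values of one pixel from the flat array.
function readPixel(
  data: Uint8Array | Float32Array,
  layout: Layout,
  x: number,
  y: number
): number[] {
  const expected = layout.width * layout.height * layout.channels;
  if (data.length !== expected) {
    throw new Error(`expected ${expected} elements, got ${data.length}`);
  }
  const start = pixelOffset(layout, x, y);
  return Array.from(data.subarray(start, start + layout.channels));
}

// Demo: a 2x2 RGB image stored as 12 bytes.
const layout: Layout = { width: 2, height: 2, channels: 3 };
const pixels = new Uint8Array([255, 0, 0, 0, 255, 0, 0, 0, 255, 10, 20, 30]);
const bottomRight = readPixel(pixels, layout, 1, 1); // [10, 20, 30]
```

The same arithmetic applies on the native side when indexing into the shared array's data, as long as the layout assumption holds.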

mrousavy avatar Feb 06 '24 17:02 mrousavy

Nice!!

Thanks for the answer😁

SamuelAlv3s avatar Feb 06 '24 18:02 SamuelAlv3s

Let's leave this open for now, maybe in the future there's gonna be an API in Android to create a pool of Images where I can have ownership of the memory without needing a Surface - that'd solve the problem of ImageWriter then!

mrousavy avatar Feb 06 '24 19:02 mrousavy