
What are the Output Tensors of Palm Detection?

Open ElonXXIII opened this issue 2 years ago • 9 comments

I am trying to insert custom networks into the hands pipeline to detect not-quite-human hands, and to understand the inputs and outputs of the existing networks. As I understand it, the outputs of the Landmark Detection are a 1x63 tensor containing the landmarks' xyz coordinates, a 1x1 presence tensor, a 1x1 handedness tensor, and an unused 1x63 tensor (or are the two 1x63 tensors for the right and left hands?). But I do not understand how the 1x2016x1 and 1x2016x18 tensors represent a directed bounding box and perhaps a presence value.

Edit: To add to this, why are the inputs 192x192 (palm) and 224x224 (landmark) pixels instead of the 256x256 pixels as the paper states?

ElonXXIII avatar Jun 17 '22 11:06 ElonXXIII

Hi @ElonXXIII , As in SSD models, these are the predictions based on predefined anchors. We use TfLiteTensorsToDetectionsCalculator to decode the output tensors given the SSD anchors.

sureshdagooglecom avatar Jun 20 '22 07:06 sureshdagooglecom

From the calculator's .cc file: the first tensor is the predicted raw boxes/keypoints, and its size must be (num_boxes * num_predicted_values). The second tensor is the score tensor. So there are 2016 boxes: the 1x2016x1 tensor is the score tensor and the 1x2016x18 tensor holds the raw boxes/keypoints. But why 18 values per box? Are the 6 wrist landmarks in x, y, z the output? I thought the output was a directed bounding box?

ElonXXIII avatar Jun 20 '22 08:06 ElonXXIII
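[Editor's note] For the 1x2016x1 score tensor, the decode step can be sketched as follows. This is a minimal sketch, not the calculator's actual code; it assumes the scores arrive as raw logits in a Float32Array and that the graph's sigmoid_score option is enabled, so a sigmoid maps them to probabilities before thresholding.

```javascript
// Sketch: pick the best-scoring anchor out of the score tensor.
// Assumption: `scores` holds raw logits, one per anchor.
function bestDetection(scores) {
  let bestIdx = 0
  for (let i = 1; i < scores.length; i++) {
    if (scores[i] > scores[bestIdx]) bestIdx = i
  }
  // Map the logit to a probability, as the calculator's sigmoid_score
  // option does before applying the score threshold.
  const sigmoid = (x) => 1 / (1 + Math.exp(-x))
  return { index: bestIdx, score: sigmoid(scores[bestIdx]) }
}

// Toy example with 3 anchors instead of 2016.
console.log(bestDetection(new Float32Array([-4, 2, -1])))
```

The winning index is then used to look up both the matching row of the 1x2016x18 tensor and the matching SSD anchor.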


Accidentally closed

ElonXXIII avatar Jun 20 '22 08:06 ElonXXIII

Knowing which output tensor corresponds to "the hand fills exactly the whole screen and is upright" would also be sufficient as a workaround for my use case.

ElonXXIII avatar Jul 08 '22 14:07 ElonXXIII

18 keypoints?

The first 4 float values are dx, dy, w, h == the bounding box (but dx and dy are offsets, so you must know the anchor points); the other 14 values are (x0, y0), (x1, y1), ... (x6, y6) = 7 keypoints.

Source: https://github.com/junhwanjang/mediapipe-models/blob/main/palm_detection/assets/palm_7_landmark_index.png

My problems are:

  1. I don't know how to find the anchor points of the 2016 anchors for this SSD model.
  2. Are the 7 keypoints relative to the anchor points? I tried the center of the image, but without success.


saknarak avatar Aug 25 '22 04:08 saknarak
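[Editor's note] The 18-value layout described above can be sketched with a small hypothetical helper (`splitRawPrediction` is not from MediaPipe; `raw` stands for one anchor's row of the 1x2016x18 tensor as a plain array of 18 floats):

```javascript
// Sketch: split one anchor's 18 raw floats into box + 7 keypoints.
function splitRawPrediction(raw) {
  // First 4 floats: box center offsets and size.
  const [dx, dy, w, h] = raw.slice(0, 4)
  // Remaining 14 floats: 7 (x, y) palm keypoints.
  const keypoints = []
  for (let i = 4; i < 18; i += 2) {
    keypoints.push({ x: raw[i], y: raw[i + 1] })
  }
  return { box: { dx, dy, w, h }, keypoints }
}

// Dummy row 0..17 just to show the split.
const parts = splitRawPrediction([...Array(18).keys()])
console.log(parts.keypoints.length) // 7
```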

From ssd_anchors_calculator

https://github.com/google/mediapipe/blob/master/mediapipe/calculators/tflite/ssd_anchors_calculator.cc

I tried to create anchor points with the parameters from https://github.com/google/mediapipe/blob/master/mediapipe/modules/palm_detection/palm_detection_cpu.pbtxt

I can produce only half of the anchor points (1008), not 2016:

Layer 1: 192/8 = 24 => 24x24 = 576
Layers 2-4: 192/16 = 12 => 12x12 = 144 * 3 = 432
Total: 576 + 432 = 1008

let anchorOptions = {
  numLayers: 4,
  minScale: 0.1484375,
  maxScale: 0.75,
  inputSizeWidth: 192,
  inputSizeHeight: 192,
  anchorOffsetX: 0.5,
  anchorOffsetY: 0.5,
  strides: [8, 16, 16, 16],
  aspectRatios: [1.0],
  fixedAnchorSize: true,
}


function calculateScale(minScale, maxScale, strideIndex, numStrides) {
  if (numStrides === 1) {
    return (minScale + maxScale) * 0.5
  }
  return minScale + (maxScale - minScale) * 1.0 * strideIndex / (numStrides - 1.0)
}

export function generateAnchors(options) {
  const anchors = []
  if (!options.featureMapHeightSize && !options.strides?.length) {
    throw new Error('Both feature map shape and strides are missing. Must provide either one.')
  }
  if (options.featureMapHeightSize) {
    if (options.strides?.length) {
      throw new Error('Found feature map shapes. Strides will be ignored.')
    }
    if (options.featureMapHeightSize !== options.numLayers) {
      throw new Error('options.featureMapHeightSize !== options.numLayers')
    }
    if (options.featureMapHeightSize !== options.featureMapWidthSize) {
      throw new Error('options.featureMapHeightSize !== options.featureMapWidthSize')
    }
  } else {
    if (options.strides.length !== options.numLayers) {
      throw new Error('options.strides.length !== options.numLayers')
    }
  }

  let layerId = 0
  while (layerId < options.numLayers) {
    let anchorHeight = []
    let anchorWidth = []
    let aspectRatios = []
    let scales = []

    console.log('layerId=', layerId)

    let lastSameStrideLayer = layerId
    while (lastSameStrideLayer < options.strides.length && options.strides[lastSameStrideLayer] === options.strides[layerId]) {
      const scale = calculateScale(options.minScale, options.maxScale, lastSameStrideLayer, options.strides.length)
      if (lastSameStrideLayer === 0 && options.reduceBoxesInLowestLayer) {
        aspectRatios.push(1.0)
        aspectRatios.push(2.0)
        aspectRatios.push(0.5)
        scales.push(0.1)
        scales.push(scale)
        scales.push(scale)
      } else {
        for (let aspectRatioId = 0; aspectRatioId < options.aspectRatios.length; aspectRatioId++) {
          aspectRatios.push(options.aspectRatios[aspectRatioId])
          scales.push(scale)
        }
        if (options.interpolatedScaleAspectRatio > 0.0) {
          const scaleNext = lastSameStrideLayer === options.strides.length - 1 ? 1.0 : calculateScale(options.minScale, options.maxScale, lastSameStrideLayer + 1, options.strides.length)
          scales.push(Math.sqrt(scale * scaleNext))
          aspectRatios.push(options.interpolatedScaleAspectRatio)
        }
      }
      lastSameStrideLayer++
    }

    for (let i = 0; i < aspectRatios.length; i++) {
      const ratioSqrts = Math.sqrt(aspectRatios[i])
      anchorHeight.push(scales[i] / ratioSqrts)
      anchorWidth.push(scales[i] * ratioSqrts)
    }

    let featureMapHeight = 0
    let featureMapWidth = 0
    if (options.featureMapHeightSize) {
      featureMapHeight = options.featureMapHeight[layerId]
      featureMapWidth = options.featureMapWidth[layerId]
    } else {
      const stride = options.strides[layerId]
      featureMapHeight = Math.ceil(options.inputSizeHeight / stride)
      featureMapWidth = Math.ceil(options.inputSizeWidth / stride)
    }

    for (let y = 0; y < featureMapHeight; y++) {
      for (let x = 0; x < featureMapWidth; x++) {
        for (let anchorId = 0; anchorId < anchorHeight.length; anchorId++) {
          const xCenter = (x + options.anchorOffsetX) / featureMapWidth
          const yCenter = (y + options.anchorOffsetY) / featureMapHeight

          let newAnchor = {}
          newAnchor.xCenter = xCenter
          newAnchor.yCenter = yCenter

          if (options.fixedAnchorSize) {
            newAnchor.w = 1
            newAnchor.h = 1
          } else {
            newAnchor.w = anchorWidth[anchorId]
            newAnchor.h = anchorHeight[anchorId]
          }
          anchors.push(newAnchor)
        }
      }
    }
    layerId = lastSameStrideLayer
  }

  return anchors
}

saknarak avatar Aug 25 '22 05:08 saknarak
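[Editor's note] A quick count check (a sketch using the same parameters as the options above) reproduces the 1008 figure: with aspectRatios [1.0] and interpolatedScaleAspectRatio unset, each layer contributes exactly one anchor per feature-map cell.

```javascript
// Anchor count per the generator above: every layer contributes
// ceil(input/stride)^2 cells, times anchors per cell.
const strides = [8, 16, 16, 16]
const inputSize = 192
// aspectRatios is [1.0] and interpolatedScaleAspectRatio is unset,
// so there is exactly 1 anchor per cell per layer.
const anchorsPerCell = 1

let total = 0
for (const stride of strides) {
  total += Math.ceil(inputSize / stride) ** 2 * anchorsPerCell
}
console.log(total) // 576 + 3 * 144 = 1008
```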

Now I can generate all 2016 anchor points by adding a 0.5 ratio to the aspectRatios array:

let anchorOptions = {
  numLayers: 4,
  minScale: 0.1484375,
  maxScale: 0.75,
  inputSizeWidth: 192,
  inputSizeHeight: 192,
  anchorOffsetX: 0.5,
  anchorOffsetY: 0.5,
  strides: [8, 16, 16, 16],
  aspectRatios: [1.0, 0.5], // <--
  fixedAnchorSize: true,
}

To decode the 18 float values:

let palmImageSize = 192
// bbox[0] == xCenter, bbox[1] == yCenter, bbox[2] == width, bbox[3] == height
bbox[0] += anchors[maxPalmIdx].xCenter * palmImageSize
bbox[1] += anchors[maxPalmIdx].yCenter * palmImageSize
// width, height unchanged

// 14 values adjust with anchor
for (let i = 4; i < 18; i += 2) {
  let x = palmData.readFloatLE(offset + i * 4) + anchors[maxPalmIdx].xCenter * palmImageSize
  let y = palmData.readFloatLE(offset + i * 4 + 4) + anchors[maxPalmIdx].yCenter * palmImageSize
  points.push({ x, y })
}

saknarak avatar Aug 25 '22 06:08 saknarak
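[Editor's note] The decode snippet above can be put into a self-contained function. This is a sketch under the same assumptions: fixedAnchorSize anchors, and x/y scales equal to the 192 px input size, so a pixel coordinate is the raw value plus the anchor center times 192 (`decodePalm` is a hypothetical name):

```javascript
const PALM_IMAGE_SIZE = 192

// raw: one anchor's 18 floats; anchor: { xCenter, yCenter } in [0, 1].
function decodePalm(raw, anchor) {
  const ax = anchor.xCenter * PALM_IMAGE_SIZE
  const ay = anchor.yCenter * PALM_IMAGE_SIZE
  // Box center is offset by the anchor center; width/height are used as-is.
  const box = { cx: raw[0] + ax, cy: raw[1] + ay, w: raw[2], h: raw[3] }
  const keypoints = []
  for (let i = 4; i < 18; i += 2) {
    keypoints.push({ x: raw[i] + ax, y: raw[i + 1] + ay })
  }
  return { box, keypoints }
}

// Example: zero offsets on an anchor at the image center.
console.log(decodePalm(new Array(18).fill(0), { xCenter: 0.5, yCenter: 0.5 }))
// box center lands at (96, 96)
```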

For people that are looking for a csv of SSD anchors, I've attached the file here, along with code to generate it. Thanks @saknarak for your snippets!

https://github.com/VimalMollyn/GenMediaPipePalmDectionSSDAnchors

VimalMollyn avatar Sep 03 '22 06:09 VimalMollyn

Maybe: 2016/63 = 32 -> 4-byte float; 63/3 = 21 (x, y, z); left and right hands = 2 sets?

jaiminlee avatar Oct 17 '22 11:10 jaiminlee

> Now I can generate all 2016 anchor points by add 0.5 ratio in aspectRatios array [...] to decode 18 float values [...]

Set interpolatedScaleAspectRatio = 1 instead, and it will be right.

seathiefwang avatar Oct 23 '22 08:10 seathiefwang
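[Editor's note] That matches the anchor count: with interpolatedScaleAspectRatio = 1.0 (which appears to be its default in the calculator options), each layer gets a second, interpolated-scale anchor per cell. A sketch of the arithmetic:

```javascript
const strides = [8, 16, 16, 16]
const inputSize = 192
// aspectRatios [1.0] plus one interpolated-scale anchor per layer
// (interpolatedScaleAspectRatio = 1.0) => 2 anchors per cell per layer.
const anchorsPerCell = 2

let total = 0
for (const stride of strides) {
  total += Math.ceil(inputSize / stride) ** 2 * anchorsPerCell
}
console.log(total) // 2 * (576 + 3 * 144) = 2016
```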

Hello @ElonXXIII, we are upgrading the MediaPipe Legacy Solutions to the new MediaPipe Solutions. However, the libraries, documentation, and source code for all the MediaPipe Legacy Solutions will continue to be available in our GitHub repository and through library distribution services such as Maven and NPM.

You can continue to use those legacy solutions in your applications if you choose. However, we would request you to check out the new MediaPipe Solutions, which can help you more easily build and customize ML solutions for your applications. These new solutions will provide a superset of capabilities available in the legacy solutions. Thank you.

kuaashish avatar May 05 '23 10:05 kuaashish

This issue has been marked stale because it has had no activity for the past 7 days. It will be closed if no further activity occurs. Thank you.

github-actions[bot] avatar May 14 '23 01:05 github-actions[bot]

This issue was closed due to lack of activity after being marked stale for the past 7 days.

github-actions[bot] avatar May 21 '23 01:05 github-actions[bot]
