What are the Output Tensors of Palm Detection?
I am trying to insert custom networks into the hands pipeline to detect not-quite-human hands, and to understand the inputs and outputs of the existing networks. As I understand it, the outputs of the landmark detection are a 1x63 tensor containing the landmarks' xyz coordinates, a 1x1 presence tensor, a 1x1 handedness tensor, and an unused 1x63 tensor (or are the two 1x63 tensors for the right and left hands?). But I do not understand how the 1x2016x1 and 1x2016x18 tensors represent an oriented bounding box and perhaps a presence value.
Edit: To add to this, why are the inputs 192x192 (palm) and 224x224 (landmark) pixels instead of the 256x256 pixels the paper states?
Hi @ElonXXIII, as in SSD models, these are predictions based on predefined anchors. We use TfLiteTensorsToDetectionsCalculator to decode the output tensors given the SSD anchors.
From the calculator's .cc file: the first tensor contains the predicted raw boxes/keypoints; its size must be (num_boxes * num_predicted_values). The second tensor is the score tensor. So there are 2016 boxes: the 1x2016x1 tensor is the score tensor and the 1x2016x18 tensor is the raw boxes/keypoints tensor. But why 18 values per box? Are they the 6 wrist landmarks in x, y, z? I thought the output was an oriented bounding box?
Accidentally closed
Knowing which output tensor corresponds to "the hand fills out exactly the whole screen and is upright" would also be sufficient for a workaround in my use case.
18 keypoints?
The first 4 float values are dx, dy, w, h == the bbox, but dx and dy need the anchor points. The other 14 values are (x0, y0), (x1, y1), ..., (x6, y6) = 7 keypoints.
Source: https://github.com/junhwanjang/mediapipe-models/blob/main/palm_detection/assets/palm_7_landmark_index.png
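For clarity, MediaPipe's TfLiteTensorsToDetectionsCalculator turns those anchor-relative values into absolute coordinates roughly like this (a sketch paraphrasing DecodeBoxes from the .cc under the reverse_output_order setting the palm graph uses; variable names are mine):

```js
// Sketch of how TfLiteTensorsToDetectionsCalculator decodes one anchor's raw
// values into normalized coordinates. With fixedAnchorSize, anchor.w and
// anchor.h are both 1; xScale/yScale come from the calculator options
// (192 for this model).
function decodeBox(raw /* 18 floats for one anchor */, anchor, xScale, yScale) {
  const xCenter = raw[0] / xScale * anchor.w + anchor.xCenter
  const yCenter = raw[1] / yScale * anchor.h + anchor.yCenter
  const w = raw[2] / xScale * anchor.w
  const h = raw[3] / yScale * anchor.h
  const keypoints = []
  for (let i = 4; i < 18; i += 2) {
    keypoints.push({
      x: raw[i] / xScale * anchor.w + anchor.xCenter,
      y: raw[i + 1] / yScale * anchor.h + anchor.yCenter,
    })
  }
  return { xCenter, yCenter, w, h, keypoints } // all normalized to [0, 1]
}
```

Note the network itself only outputs this axis-aligned box plus keypoints; the oriented hand rectangle used downstream is computed afterwards from the keypoints (MediaPipe derives the rotation from two of them) in the detections-to-rect step of the graph.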
My problem is:
- I don't know how to find the anchor points of the 2016 anchors for this SSD model.
- Are the 7 keypoints relative to the anchor points? I tried the center of the image, but without success.
From ssd_anchors_calculator:
https://github.com/google/mediapipe/blob/master/mediapipe/calculators/tflite/ssd_anchors_calculator.cc
I tried to create the anchor points with the parameters from https://github.com/google/mediapipe/blob/master/mediapipe/modules/palm_detection/palm_detection_cpu.pbtxt, but I can produce only half of the anchor points (1008), not 2016:
Layer 1: 192/8 = 24 => 24×24 = 576
Layers 2-4: 192/16 = 12 => 12×12 = 144, ×3 = 432
Total: 576 + 432 = 1008
```js
let anchorOptions = {
  numLayers: 4,
  minScale: 0.1484375,
  maxScale: 0.75,
  inputSizeWidth: 192,
  inputSizeHeight: 192,
  anchorOffsetX: 0.5,
  anchorOffsetY: 0.5,
  strides: [8, 16, 16, 16],
  aspectRatios: [1.0],
  fixedAnchorSize: true,
}
```
```js
// Linear interpolation of the anchor scale between minScale and maxScale
// across the strides, as in ssd_anchors_calculator.cc.
function calculateScale(minScale, maxScale, strideIndex, numStrides) {
  if (numStrides === 1) {
    return (minScale + maxScale) * 0.5
  }
  return minScale + (maxScale - minScale) * strideIndex / (numStrides - 1)
}

export function generateAnchors(options) {
  const anchors = []
  if (!options.featureMapHeightSize && !options.strides?.length) {
    throw new Error('Both feature map shape and strides are missing. Must provide either one.')
  }
  if (options.featureMapHeightSize) {
    if (options.strides?.length) {
      throw new Error('Found feature map shapes. Strides will be ignored.')
    }
    if (options.featureMapHeightSize !== options.numLayers) {
      throw new Error('options.featureMapHeightSize !== options.numLayers')
    }
    if (options.featureMapHeightSize !== options.featureMapWidthSize) {
      throw new Error('options.featureMapHeightSize !== options.featureMapWidthSize')
    }
  } else if (options.strides.length !== options.numLayers) {
    throw new Error('options.strides.length !== options.numLayers')
  }
  let layerId = 0
  while (layerId < options.numLayers) {
    const anchorHeight = []
    const anchorWidth = []
    const aspectRatios = []
    const scales = []
    // Consecutive layers with the same stride are merged into one anchor grid.
    let lastSameStrideLayer = layerId
    while (lastSameStrideLayer < options.strides.length && options.strides[lastSameStrideLayer] === options.strides[layerId]) {
      const scale = calculateScale(options.minScale, options.maxScale, lastSameStrideLayer, options.strides.length)
      if (lastSameStrideLayer === 0 && options.reduceBoxesInLowestLayer) {
        aspectRatios.push(1.0, 2.0, 0.5)
        scales.push(0.1, scale, scale)
      } else {
        for (let aspectRatioId = 0; aspectRatioId < options.aspectRatios.length; aspectRatioId++) {
          aspectRatios.push(options.aspectRatios[aspectRatioId])
          scales.push(scale)
        }
        if (options.interpolatedScaleAspectRatio > 0.0) {
          // One extra anchor per cell whose scale is interpolated towards the next layer.
          const scaleNext = lastSameStrideLayer === options.strides.length - 1
            ? 1.0
            : calculateScale(options.minScale, options.maxScale, lastSameStrideLayer + 1, options.strides.length)
          scales.push(Math.sqrt(scale * scaleNext))
          aspectRatios.push(options.interpolatedScaleAspectRatio)
        }
      }
      lastSameStrideLayer++
    }
    for (let i = 0; i < aspectRatios.length; i++) {
      const ratioSqrts = Math.sqrt(aspectRatios[i])
      anchorHeight.push(scales[i] / ratioSqrts)
      anchorWidth.push(scales[i] * ratioSqrts)
    }
    let featureMapHeight = 0
    let featureMapWidth = 0
    if (options.featureMapHeightSize) {
      featureMapHeight = options.featureMapHeight[layerId]
      featureMapWidth = options.featureMapWidth[layerId]
    } else {
      const stride = options.strides[layerId]
      featureMapHeight = Math.ceil(options.inputSizeHeight / stride)
      featureMapWidth = Math.ceil(options.inputSizeWidth / stride)
    }
    for (let y = 0; y < featureMapHeight; y++) {
      for (let x = 0; x < featureMapWidth; x++) {
        for (let anchorId = 0; anchorId < anchorHeight.length; anchorId++) {
          // Anchor centers are normalized to [0, 1].
          const newAnchor = {
            xCenter: (x + options.anchorOffsetX) / featureMapWidth,
            yCenter: (y + options.anchorOffsetY) / featureMapHeight,
          }
          if (options.fixedAnchorSize) {
            newAnchor.w = 1
            newAnchor.h = 1
          } else {
            newAnchor.w = anchorWidth[anchorId]
            newAnchor.h = anchorHeight[anchorId]
          }
          anchors.push(newAnchor)
        }
      }
    }
    layerId = lastSameStrideLayer
  }
  return anchors
}
```
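Running this with the anchorOptions above reproduces the shortfall:

```js
// With the options above (aspectRatios [1.0], no interpolated-scale anchor),
// layer 1 emits 1 anchor per cell and the merged stride-16 layers emit 3:
// 24*24*1 + 12*12*3 = 576 + 432 = 1008.
const anchors = generateAnchors(anchorOptions)
console.log(anchors.length) // 1008
```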
Now I can generate all 2016 anchor points by adding a 0.5 ratio to the aspectRatios array:
```js
let anchorOptions = {
  numLayers: 4,
  minScale: 0.1484375,
  maxScale: 0.75,
  inputSizeWidth: 192,
  inputSizeHeight: 192,
  anchorOffsetX: 0.5,
  anchorOffsetY: 0.5,
  strides: [8, 16, 16, 16],
  aspectRatios: [1.0, 0.5], // <--
  fixedAnchorSize: true,
}
```
To decode the 18 float values:
```js
let palmImageSize = 192
// bbox[0] == xCenter, bbox[1] == yCenter, bbox[2] == width, bbox[3] == height
// The predicted center is an offset from the anchor center, here scaled to
// input pixels (fixedAnchorSize means the anchor w/h are 1).
bbox[0] += anchors[maxPalmIdx].xCenter * palmImageSize
bbox[1] += anchors[maxPalmIdx].yCenter * palmImageSize
// width, height unchanged
// The remaining 14 values are 7 (x, y) keypoints, also anchor-relative.
for (let i = 4; i < 18; i += 2) {
  let x = palmData.readFloatLE(offset + i * 4) + anchors[maxPalmIdx].xCenter * palmImageSize
  let y = palmData.readFloatLE(offset + i * 4 + 4) + anchors[maxPalmIdx].yCenter * palmImageSize
  points.push({ x, y })
}
```
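One detail the snippet above assumes: maxPalmIdx is the index of the best-scoring anchor, and the 1x2016x1 score tensor holds raw logits, so apply a sigmoid first (TfLiteTensorsToDetectionsCalculator does this when its sigmoid_score option is set). A hypothetical helper:

```js
// Pick the best-scoring anchor from the 1x2016x1 score tensor.
// The raw values are logits; the calculator applies a sigmoid
// (with optional clipping) before thresholding.
function findBestAnchor(scores /* Float32Array of length 2016 */) {
  let maxPalmIdx = 0
  let maxScore = -Infinity
  for (let i = 0; i < scores.length; i++) {
    const score = 1 / (1 + Math.exp(-scores[i])) // sigmoid
    if (score > maxScore) {
      maxScore = score
      maxPalmIdx = i
    }
  }
  return { maxPalmIdx, maxScore }
}
```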
For people who are looking for a CSV of the SSD anchors, I've attached the file here, along with code to generate it. Thanks @saknarak for your snippets!
https://github.com/VimalMollyn/GenMediaPipePalmDectionSSDAnchors
Maybe: 2016 / 63 = 32 -> 4-byte floats? And 63 / 3 = 21 (x, y, z) landmarks, one set each for Left and Right?
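If the 21-landmark reading is right, the 1x63 tensor is just 21 (x, y, z) triples flattened; a minimal sketch of the reshape, assuming the tensor arrives as a flat Float32Array:

```js
// Hypothetical reshape: 63 floats -> 21 (x, y, z) landmarks.
function reshapeLandmarks(raw /* Float32Array of length 63 */) {
  const landmarks = []
  for (let i = 0; i < raw.length; i += 3) {
    landmarks.push({ x: raw[i], y: raw[i + 1], z: raw[i + 2] })
  }
  return landmarks // 21 entries
}
```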
Set interpolatedScaleAspectRatio = 1.0 and it will come out right.
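That matches ssd_anchors_calculator: when interpolatedScaleAspectRatio > 0, each layer emits one extra scale-interpolated anchor per cell, which doubles the count to 2016 without the ad-hoc 0.5 aspect ratio:

```js
let anchorOptions = {
  numLayers: 4,
  minScale: 0.1484375,
  maxScale: 0.75,
  inputSizeWidth: 192,
  inputSizeHeight: 192,
  anchorOffsetX: 0.5,
  anchorOffsetY: 0.5,
  strides: [8, 16, 16, 16],
  aspectRatios: [1.0],
  interpolatedScaleAspectRatio: 1.0, // <-- instead of adding a 0.5 aspect ratio
  fixedAnchorSize: true,
}
// generateAnchors(anchorOptions).length === 2016
// 24*24*2 + 12*12*6 = 1152 + 864 = 2016
```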
Hello @ElonXXIII, we are upgrading the MediaPipe Legacy Solutions to the new MediaPipe Solutions. However, the libraries, documentation, and source code for all the MediaPipe Legacy Solutions will continue to be available in our GitHub repository and through library distribution services such as Maven and NPM.
You can continue to use those legacy solutions in your applications if you choose. Though, we would request you to check the new MediaPipe Solutions, which can help you more easily build and customize ML solutions for your applications. These new solutions will provide a superset of capabilities available in the legacy solutions. Thank you.
This issue has been marked stale because it has had no recent activity for 7 days. It will be closed if no further activity occurs. Thank you.
This issue was closed due to lack of activity after being marked stale for the past 7 days.