CenterNet

xs, ys and preds explanation (inputs and outputs)

Open · UmarSpa opened this issue 5 years ago · 1 comment

Can you please elaborate a bit on xs, ys and preds:

  • xs has 4 variables:
    • 1st of size [batchsize, 3, 511, 511] --> this is the input image
    • 2nd, 3rd and 4th of size [batchsize, 128] --> what are these?
  • ys has 7 variables:
    • 1st, 2nd and 3rd of size [batchsize, 80, 128, 128] --> these are the ground-truth heatmaps for top-left, bottom-right and center keypoints. Right?
    • 4th of size [batchsize, 128] --> what is this?
    • 5th, 6th and 7th of size [batchsize, 128, 2] --> what are these?
  • preds has 8 variables:
    • 1st, 2nd and 3rd of size [batchsize, 80, 128, 128] --> these are the predicted heatmaps for top-left, bottom-right and center keypoints. Right?
    • 4th and 5th of size [batchsize, 128, 1] --> what are these?
    • 6th, 7th and 8th of size [batchsize, 128, 2] --> what are these?

UmarSpa · Nov 13 '19 16:11

After a few hours of digging, I found the answers:

  • xs has 4 variables:
    • 1st of size [batchsize, 3, 511, 511] --> this is the input image
    • 2nd, 3rd and 4th of size [batchsize, 128] --> these are the ids (flattened indices) of the locations of the top-left corners, bottom-right corners and center keypoints on the 128x128 output grid. 128 is the maximum number of objects per input image. For example, if there are 10 objects in the image, then the first 10 elements of these tensors hold the respective ids, while the rest of the tensor is 0s. (N.B. the ids range from 0 to 16383, since the 128x128 output space has 16384 locations.)
  • ys has 7 variables:
    • 1st, 2nd and 3rd of size [batchsize, 80, 128, 128] --> these are the ground-truth heatmaps for top-left, bottom-right and center keypoints.
    • 4th of size [batchsize, 128] --> this tensor is a binary mask marking which of the 128 slots hold real objects. For example, if there are 10 objects in the image, then the first 10 elements of this tensor will be 1s, and the rest will be 0s. Since 0 is also a valid id, the mask is what separates real entries from padding.
    • 5th, 6th and 7th of size [batchsize, 128, 2] --> these are the x and y offsets of the top-left corners, bottom-right corners and center keypoints, i.e. the sub-pixel part that is lost when input coordinates are scaled down and truncated to integer cells of the 128x128 output grid. For example, if there are 10 objects in the image, then the first 10 elements of these tensors will contain the x and y offset values, while the rest will be 0s.
  • preds has 8 variables:
    • 1st, 2nd and 3rd of size [batchsize, 80, 128, 128] --> these are the predicted heatmaps for top-left, bottom-right and center keypoints.
    • 4th and 5th of size [batchsize, 128, 1] --> these are the predicted embeddings of the top-left and bottom-right corners, which are used to group corners belonging to the same object (N.B. the embedding dimension is 1).
    • 6th, 7th and 8th of size [batchsize, 128, 2] --> these are the predicted offsets of top-left, bottom-right corners and center keypoints.
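To make the id / mask / offset layout concrete, here is a minimal sketch for one keypoint type of one image. It assumes a CornerNet-style encoding: a keypoint's id is its flattened position `y * 128 + x` on the output grid, and its offset is the fractional part lost when a 511-pixel coordinate is scaled to 128 and truncated. `encode_keypoints` and `gather` are hypothetical helpers, not functions from the CenterNet repo, and plain lists stand in for the batched PyTorch tensors. `gather` also illustrates my reading of why the 4th to 8th entries of preds have a 128 dimension: the embedding and offset maps appear to be read out only at the ground-truth ids from xs, so each prediction lines up with its target in ys.

```python
# Minimal sketch (hypothetical helpers, not CenterNet's own code): encode one
# keypoint type of one image into the padded id / mask / offset layout
# described above, then read a feature map back out at those ids.
INPUT_SIZE = 511    # input resolution, from the shapes in the issue
OUTPUT_SIZE = 128   # heatmap resolution, from the shapes in the issue
MAX_TAG_LEN = 128   # maximum number of objects per image


def encode_keypoints(points):
    """points: list of (x, y) pixel coordinates in the input image,
    assumed to lie in [0, INPUT_SIZE). Returns (ids, mask, offsets),
    each padded to MAX_TAG_LEN entries."""
    ratio = OUTPUT_SIZE / INPUT_SIZE
    ids = [0] * MAX_TAG_LEN
    mask = [0] * MAX_TAG_LEN
    offsets = [[0.0, 0.0] for _ in range(MAX_TAG_LEN)]
    for i, (x, y) in enumerate(points[:MAX_TAG_LEN]):
        fx, fy = x * ratio, y * ratio      # float position on the output grid
        ix, iy = int(fx), int(fy)          # integer cell holding the heatmap peak
        ids[i] = iy * OUTPUT_SIZE + ix     # flattened id in [0, 16383]
        mask[i] = 1                        # slot i holds a real object
        offsets[i] = [fx - ix, fy - iy]    # sub-pixel part lost by int()
    return ids, mask, offsets


def gather(feature_map, ids):
    """feature_map: [C][OUTPUT_SIZE][OUTPUT_SIZE] nested lists for one image.
    Returns one length-C feature vector per id, i.e. [MAX_TAG_LEN][C]."""
    out = []
    for idx in ids:
        y, x = divmod(idx, OUTPUT_SIZE)    # invert idx = y * OUTPUT_SIZE + x
        out.append([channel[y][x] for channel in feature_map])
    return out


# Usage: two objects, the second one sitting exactly at pixel (0, 0).
ids, mask, offsets = encode_keypoints([(100.0, 200.0), (0.0, 0.0)])
print(ids[:3])    # [6425, 0, 0]  (id 0 is a real object AND the padding value)
print(mask[:3])   # [1, 1, 0]     (only the mask tells them apart)
print([round(v, 4) for v in offsets[0]])  # [0.0489, 0.0978]

# One-channel map whose value at (y, x) is its own flattened id.
fmap = [[[y * OUTPUT_SIZE + x for x in range(OUTPUT_SIZE)]
         for y in range(OUTPUT_SIZE)]]
print(gather(fmap, ids)[:2])  # [[6425], [0]]
```

In the real PyTorch code the same read-out would presumably be done for the whole batch at once, for example by reshaping a [batchsize, C, 128, 128] map to [batchsize, 128*128, C] and calling `torch.gather` along dimension 1; that batched version is not shown here.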

UmarSpa · Nov 14 '19 15:11