dsntnn: question about using dsnt in my network

I have trained a network that predicts facial key points through heatmap supervision. An FCN outputs a 68-channel heatmap, and the network applies a max operation to each channel to obtain the coordinates of the 68 facial key points. I now want to train this network jointly with another network, but the max operation is not differentiable, so I want to replace it with dsnt. I call batch_location_dsnt = dsntnn.dsnt(heatmap) (the heatmap comes from the FCN and is a 68 * 1 * 16 * 16 tensor), but the batch_location_dsnt I get is:

`tensor([[[ -0.5989, -0.2222]],

    [[ -0.6683,  -0.0225]],

    [[ -0.7003,   0.1874]],

    [[ -0.7120,   0.5027]],

    [[ -0.6451,   0.7451]],

    [[ -0.5105,   1.0081]],

    [[ -0.4522,   1.1898]],

    [[ -0.2934,   1.2817]],

    [[ -0.0759,   0.9567]],

    [[  0.1304,   1.0607]],

    [[  0.3462,   1.4314]],

    [[  0.7308,   1.3509]],

    [[  0.8871,   1.0625]],

    [[  1.1645,   0.7980]],

    [[  1.4735,   0.5973]],

    [[  1.3658,   0.1797]],

    [[  1.2114,  -0.1012]],

    [[ -0.7434,  -0.7085]],

    [[ -0.6286,  -0.7392]],

    [[ -0.4630,  -0.7343]],

    [[ -0.2988,  -0.6485]],

    [[ -0.1515,  -0.5185]],

    [[  0.0185,  -0.5908]],

    [[  0.3039,  -0.6446]],

    [[  0.5553,  -0.6704]],

    [[  0.8032,  -0.6359]],

    [[  0.9848,  -0.4610]],

    [[ -0.1231,  -0.3595]],

    [[ -0.2189,  -0.2581]],

    [[ -0.2404,  -0.0784]],

    [[ -0.3306,   0.1073]],

    [[ -0.4281,   0.2564]],

    [[ -0.3071,   0.3424]],

    [[ -0.2748,   0.3945]],

    [[ -0.1277,   0.3686]],

    [[  0.0404,   0.3399]],

    [[ -0.5630,  -0.4150]],

    [[ -0.4809,  -0.4761]],

    [[ -0.3541,  -0.4953]],

    [[ -0.2261,  -0.3877]],

    [[ -0.4000,  -0.3473]],

    [[ -0.5188,  -0.3881]],

    [[  0.2428,  -0.3442]],

    [[  0.4070,  -0.3346]],

    [[  0.5273,  -0.3868]],

    [[  0.7190,  -0.2441]],

    [[  0.5536,  -0.2888]],

    [[  0.4207,  -0.2777]],

    [[ -0.3997,   0.7421]],

    [[ -0.3004,   0.5801]],

    [[ -0.3018,   0.5292]],

    [[ -0.1713,   0.4833]],

    [[ -0.0893,   0.4787]],

    [[  0.0906,   0.6432]],

    [[  0.3095,   0.7009]],

    [[  0.1567,   0.8734]],

    [[ -0.0456,   1.1209]],

    [[ -0.1621,   1.0680]],

    [[ -0.2678,   1.0100]],

    [[ -0.3905,   0.8635]],

    [[ -0.3840,   0.7459]],

    [[ -0.2615,   0.6243]],

    [[ -0.1569,   0.5345]],

    [[ -0.1064,   0.6030]],

    [[  0.2071,   0.6364]],

    [[ -0.0748,   0.8947]],

    [[ -0.1838,   0.7509]],

    [[ -0.2617,   0.8739]]], device='cuda:0', grad_fn=<CatBackward>)`

Obviously, [-0.5989, -0.2222] doesn't look like pixel coordinates. Why doesn't dsnt output the x and y coordinates of the maximum, like the max operation does? How can I get the correct coordinates of the key points?

Nov 04 '20 10:11 QcQcM

Please read the basic usage guide.

Importantly, the target coordinates are normalized so that they are in the range (-1, 1). The DSNT layer always outputs coordinates in this range.

You can use the image size to convert from normalized coordinates to pixel coordinates.

Now, I can also see that you have some coordinates which are slightly outside of the (-1, 1) range. This implies to me that you have not normalized the heatmaps (e.g. using dsntnn.flat_softmax).

Nov 04 '20 22:11 anibali

Thank you for your prompt reply. In fact, I did normalize the heatmap with heatmap = dsntnn.flat_softmax(heatmap), following the example. Could it be because my torch version is 1.2.0? I saw that you mentioned this in an earlier answer to someone else.

Another question, which may be a stupid one: how can I convert the normalized coordinates to pixel coordinates? When I read the example in the paper, I didn't quite understand how x = 0.4, y = 0 gets mapped back to the original coordinates.

[Screenshot from 2020-11-06 09-09-23]

Is the final coordinate the intersection of the column with value 0.4 in the X matrix and the row with value 0 in the Y matrix? We know that X is 0.4 and n is 5, and because X_ij = (2j - (n + 1)) / n (as shown below), j must be 4. By the same reasoning, i is 3, so the pixel coordinate is (4, 3), i.e. the third row and fourth column?

[Screenshot from 2020-11-06 09-09-35]

Should I use the same method to convert the results from dsnt back to pixel coordinates? Thank you.

Nov 06 '20 01:11 QcQcM

> In fact, I did normalize the heatmap with heatmap = dsntnn.flat_softmax(heatmap), following the example. Could it be because my torch version is 1.2.0? I saw that you mentioned this in an earlier answer to someone else.

If you use dsnt directly after flat_softmax, it should not be possible for values to appear outside of the (-1, 1) range.
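To illustrate why that bound holds, here is a minimal sketch in plain Python rather than dsntnn itself. It assumes that dsnt computes a weighted sum of normalized grid coordinates, using the grid formula from the paper; the helper names and the activation values are made up for the example. When the weights are non-negative and sum to 1, the result is a weighted average of grid values and cannot leave their range. With raw activations there is no such guarantee.

```python
import math


def normalized_grid(n):
    """Pixel-centre coordinates (2j - (n + 1)) / n for j = 1..n."""
    return [(2 * j - (n + 1)) / n for j in range(1, n + 1)]


def softmax(values):
    """Numerically stable softmax: non-negative weights that sum to 1."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def expected_coordinate(weights):
    """Weighted sum of grid coordinates; applies no normalisation itself."""
    return sum(w * x for w, x in zip(weights, normalized_grid(len(weights))))


raw = [0.0, 0.5, 9.0, 30.0, 2.0]  # made-up activations along one axis

bounded = expected_coordinate(softmax(raw))  # weights sum to 1
unbounded = expected_coordinate(raw)         # weights sum to 41.5

print(bounded)    # stays between -0.8 and 0.8, the outermost grid values
print(unbounded)  # far outside (-1, 1)
```

So values such as 1.4735 in the first output are what one would expect if the heatmap reaching dsnt did not sum to 1.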

> Another question, which may be a stupid one: how can I convert the normalized coordinates to pixel coordinates?

There's a function that will do the conversion for you: https://github.com/anibali/dsntnn/blob/779631f80ad13da6332276b69acfc04669e49cbe/dsntnn/__init__.py#L322-L334

Your understanding of how the conversion works seems to be correct.
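As a worked check of that conversion, here is a minimal sketch in plain Python. It simply inverts the grid formula quoted earlier, X_ij = (2j - (n + 1)) / n, for a 1-indexed position, and adds a 0-indexed variant. Both function names are hypothetical and are not part of dsntnn; for tensors, the library function linked above is the better choice.

```python
def normalized_to_pixel_1_indexed(coord, size):
    """Invert x = (2j - (n + 1)) / n to recover the 1-indexed position j."""
    return (coord * size + size + 1) / 2


def normalized_to_pixel_0_indexed(coord, size):
    """Same conversion, shifted by one so the first pixel has index 0."""
    return ((coord + 1) * size - 1) / 2


# The example from the paper: x = 0.4, y = 0 on a 5 x 5 heatmap.
col = normalized_to_pixel_1_indexed(0.4, 5)  # about 4 -> fourth column
row = normalized_to_pixel_1_indexed(0.0, 5)  # 3 -> third row
print(col, row)

# A 16 x 16 heatmap, as in this thread: the centre (0.0) maps to 7.5,
# halfway between the 0-indexed pixels 7 and 8.
print(normalized_to_pixel_0_indexed(0.0, 16))
```

Note that a normalized coordinate near 0 always lands near the middle of the heatmap, whatever its size.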

Nov 06 '20 01:11 anibali

I used `pip install dsntnn==0.4.0a0` to change the dsntnn version.

The code that uses dsnt is heatmap = dsntnn.flat_softmax(heatmap) followed by batch_location_dsnt = dsntnn.dsnt(heatmap).

the value of batch_location_dsnt I get is `tensor([[[-7.8823e-04, 4.4607e-04]],

    [[-7.1920e-04,  1.8634e-03]],

    [[-6.0907e-04,  3.9498e-03]],

    [[-4.3593e-04,  5.8416e-03]],

    [[ 8.3447e-05,  7.0462e-03]],

    [[ 4.7119e-04,  5.1756e-03]],

    [[ 9.2238e-05,  4.8673e-04]],

    [[ 4.5002e-06,  1.5318e-04]],

    [[ 1.6151e-04,  1.1787e-04]],

    [[ 2.4366e-04,  1.8312e-04]],

    [[ 1.7077e-04,  1.4871e-04]],

    [[ 1.0362e-03,  1.1041e-03]],

    [[ 1.0263e-03,  1.1403e-03]],

    [[ 1.0668e-04,  7.3150e-05]],

    [[ 6.6787e-05,  1.5837e-04]],

    [[-6.5744e-05,  6.1721e-05]],

    [[ 2.0477e-04,  7.8917e-05]],

    [[-4.1145e-04,  1.0529e-04]],

    [[-1.0380e-04, -7.3744e-04]],

    [[ 6.6893e-04, -1.2747e-03]],

    [[ 1.6504e-03, -8.9851e-04]],

    [[ 2.5342e-03, -2.5265e-04]],

    [[ 3.0837e-03, -8.1700e-04]],

    [[ 4.1877e-03, -1.2371e-03]],

    [[ 5.6369e-03, -1.4836e-03]],

    [[ 6.1638e-03, -9.8917e-04]],

    [[ 5.0859e-03, -2.0368e-04]],

    [[ 2.6908e-03,  2.7435e-04]],

    [[ 2.9002e-03,  1.1736e-03]],

    [[ 2.6792e-03,  2.1132e-03]],

    [[ 2.5727e-03,  2.8178e-03]],

    [[ 1.8448e-03,  3.5268e-03]],

    [[ 2.7345e-03,  4.2092e-03]],

    [[ 2.6624e-03,  3.8066e-03]],

    [[ 3.3309e-03,  4.0315e-03]],

    [[ 4.3377e-03,  4.2360e-03]],

    [[ 7.3338e-04,  6.7948e-04]],

    [[ 8.0203e-04, -4.6089e-05]],

    [[ 1.5023e-03,  3.3472e-04]],

    [[ 1.8472e-03,  1.5168e-04]],

    [[ 1.5913e-03,  3.4170e-04]],

    [[ 9.1051e-04,  3.3060e-04]],

    [[ 4.7607e-03,  3.5889e-04]],

    [[ 4.5108e-03,  2.9832e-05]],

    [[ 4.9918e-03, -1.3429e-04]],

    [[ 6.7976e-03,  4.4054e-04]],

    [[ 5.4981e-03,  5.2540e-04]],

    [[ 5.6122e-03,  4.2932e-04]],

    [[ 1.7819e-03,  7.3131e-03]],

    [[ 1.6770e-03,  5.3437e-03]],

    [[ 2.6431e-03,  5.7792e-03]],

    [[ 2.8788e-03,  5.6338e-03]],

    [[ 3.3270e-03,  5.2575e-03]],

    [[ 3.8019e-03,  5.3061e-03]],

    [[ 4.7200e-03,  6.3319e-03]],

    [[ 3.4578e-03,  6.0037e-03]],

    [[ 3.2287e-03,  6.9463e-03]],

    [[ 2.3241e-03,  6.2275e-03]],

    [[ 2.1963e-03,  6.6319e-03]],

    [[ 1.7331e-03,  6.3944e-03]],

    [[ 2.2274e-03,  7.5258e-03]],

    [[ 2.3830e-03,  5.9583e-03]],

    [[ 2.6063e-03,  5.4184e-03]],

    [[ 3.0387e-03,  5.4277e-03]],

    [[ 4.6358e-03,  6.6240e-03]],

    [[ 3.1824e-03,  6.5282e-03]],

    [[ 2.9334e-03,  6.8267e-03]],

    [[ 2.4876e-03,  6.8808e-03]],

    [[-4.4794e-02, -4.6768e-02]]], device='cuda:0', grad_fn=<FlipBackward>)`

Does the output look right now? Thank you, I look forward to your reply. Today is another day to work hard! Best wishes!

Nov 06 '20 02:11 QcQcM

> Does the output look right now?

The output is valid, but I can't say whether they are the right answers for your problem :wink:

Nov 06 '20 03:11 anibali

The first two outputs of dsnt:

[[ 2.3011e-03,  4.8983e-03]],

        [[ 2.5807e-03,  5.0080e-03]],

The first two coordinates after conversion (the return value of normalized_to_pixel_coordinates):

tensor([[[7.4616, 7.5037]],

        [[7.4597, 7.5175]],

The result of finding the maximum value: the w indices are 2 and 2, and the h indices are 8 and 10, which means the point coordinates are (2, 8) and (2, 10).

The first two of the 68 heatmaps:

[Screenshot from 2020-11-06 14-53-16]

[Screenshot from 2020-11-06 14-53-23]

As you can see, the result of finding the maximum value is not the same as the dsnt result :sob: :sob: :sob: Is it because I am applying dsnt directly to heatmaps from a network that was trained without dsnt? Do I need to retrain with dsnt?

Nov 06 '20 08:11 QcQcM

> Is it because I am applying dsnt directly to heatmaps from a network that was trained without dsnt? Do I need to retrain with dsnt?

I didn't realise you were trying to avoid retraining. I think that the problem you're having is that if you don't retrain, flat_softmax will cause the heatmap values to "flatten" which biases dsnt towards the centre of the image.
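The flattening effect can be reproduced with a small sketch in plain Python. It is not dsntnn code: it assumes a one-dimensional row whose values lie in [0, 1], as a network trained against heatmap targets might produce, and it works in pixel indices directly instead of normalized coordinates. The row values and helper names are made up for the example.

```python
import math


def softmax(values):
    """Numerically stable softmax: non-negative weights that sum to 1."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def soft_argmax_pixel(weights):
    """Expected 0-indexed position under weights that sum to 1."""
    return sum(w * j for j, w in enumerate(weights))


# Hypothetical heatmap row, 16 pixels wide: a peak at index 2, zeros elsewhere.
row = [0.0] * 16
row[1], row[2], row[3] = 0.6, 1.0, 0.6

hard = row.index(max(row))              # 2, what the max operation returns
soft = soft_argmax_pixel(softmax(row))  # dragged towards the centre (7.5)

print(hard, soft)
```

Because exp(0) is 1, every background pixel keeps a sizeable weight after the softmax, and on a 16 x 16 heatmap there are far more background pixels than peak pixels, so the pull towards the centre is stronger still.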

Your options are:

1. Retrain.
2. Instead of using flat_softmax, try normalising the heatmaps by subtracting the minimum value and dividing by the sum:

heatmap -= heatmap.flatten(-2).min(-1)[0][..., None, None]
heatmap /= heatmap.sum([-1, -2], keepdim=True)

You may as well try 2), and then if that doesn't work, try 1).
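For anyone who wants to see what option 2) does to a single heatmap, here is a minimal sketch in plain Python with nested lists instead of tensors. The heatmap is synthetic and the helper names are hypothetical. It only shows the ideal case, a perfectly constant background; with an uneven or noisy background, subtracting the minimum leaves some residual weight everywhere, and the estimate drifts back towards the centre.

```python
def normalize_min_sum(heatmap):
    """Subtract the minimum, then divide by the sum, for one 2D heatmap."""
    lowest = min(min(r) for r in heatmap)
    shifted = [[v - lowest for v in r] for r in heatmap]
    total = sum(sum(r) for r in shifted)
    return [[v / total for v in r] for r in shifted]


def expected_pixel(weights):
    """Expected 0-indexed (x, y) position under weights that sum to 1."""
    x = sum(w * j for r in weights for j, w in enumerate(r))
    y = sum(w * i for i, r in enumerate(weights) for w in r)
    return x, y


# Hypothetical 16 x 16 heatmap: constant background of 0.1 plus a small
# blob centred on column 2, row 8.
heatmap = [[0.1] * 16 for _ in range(16)]
heatmap[8][2] += 1.0
for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
    heatmap[8 + dy][2 + dx] += 0.5

x, y = expected_pixel(normalize_min_sum(heatmap))
print(x, y)  # close to (2, 8) because the background is removed exactly
```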

Nov 06 '20 08:11 anibali

I have to say, you are so kind!!! I have tried 2). It seems better than before, but it is still far from the right result, and I noticed that the result without normalizing the heatmap is better than with normalization. I will try to retrain the network, so please look forward to my good news! :stuck_out_tongue::stuck_out_tongue::stuck_out_tongue: Nice to meet you!

Nov 06 '20 09:11 QcQcM