facemesh.pytorch
about: pth results differ from the original tflite model
Firstly, thanks for your excellent work! However, I found that the torch result is slightly different from the original tflite model.
TensorFlow and PyTorch versions:
(Pdb) tf.__version__
'2.2.1'
(Pdb) torch.__version__
'1.5.1'
test code:
import os
import cv2
import numpy as np
import tensorflow as tf
import torch
import torch.nn as nn
from facemesh import FaceMesh

# Load a test image and preprocess it identically for both backends:
# resize to the model's 192x192 input and normalize to [-1, 1]
sample_img = cv2.imread("test.jpg")
sample_img_192 = cv2.resize(sample_img, (192, 192))
input_data = np.expand_dims(sample_img_192, axis=0).astype(np.float32) / 127.5 - 1.0

# tflite model
interpreter = tf.lite.Interpreter(model_path="facemesh-lite.f16.tflite")
interpreter.allocate_tensors()

# PyTorch model
net = FaceMesh()
net.load_weights("facemesh.pth")

# Get input and output tensors.
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()
input_shape = input_details[0]['shape']
# input_data = np.array(np.random.random_sample(input_shape), dtype=np.float32)

# tf inference (NHWC input)
interpreter.set_tensor(input_details[0]['index'], input_data)
interpreter.invoke()
tf_coord_res = interpreter.get_tensor(output_details[0]['index'])

# torch inference (transpose NHWC -> NCHW)
torch_output_data = net(torch.from_numpy(input_data.transpose(0, 3, 1, 2)))
torch_coord_res = torch_output_data[0].detach().numpy()

print(["torch", torch_coord_res[0]])
print(["tflite", tf_coord_res[0, 0, 0]])
print("diff %f" % (np.abs(torch_coord_res[0] - tf_coord_res[0, 0, 0]).mean()))
results:
#==>
['torch', array([ 94.1816 , 140.77983 , -14.322037, ..., 136.51678 , 88.71278 ,
6.525924], dtype=float32)]
['tflite', array([ 92.17496 , 139.39285 , -14.361812, ..., 134.26816 , 87.34091 ,
5.629876], dtype=float32)]
diff 1.167073
I think this may be caused by PyTorch's conv2d padding behaving differently from TensorFlow's "same" padding. Have you fixed this problem, or do you have any advice?
Thanks!
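For reference, TF's "same" padding is computed per side and places any odd leftover pixel on the bottom/right, which PyTorch's symmetric padding argument cannot express. Here is a quick sketch of the arithmetic (pure Python; the 3x3, stride-2 stem conv on a 192x192 input is an assumption based on this model):

import math

def tf_same_pads(in_size, kernel, stride):
    # TF/tflite "SAME": output size is ceil(in / stride), and the
    # odd leftover pixel of padding goes at the end (bottom/right).
    out_size = math.ceil(in_size / stride)
    pad_total = max((out_size - 1) * stride + kernel - in_size, 0)
    pad_begin = pad_total // 2
    pad_end = pad_total - pad_begin
    return pad_begin, pad_end

# 192x192 input, 3x3 kernel, stride 2:
print(tf_same_pads(192, 3, 2))  # -> (0, 1): nothing on top/left, 1 pixel on bottom/right
# PyTorch's Conv2d(padding=1) pads 1 on *both* sides instead, shifting the result.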
How did you solve this problem? I ran into it too. It can be fixed by changing this line https://github.com/thepowerfuldeez/facemesh.pytorch/blob/348400fe32c60111a29e9e6891e230c0005ddd8a/facemesh.py#L114 to
x = nn.ConstantPad2d((0, 1, 0, 1), 0)(x)
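As a minimal standalone check of that fix (hypothetical random tensors, not the repo's code): an explicit (0, 1, 0, 1) pad followed by an unpadded stride-2 conv reproduces TF's asymmetric "same" behavior, while padding=1 gives the same output shape but shifted values:

import torch
import torch.nn.functional as F

torch.manual_seed(0)
x = torch.randn(1, 3, 192, 192)
w = torch.randn(16, 3, 3, 3)

# TF-style "same" for stride 2: extra pixel on bottom/right only
y_tf_style = F.conv2d(F.pad(x, (0, 1, 0, 1)), w, stride=2)

# PyTorch symmetric padding: same output shape, but shifted content
y_symmetric = F.conv2d(x, w, stride=2, padding=1)

print(y_tf_style.shape, y_symmetric.shape)      # both (1, 16, 96, 96)
print(torch.allclose(y_tf_style, y_symmetric))  # False: the values differ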
You are right about the difference caused by 'same' padding in PyTorch and TensorFlow.
However, the result is still slightly different from the original MediaPipe output. We need to look further into the MediaPipe source code.