Results computation
Hello, and thank you for the great paper!
I was wondering if you could share the script you used to obtain the results presented in the paper, or alternatively, the specific hyperparameters and test details. This would be really helpful for reproducing your results and better understanding the methodology.
Thanks in advance!
Hi, to reproduce the results, you can use the following hyperparameters:
num_inference_steps: 200
guidance_scale: 4.0
image size (width x height): 512 x 512
For the anonymization_degree, we reported results using values of 1.2 and 1.4 in the paper. You can pass these parameters to the pipeline outlined in the README to replicate our results.
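As a quick sketch, the settings above can be collected into a single parameter dict and passed to the pipeline from the README. The `pipe` variable and call signature here are hypothetical placeholders, not the authors' confirmed API:

```python
# Hyperparameters reported above; `pipe` is a hypothetical placeholder
# for the actual anonymization pipeline described in the README.
params = {
    "num_inference_steps": 200,
    "guidance_scale": 4.0,
    "width": 512,
    "height": 512,
    "anonymization_degree": 1.2,  # the paper also reports results for 1.4
}

# result = pipe(source_image, **params)  # hypothetical invocation
```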
Hi, thank you for your kind and timely response!
I have a few follow-up questions I hope you can help clarify:
Did you use a specific random seed during evaluation?
Regarding the quaternion angular distance used for both face and gaze estimation: did you assume the roll angle to be 0? Since L2CS provides only two angles, I'm trying to understand how this distance was computed with just those values.
Lastly, about the IQA metric: which specific implementation did you use? The paper you referenced links to a GitHub repository that provides several versions of the TopIQ metric. From what I can tell, you likely used topiq_nr, but I'd appreciate a confirmation.
Thanks again for your time and support. Wishing you a great day!
Unfortunately, I don’t recall the exact random seed I used during evaluation, but I think it shouldn't affect the overall quantitative results very much.
For the angular distance in gaze estimation using L2CS, yes, I assigned 0 to the roll angle.
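For reference, a minimal sketch of the quaternion angular distance with the roll angle fixed at 0. The intrinsic yaw-pitch-roll (ZYX) Euler convention is my assumption and is not confirmed in this thread:

```python
import math

def euler_to_quat(yaw, pitch, roll=0.0):
    """Convert ZYX (yaw-pitch-roll) Euler angles in radians to a
    (w, x, y, z) quaternion. Roll defaults to 0, matching the
    two-angle output of L2CS as described above."""
    cy, sy = math.cos(yaw / 2), math.sin(yaw / 2)
    cp, sp = math.cos(pitch / 2), math.sin(pitch / 2)
    cr, sr = math.cos(roll / 2), math.sin(roll / 2)
    w = cr * cp * cy + sr * sp * sy
    x = sr * cp * cy - cr * sp * sy
    y = cr * sp * cy + sr * cp * sy
    z = cr * cp * sy - sr * sp * cy
    return (w, x, y, z)

def quat_angular_distance(q1, q2):
    """Rotation angle (radians) between two unit quaternions:
    2 * arccos(|<q1, q2>|). The abs() handles the q/-q ambiguity."""
    dot = abs(sum(a * b for a, b in zip(q1, q2)))
    return 2.0 * math.acos(min(1.0, dot))
```

For a pure yaw difference of 0.5 rad (pitch and roll zero), the distance reduces to 0.5 rad, which is a quick sanity check on the convention.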
For the Face IQA metric, the topiq_nr-face method was used.
Hi! Sorry to trouble you again, but I have a few doubts I haven’t been able to resolve on my own.
For the pose estimation model used in your validation, could you clarify which pretrained weights you used? Was it the "300W-LP, alpha 1, robust to image quality" setting?
Also, regarding the expression and shape model: the GitHub repository linked in the paper you cited appears to use a ResNet to estimate multiple coefficients. I believe I've correctly identified the expression vector (a 64-dimensional vector), but I'm having difficulty determining which coefficients you used to compute the shape distance.
Finally, did you apply any preprocessing while computing metrics with these models? The shape and expression model, for example, expects 224x224 face crops; did you simply resize the input image, or did you do something different?
Thank you again for your time and patience!
Hi! No problem at all.
Yes, we used the "300W-LP, alpha 1, robust to image quality" pretrained weights.
The *.mat files generated by running test.py from Deep3DFaceRecon_pytorch are dictionary objects with keys such as "id" and "exp", which correspond to face shape and expression, respectively. You are correct that the expression vector is 64-dimensional, while the face shape vector is 80-dimensional.
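A minimal sketch of a coefficient distance on these vectors. The *.mat files can be loaded with e.g. scipy.io.loadmat; the choice of Euclidean (L2) distance here is my assumption, as the thread does not state which metric was used:

```python
import math

def coeff_distance(v1, v2):
    """Euclidean (L2) distance between two coefficient vectors,
    e.g. the 80-dim "id" (shape) or 64-dim "exp" (expression)
    vectors from Deep3DFaceRecon_pytorch's test.py output.
    Note: L2 is an assumed metric, not confirmed by the authors."""
    if len(v1) != len(v2):
        raise ValueError("coefficient vectors must have equal length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(v1, v2)))
```

Usage would look like `coeff_distance(m1["exp"].ravel(), m2["exp"].ravel())` after loading two of the generated *.mat files.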
No preprocessing was applied before computing these coefficients; we simply ran test.py with the generated face images.