Evaluate the quantitative performance on the Dataset

Open Shedima opened this issue 2 years ago • 7 comments

How would you evaluate the quantitative performance of your model on the genea_challenge_2020 dataset? I only found the code for evaluation on the TED dataset.

Shedima avatar Jul 10 '22 16:07 Shedima

The method generate_gestures_by_dataset in processor_v2.py generates gestures for the GENEA dataset as well as for TED. We computed the quantitative metrics for GENEA manually, following the code inside generate_gestures in processor_v2.py.

UttaranB127 avatar Jul 10 '22 17:07 UttaranB127

Do you mean that I should use generate_gestures_by_dataset to generate a sequence of poses, and then compute the quantitative metrics manually by following the code in generate_gestures?

Shedima avatar Jul 11 '22 05:07 Shedima

Yes

UttaranB127 avatar Jul 11 '22 05:07 UttaranB127

Hello, I followed the method described above to evaluate on the GENEA dataset and found that the FGD values are very high. Using the visualization code in the repository, I rendered the corresponding video and found that the generated poses are completely different from the ground-truth poses. I suspect this is the reason for the high FGD. Could you please provide the complete procedure for evaluating on the GENEA dataset? (Attached image: screenshot from 2022-08-04 20-09-22)
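For context on why mismatched poses push the number up: FGD is a Fréchet distance between Gaussians fitted to latent features of real and generated gestures. The following is a minimal sketch of only that distance step, assuming the features have already been extracted as (num_samples, feature_dim) arrays. The function name frechet_distance is hypothetical and this is not the repository's implementation; the feature extractor is out of scope here.

```python
import numpy as np


def frechet_distance(feats_a, feats_b):
    """Frechet distance between Gaussians fitted to two feature sets.

    feats_a, feats_b: arrays of shape (num_samples, feature_dim).
    Hypothetical helper for illustration only.
    """
    feats_a = np.asarray(feats_a, dtype=np.float64)
    feats_b = np.asarray(feats_b, dtype=np.float64)

    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    # atleast_2d keeps the matrix product valid when feature_dim == 1.
    cov_a = np.atleast_2d(np.cov(feats_a, rowvar=False))
    cov_b = np.atleast_2d(np.cov(feats_b, rowvar=False))

    # Tr(sqrt(cov_a @ cov_b)) equals the sum of the square roots of the
    # eigenvalues of cov_a @ cov_b. These are real and non-negative for
    # positive semi-definite covariances, up to numerical noise, hence
    # the real part and the clipping.
    eigvals = np.linalg.eigvals(cov_a @ cov_b)
    tr_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()

    diff = mu_a - mu_b
    return float(diff @ diff + np.trace(cov_a) + np.trace(cov_b) - 2.0 * tr_sqrt)


# Toy usage with random stand-in features (not real gesture features).
rng = np.random.default_rng(0)
real_feats = rng.normal(size=(500, 4))
shifted_feats = real_feats + 3.0  # same spread, different mean
print(frechet_distance(real_feats, real_feats))
print(frechet_distance(real_feats, shifted_feats))
```

A systematic offset between the two sets, such as one set being mirrored relative to the other, shows up in the mean term and can dominate the result even when the motion itself is plausible.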

Shedima avatar Aug 04 '22 12:08 Shedima

One thing I notice is that the arms in the GT appear to be vertically inverted (along the y-axis). I think the evaluation code applies a vertical flip to both the GT and the predicted poses, and the flip might not be required for the GT. Could you try evaluating and visualizing after inverting the y-axis values of the GT back? Apart from that, we used the same error terms to evaluate on GENEA as on the TED dataset.
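A minimal sketch of that y-axis inversion, assuming the GT poses are stored as a NumPy array whose last axis is (x, y, z) and that the inversion is a plain sign change. Both the array layout and the helper name flip_y are assumptions for illustration; the actual layout in processor_v2.py may differ.

```python
import numpy as np


def flip_y(poses, y_index=1):
    """Return a copy of `poses` with the y coordinate negated.

    poses: array of shape (..., 3), last axis assumed to be (x, y, z).
    Hypothetical helper for illustration only.
    """
    flipped = np.array(poses, dtype=np.float64, copy=True)
    flipped[..., y_index] *= -1.0
    return flipped


# Toy ground-truth clip: 2 frames, 3 joints, (x, y, z) per joint.
gt = np.array([
    [[0.0, 1.0, 0.0], [0.5, 0.8, 0.1], [0.9, 0.2, 0.0]],
    [[0.0, 1.0, 0.0], [0.5, 0.7, 0.1], [0.9, 0.1, 0.0]],
])

gt_fixed = flip_y(gt)  # apply to the GT only, then re-run metrics and rendering
```

If the poses are absolute joint positions rather than root-relative vectors, negating y also moves the skeleton to the other side of the origin, so mirroring about a reference height may be the more appropriate operation in that case.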

UttaranB127 avatar Aug 04 '22 16:08 UttaranB127

I also found that the arms in the GT are vertically inverted. I tried reversing the y-axis values, but the FGD is still very high. Even though I am following the evaluation and visualization code you provided, I cannot reproduce the GENEA results reported in your paper. Could you provide the source code you used to evaluate on the GENEA dataset?

Shedima avatar Aug 05 '22 01:08 Shedima

My apologies, but I currently don't have access to that evaluation code, and I am not sure how soon I will be able to retrieve it. In the meantime, you can report the higher numbers, provided you followed the same evaluation methodology as for the TED dataset.

UttaranB127 avatar Aug 05 '22 04:08 UttaranB127