StoryViz
Evaluation metrics
Hello, I hope your research is going well. 😀
I am trying to evaluate my model with the metrics you proposed.
I have read your paper, but I would like to ask you to double-check a few points, because my results seem a bit odd and off the scale. 😢
- I presume the "character F1" score corresponds to the "micro avg" F1 output of your `eval_classifier.py` code. Am I correct?
- Also, does "Frame accuracy" correspond to the "eval Image Exact Match Acc" output of the same `eval_classifier.py` code? (My current reading of both metrics is sketched after this list.)
- Are the BLEU-2 and BLEU-3 scores scaled by 100? I tested your `translate.py` code with my generated images and got scores of about 0.04, so I would like to confirm whether the reported numbers are these values multiplied by 100 (see the sketch after this list).
- Lastly, the R-precision evaluation method is unclear to me. Do I need to train your H-DAMSM code myself? If so, when is the right time to stop training and benchmark my model?
- For a fair comparison, would it be possible to share your pretrained H-DAMSM weights?
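
For reference, this is a minimal sketch of how I am currently computing the first two numbers, so you can tell me whether my reading matches yours. The arrays and the sklearn-style micro-averaging below are my own assumptions, not taken from your `eval_classifier.py`:

```python
import numpy as np
from sklearn.metrics import f1_score

# Hypothetical stand-ins for the classifier outputs:
# y_true / y_pred are (num_frames, num_characters) binary presence matrices.
y_true = np.array([[1, 0, 1], [0, 1, 0], [1, 1, 0]])
y_pred = np.array([[1, 0, 0], [0, 1, 0], [1, 1, 1]])

# "character F1" as I read it: micro-averaged F1 over all character labels.
char_f1 = f1_score(y_true, y_pred, average="micro")

# "Frame accuracy" as I read it: a frame counts as correct only if its full
# character label vector matches the ground truth exactly.
frame_acc = np.mean((y_true == y_pred).all(axis=1))

print(f"micro-avg character F1: {char_f1:.4f}, frame exact-match acc: {frame_acc:.4f}")
```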
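To make the scaling question concrete, this is roughly how I compute BLEU on my side. It assumes NLTK's corpus BLEU with uniform n-gram weights and made-up tokenized captions, and may well differ from what `translate.py` does internally:

```python
from nltk.translate.bleu_score import corpus_bleu, SmoothingFunction

# Hypothetical tokenized reference and generated captions.
references = [[["a", "girl", "is", "playing", "in", "the", "park"]]]
hypotheses = [["a", "girl", "plays", "in", "the", "park"]]

smooth = SmoothingFunction().method1
bleu2 = corpus_bleu(references, hypotheses, weights=(0.5, 0.5), smoothing_function=smooth)
bleu3 = corpus_bleu(references, hypotheses, weights=(1/3, 1/3, 1/3), smoothing_function=smooth)

# My raw scores land around 0.04; the question is whether your reported
# numbers are simply these values multiplied by 100 (i.e. ~4.0 on a 0-100 scale).
print(f"BLEU-2: {bleu2 * 100:.2f}, BLEU-3: {bleu3 * 100:.2f}")
```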
I am currently stuck on the R-precision evaluation using H-DAMSM. I was considering using the more recent CLIP R-Precision instead, but I am opening this issue first to avoid any fair-comparison concerns.
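
In case it helps pinpoint where I am stuck, this is the R-precision protocol (R = 1) as I currently understand it, written against generic precomputed embeddings, whether they come from H-DAMSM or CLIP. Everything here, including the 100-candidate pool with random mismatched captions, is my own assumption about the metric, not your implementation:

```python
import numpy as np

def r_precision(image_embs: np.ndarray, text_embs: np.ndarray, num_candidates: int = 100) -> float:
    """image_embs, text_embs: (N, D) L2-normalized embeddings, where row i is a matched pair.

    For each generated image, score its ground-truth caption against
    (num_candidates - 1) randomly mismatched captions and count how often
    the ground truth ranks first. Assumes N >= num_candidates.
    """
    n = image_embs.shape[0]
    rng = np.random.default_rng(0)
    hits = 0
    for i in range(n):
        # Candidate pool: the true caption plus randomly sampled mismatched ones.
        distractors = rng.choice(np.delete(np.arange(n), i), size=num_candidates - 1, replace=False)
        candidates = np.concatenate(([i], distractors))
        sims = text_embs[candidates] @ image_embs[i]  # cosine similarity (unit vectors)
        hits += int(np.argmax(sims) == 0)             # ground truth sits at position 0
    return hits / n
```

If this matches your protocol, my remaining question is only which encoder to use for the embeddings and at what training checkpoint.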