
Regarding the validity of the Human Preference Classifier

Open · Hritikbansal opened this issue · 1 comment

Hi @tgxs002 , thanks for your work, and making the dataset and classifier open-sourced!

As a sanity check, I evaluated your trained HPC on the training examples that are preferred by humans (S1) and the training examples that are not preferred by humans (S2).

I found that the average HPS in setting S1 is ~21.0, whereas in setting S2 it is ~20.26. For a good classifier, I was expecting the scores in setting S2 to be much lower than in S1, but that is not the case. Does this mean the HPC is not trained properly? That seems contradictory, because the paper claims the HPC has good agreement with humans.

If my evaluation numbers look off, could you let me know what you are getting on your end?
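
For reference, here is a minimal sketch of the comparison described above, assuming a scoring function `hps(image_path, prompt)` that returns the HPS for one example; the function name, loader, and data layout are hypothetical placeholders, not part of the released code:

```python
# Hypothetical sanity check: compare the mean HPS over human-preferred (S1)
# and non-preferred (S2) training examples.
from statistics import mean

def mean_hps(examples, hps):
    # `examples` is an iterable of (image_path, prompt) pairs;
    # `hps` is a callable returning the score for one pair.
    return mean(hps(image_path, prompt) for image_path, prompt in examples)

# s1, s2 = load_pairs("preferred"), load_pairs("non_preferred")  # hypothetical loaders
# print(mean_hps(s1, hps), mean_hps(s2, hps))  # observed here: ~21.0 vs ~20.26
```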

Hritikbansal · Apr 05 '23 23:04

Hi Hritikbansal,

Thank you for your interest in our work! Based on the results you reported, your evaluation appears to have been conducted correctly.

The reasons the HPS values are close between S1 and S2 are:

  1. HPC is fine-tuned from CLIP, and HPS is defined as the cosine similarity between the image feature and the text feature of the tuned model (see the sketch after this list). Since the cosine similarity of the CLIP model has a bias (it is not centered at 0), the logit distribution of HPC is influenced by that as well. You can verify this in Fig. 6 of our paper.
  2. Human preference varies considerably across individuals, so the score distribution is relatively flat compared with traditional classification tasks.
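
For concreteness, here is a minimal sketch of such a score computation: cosine similarity between the image and text embeddings of a fine-tuned CLIP model, scaled into the range reported above. The checkpoint path `hpc.pt`, the ViT-L/14 backbone, the state-dict loading step, and the x100 scaling are assumptions for illustration, not details confirmed in this thread; adapt them to the released checkpoint.

```python
import torch
import clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"

# Load the base CLIP model; HPC is described as a fine-tuned CLIP, so here we
# (hypothetically) overwrite its weights with the fine-tuned checkpoint.
model, preprocess = clip.load("ViT-L/14", device=device)
state = torch.load("hpc.pt", map_location=device)  # hypothetical path/format
model.load_state_dict(state)
model.eval()

@torch.no_grad()
def hps(image_path: str, prompt: str) -> float:
    image = preprocess(Image.open(image_path)).unsqueeze(0).to(device)
    text = clip.tokenize([prompt]).to(device)
    image_feat = model.encode_image(image)
    text_feat = model.encode_text(text)
    # Cosine similarity of the normalized features; it is not centered at 0,
    # which is why both preferred and non-preferred images score around 20.
    image_feat = image_feat / image_feat.norm(dim=-1, keepdim=True)
    text_feat = text_feat / text_feat.norm(dim=-1, keepdim=True)
    return (image_feat @ text_feat.T).item() * 100.0
```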

We are currently studying the effectiveness of HPC on a broader range of image distributions. Stay tuned!

tgxs002 · Apr 06 '23 04:04