LyCORIS DinoV2 similarity score calculation

Dear authors,

Thank you for the amazing work. Could you share the implementation of Subject Fidelity using DINOv2 and of Prompt Fidelity? I'd like to replicate the results of the paper (https://arxiv.org/pdf/2309.14859). Once again, thank you for the great contribution.

Aug 08 '24 09:08 Sundragon1993

@Sundragon1993 Sorry for the late reply.

We have a repo for our eval framework: https://github.com/cyber-meow/LyCORIS-evaluation

You may want to check it out.

Aug 13 '24 06:08 KohakuBlueleaf

@KohakuBlueleaf

Thank you for the information. I tried to evaluate the LoRA using your eval framework, but I get an error because the following files are missing:

    in_dist_features = "in_dist_prompts-image-features.npz"
    out_dist_features = "out_dist_prompts-image-features.npz"
    trigger_features = "triggeronly-image-features.npz"

They are referenced in this file: https://github.com/cyber-meow/LyCORIS-evaluation/blob/7c5758c8338aa59c4601bb68db8d5112b15748c4/lycoris_eval/evaluation/utils.py#L60 How do I prepare the ref images to get these .npz files?

Aug 14 '24 09:08 Sundragon1993

cc @cyber-meow

BTW, I recommend opening an issue in the LyCORIS-evaluation repo instead of here.

Aug 14 '24 09:08 KohakuBlueleaf

@Sundragon1993

Thank you for your interest, and I am sorry that I did not make the repository more user-friendly. I planned to polish it at some point but did not have the time.

The npz files come from encode_image_features.py (https://github.com/cyber-meow/LyCORIS-evaluation/blob/main/lycoris_eval/encoding/encode_image_features.py). More specifically, we had the directory organized in a certain way, with folders such as in_dist_prompts, out_dist_prompts, and triggeronly at the last level, and this is why we get these specific npz files.
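To illustrate how the folder layout could map to the npz names, here is a minimal sketch. It is not the repository's actual code: the helper names, the image extensions, and the "my_lora" parent folder are hypothetical, and the naming pattern is only inferred from the three file names quoted earlier in this thread.

```python
import tempfile
from collections import defaultdict
from pathlib import Path

# Hypothetical set of extensions; adjust to whatever your images use.
IMAGE_EXTS = {".png", ".jpg", ".jpeg", ".webp"}


def group_images_by_leaf_folder(root):
    """Map each last-level folder name to the image paths it contains."""
    groups = defaultdict(list)
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix.lower() in IMAGE_EXTS:
            groups[path.parent.name].append(path)
    return dict(groups)


def feature_file_name(folder_name):
    # Pattern inferred from the names in this thread (an assumption).
    return f"{folder_name}-image-features.npz"


# Demo on a throwaway directory tree with empty placeholder images.
with tempfile.TemporaryDirectory() as tmp:
    for folder in ("in_dist_prompts", "out_dist_prompts", "triggeronly"):
        leaf = Path(tmp) / "my_lora" / folder  # "my_lora" is hypothetical
        leaf.mkdir(parents=True)
        (leaf / "0001.png").touch()
    groups = group_images_by_leaf_folder(tmp)
    expected_files = sorted(feature_file_name(name) for name in groups)

print(expected_files)
```

With this layout, each last-level folder yields one feature file, which is why the evaluation step looks for exactly those three names.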

Feel free to clone/fork the repository and modify the code yourself, since it is not very flexible.

Aug 14 '24 09:08 cyber-meow

@cyber-meow Thank you for the prompt reply. Should I put the same image into these folders?

Aug 14 '24 10:08 Sundragon1993

I expected that if we compute the similarity using only one image, the score would be 1:

Aug 14 '24 10:08 Sundragon1993

Which images to put in each folder is ultimately your choice; it only affects the scores you obtain in your csv file. Moreover, I think you need to use the --generated argument here when encoding the generated images to get the right npz files. You will also need to encode an npz file for some reference images. If you compute the cosine similarity between two identical images, you get a score of 1. Otherwise, even for very similar images, we typically expect a score between 0.3 and 0.6, depending on the exact CLIP model being used.
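As a quick sanity check of the "identical images give 1" statement, here is a small sketch of cosine similarity on feature vectors. The random vectors are stand-ins for real image features, and the dimension 768 is an arbitrary assumption, not something taken from the repository.

```python
import numpy as np


def cosine_similarity(a, b):
    """Cosine of the angle between two 1-D feature vectors."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


rng = np.random.default_rng(0)
feat = rng.normal(size=768)   # stand-in for one image's feature vector
other = rng.normal(size=768)  # stand-in for a different image

same_score = cosine_similarity(feat, feat)   # identical features
diff_score = cosine_similarity(feat, other)  # different features

print(same_score, diff_score)
```

Identical features give 1 up to floating-point rounding, and any pair of different features gives a value in [-1, 1]. Where real images land in that range depends on the encoder.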

Aug 14 '24 11:08 cyber-meow

@cyber-meow I'm not sure how to decide which images belong to in_dist_prompts, out_dist_prompts, or triggeronly when working with ground-truth images. I have several real ground-truth (ref) images from the artist, and I want to quantitatively evaluate trained LoRAs or LyCORIS models using these images and the generated images. As far as I understand, your current code only compares generated ref images against generated images across different types of LoRA. Correct me if I'm wrong. Thank you.

Aug 14 '24 16:08 Sundragon1993

The current logic is to compare ref images against generated images. Ref images are put under their own folder and encoded without --generated. Generated images are organized into several folders (as this is how the experiments were designed) and encoded with the --generated argument. The final csv contains the metric computed for the images within each folder, one column per folder, as shown in the image below.

Therefore, you can put any generated images in these folders. In our experiments, in_dist_prompts contains images generated with training prompts, out_dist_prompts contains images generated with prompts that require generalization outside the fine-tuning set, and triggeronly contains images generated when only the trigger word is supplied.
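The per-folder comparison described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the repository's implementation: the "features" key inside each npz file, the "ref-image-features.npz" name, and the use of a mean over all ref/generated pairs are hypothetical, and random arrays stand in for real encoded features.

```python
import tempfile
from pathlib import Path

import numpy as np

FOLDERS = ("in_dist_prompts", "out_dist_prompts", "triggeronly")


def l2_normalize(x):
    """Scale each row of a 2-D array to unit length."""
    return x / np.linalg.norm(x, axis=1, keepdims=True)


def mean_pairwise_cosine(ref, gen):
    """Average cosine similarity over all (generated, ref) pairs."""
    return float((l2_normalize(gen) @ l2_normalize(ref).T).mean())


rng = np.random.default_rng(0)
with tempfile.TemporaryDirectory() as tmp:
    tmp = Path(tmp)
    # Fake features: 4 ref images and 6 generated images per folder.
    ref = rng.normal(size=(4, 16))
    np.savez(tmp / "ref-image-features.npz", features=ref)  # hypothetical name/key
    for name in FOLDERS:
        np.savez(tmp / f"{name}-image-features.npz",
                 features=rng.normal(size=(6, 16)))

    with np.load(tmp / "ref-image-features.npz") as data:
        ref_feats = data["features"]
    scores = {}
    for name in FOLDERS:
        with np.load(tmp / f"{name}-image-features.npz") as data:
            scores[name] = mean_pairwise_cosine(ref_feats, data["features"])

# One score per folder, analogous to one csv column per folder.
print(scores)
```

With real data, the ref features would come from encoding the artist's ground-truth images without --generated, and each folder's features from encoding the corresponding generated images with --generated.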

Aug 15 '24 03:08 cyber-meow