
Do you have a paper/technical report to refer to more implementation details?

Open · xiankgx opened this issue · 1 comment

It seems you are using CLIP with 4 possible textual descriptions and then cosine similarity for classification, just like CLIP itself. However, in CLIP the cardinality of the labels, i.e., the number of possible text sentences, is practically unlimited (at least during training), whereas in LASTED it is only 4. I wonder how much of an uplift there is from training the same CLIP image encoder with LASTED versus simply adding a classification head on top of the CLIP image encoder with a standard multi-class categorical cross-entropy loss.
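To make the comparison concrete, here is a minimal sketch of the two setups on the same frozen CLIP image encoder: (A) LASTED-style classification by cosine similarity to a small set of text prompts, and (B) the baseline with a linear head trained with cross-entropy. This assumes OpenAI's `clip` package; the four prompt strings and the ViT-L/14 backbone are my guesses for illustration, not necessarily what LASTED actually uses.

```python
import torch
import clip

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-L/14", device=device)

# --- Option A: LASTED-style, classify by cosine similarity to text prompts ---
prompts = ["real photo", "synthetic photo", "real painting", "synthetic painting"]
text_tokens = clip.tokenize(prompts).to(device)

@torch.no_grad()
def classify_by_text(images: torch.Tensor) -> torch.Tensor:
    img_feat = model.encode_image(images)
    txt_feat = model.encode_text(text_tokens)
    img_feat = img_feat / img_feat.norm(dim=-1, keepdim=True)
    txt_feat = txt_feat / txt_feat.norm(dim=-1, keepdim=True)
    logits = 100.0 * img_feat @ txt_feat.T  # scaled cosine similarity
    return logits.argmax(dim=-1)

# --- Option B: baseline, linear head + cross-entropy on frozen features ---
head = torch.nn.Linear(model.visual.output_dim, len(prompts)).to(device)
criterion = torch.nn.CrossEntropyLoss()

def train_step(images, labels, optimizer) -> float:
    with torch.no_grad():  # frozen-encoder baseline; LASTED also fine-tunes
        feats = model.encode_image(images).float()
    loss = criterion(head(feats), labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

With only 4 fixed sentences, Option A is arguably just Option B with the classifier weights initialized (and possibly constrained) by the text tower, which is exactly why I'd like to see the ablation.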

xiankgx · Mar 29 '24

I also wonder what happens if you augment the labels during training. For example, the label for an AI-generated image could be randomly sampled from, say:

  • ai gen
  • ai generated image
  • fake
  • fakes
  • fake image
  • fake photo
  • computer generated image
  • Midjourney/Stable Diffusion/DALL-E generated image (if you know which model generated the image)
  • deep fake

Perhaps this would make better use of the text modality and boost performance? A sketch of the idea follows.
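A rough sketch of the label-augmentation idea: each training step draws a random synonym for each image's class, so the text tower sees varied phrasings instead of 4 fixed sentences. The prompt lists and the symmetric CLIP-style contrastive loss here are illustrative assumptions on my part, not LASTED's actual training code.

```python
import random
import torch
import torch.nn.functional as F
import clip

# Hypothetical synonym pools; extend/replace with the lists above.
FAKE_PROMPTS = [
    "ai gen", "ai generated image", "fake", "fake image",
    "fake photo", "computer generated image", "deep fake",
]
REAL_PROMPTS = ["real photo", "genuine photograph", "camera photo"]

def sample_prompts(labels):  # labels: 0 = real, 1 = fake
    return [random.choice(FAKE_PROMPTS if y else REAL_PROMPTS) for y in labels]

def contrastive_step(model, images, labels, device):
    texts = clip.tokenize(sample_prompts(labels)).to(device)
    img_feat = F.normalize(model.encode_image(images).float(), dim=-1)
    txt_feat = F.normalize(model.encode_text(texts).float(), dim=-1)
    logits = model.logit_scale.exp() * img_feat @ txt_feat.T
    # Symmetric CLIP loss: each image matches its own sampled caption.
    # Caveat: two fakes in one batch may draw the same prompt, creating
    # false negatives; a multi-positive loss would handle that cleanly.
    targets = torch.arange(len(labels), device=device)
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.T, targets)) / 2
```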

xiankgx · Mar 29 '24