setfit

Does num_iterations create duplicate data?

Open • nsorros opened this issue 3 years ago • 0 comments

I am trying to get a better understanding of this hyperparameter. As far as I understand, you iterate over the data num_iterations times, and in each iteration you sample one positive pair and one negative pair for every example. Could this result in duplicate pairs?
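Here is a minimal sketch of the sampling as I understand it, to check whether repeats appear. It is not SetFit's actual code. `sample_pairs` is a hypothetical helper, and I am assuming that an example is never paired with itself and that a pair counts as a duplicate when the same two examples (in either order) show up again with the same label.

```python
import random
from collections import Counter


def sample_pairs(labels, num_iterations, seed=0):
    """Hypothetical re-creation of the sampling described above: in every
    iteration, each example gets one positive partner (same label) and one
    negative partner (different label), both drawn at random."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(num_iterations):
        for i, label in enumerate(labels):
            # Assumption: the anchor itself is excluded from its positive candidates.
            same = [j for j, other in enumerate(labels) if other == label and j != i]
            diff = [j for j, other in enumerate(labels) if other != label]
            pairs.append((i, rng.choice(same), 1.0))
            pairs.append((i, rng.choice(diff), 0.0))
    return pairs


labels = [0, 0, 0, 1, 1, 1]  # 3-shot IMDB: 3 examples for each of the 2 classes
pairs = sample_pairs(labels, num_iterations=5)

# Treat (a, b) and (b, a) as the same pair when counting repeats.
counts = Counter((min(a, b), max(a, b), y) for a, b, y in pairs)
duplicates = sum(c - 1 for c in counts.values())
print(len(pairs), len(counts), duplicates)
```

With 6 examples there are at most 15 distinct pairs, so out of 60 sampled pairs at least 45 must be repeats, whatever the seed.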

Also, it can produce more examples than there are possible pairs. For example, on IMDB with 3-shot there are 6 examples, 3 per class. Setting num_iterations to 5 creates 6 (examples) * 2 (1 positive + 1 negative) * 5 (num_iterations) = 60 examples. The number of unique pairs, though, is (6*6 - 6)/2 = 15, i.e. half of the matrix of all pairs without the diagonal.
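The arithmetic can be checked by enumerating the pairs directly. This sketch assumes the 3-shot IMDB setup above (2 classes, 3 examples each) and counts an unordered pair once, excluding self-pairs.

```python
from itertools import combinations

labels = [0, 0, 0, 1, 1, 1]  # 3-shot IMDB: 3 examples for each of the 2 classes
num_iterations = 5

# Sampled pairs: 1 positive + 1 negative per example, per iteration.
sampled = len(labels) * 2 * num_iterations

# Unique unordered pairs: the upper triangle of the 6x6 matrix, no diagonal.
all_pairs = list(combinations(range(len(labels)), 2))
positives = sum(labels[a] == labels[b] for a, b in all_pairs)
negatives = len(all_pairs) - positives

print(sampled, len(all_pairs), positives, negatives)  # 60 15 6 9
```

Note that the exhaustive set is not balanced: 6 positive pairs (3 per class) against 9 negative ones (3 x 3 across classes), whereas the sampled set has 30 of each.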

If the above is correct, this seems similar to training for multiple epochs. Is that right? If so, why not create all unique pairs instead and keep the epochs hyperparameter as it is, which might be more intuitive? And if you want a way to sample less data, why not introduce a sample_size parameter that caps the number of pairs at a smaller value for experimentation?
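A rough sketch of what I have in mind follows. `generate_all_pairs` and its `sample_size` argument are hypothetical names, not an existing SetFit API. The idea is to build every unique pair once and, if requested, draw a subset without replacement.

```python
import random
from itertools import combinations


def generate_all_pairs(texts, labels, sample_size=None, seed=0):
    """Hypothetical helper: build every unique unordered pair once, labelled
    1.0 (same class) or 0.0 (different class), optionally capped at
    `sample_size` pairs drawn without replacement."""
    pairs = [
        (texts[a], texts[b], 1.0 if labels[a] == labels[b] else 0.0)
        for a, b in combinations(range(len(texts)), 2)
    ]
    if sample_size is not None and sample_size < len(pairs):
        # random.sample draws without replacement, so no pair is repeated.
        pairs = random.Random(seed).sample(pairs, sample_size)
    return pairs


texts = ["great", "loved it", "superb", "awful", "boring", "terrible"]
labels = [1, 1, 1, 0, 0, 0]

full = generate_all_pairs(texts, labels)
capped = generate_all_pairs(texts, labels, sample_size=10)
print(len(full), len(capped))  # 15 10
```

Any repetition would then come only from the epochs hyperparameter of the training loop, so the two knobs stay independent: sample_size controls how much distinct data there is, and the epoch count controls how many times it is seen.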

nsorros • Oct 26 '22 13:10