
What is the impact of sampling y_x ~ n(x)?

Open · gordon-lim opened this issue 8 months ago · 0 comments

Thank you for your paper. I'm studying it and was hoping you could help enlighten me on the following:

n(x) is fitted to the data, and as you've noted, "y_x is sampled from n(x), which has quite an extreme confidence". That, together with your explicit assumption that y_x is u_x, makes me wonder why there is a need to sample y_x in the first place. I dug into the code and found the following block where you sample y_x:

# cifar/gen_noise_label.py
import numpy as np
import torch

# eta holds the fitted class posteriors n(x), one row per training example
y_syn = []
for etaval in eta:
    y_temp = torch.multinomial(etaval, 1)  # sample one label y_x ~ n(x)
    y_syn.append(int(y_temp))
y_syn = np.array(y_syn).squeeze()

then those sampled y_x were used to replace the dataset's raw labels here:

trainset.update_corrupted_label(y_syn.copy())

However, the noise generation does not use the y_syn labels at all. In fact, you note this in the paper as well: "For each datum x, we only flip it to s_x or keep it as u_x". In other words, under no circumstances are the y_syn labels actually used.
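To make the "extreme confidence" point concrete, here is a minimal sketch (with a made-up posterior, not taken from the repo): when n(x) is extremely confident, sampling y_x ~ n(x) via torch.multinomial returns u_x = argmax n(x) almost every time, so the sampled labels are nearly identical to the Bayes-optimal ones anyway.

```python
import torch

torch.manual_seed(0)

# Hypothetical class posterior n(x) with extreme confidence (not from the repo)
eta = torch.tensor([0.001, 0.001, 0.997, 0.001])
u_x = int(torch.argmax(eta))  # Bayes-optimal label u_x = argmax n(x)

# Draw 10,000 labels y_x ~ n(x), as gen_noise_label.py does once per example
draws = torch.multinomial(eta, 10000, replacement=True)
agreement = (draws == u_x).float().mean()
print(u_x, float(agreement))  # agreement is close to 0.997
```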

  1. If sampling y_x is only needed for the theoretical setup of a Bayes optimal classifier, why implement it in the code when the result is never used? Is it just to demonstrate the theoretical setup?
  2. Otherwise, why does y_x need to be sampled in the first place?

If you could go further and help me see why sampling y_x is needed to set up a Bayes optimal classifier, I would deeply appreciate it.

Thank you!

gordon-lim · Jun 26 '24 15:06