self-adaptive-training

tabular data/ noisy instances

Open nazaretl opened this issue 3 years ago • 3 comments

Hi, thanks for sharing your implementation. I have two questions about it:

  1. Does it also work on tabular data?
  2. Is it possible to identify the noisy instances (return the noisy IDs or the clean set)?

Thanks!

nazaretl avatar May 09 '22 09:05 nazaretl

Hi,

regarding your questions:

  1. I am not familiar with tabular data, but I think it is worth a try if your goal is classification and you are using the cross-entropy (CE) loss.
  2. The simplest way, as we did in the paper, is to find the mismatches between the argmax of the training targets and the original labels. A mismatch generally indicates a noisy instance with high probability.
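
A minimal sketch of the mismatch check described in point 2, assuming the learned training targets are available as a NumPy array of shape `(n_samples, n_classes)`. The function and variable names are hypothetical and are not taken from the repository.

```python
import numpy as np

def find_suspect_indices(soft_targets, original_labels):
    """Return indices whose soft-target argmax disagrees with the given label.

    Hypothetical helper, not part of the repository.
    """
    soft_targets = np.asarray(soft_targets)
    original_labels = np.asarray(original_labels)
    # Class with the largest mass in each learned training target
    target_classes = soft_targets.argmax(axis=1)
    # A mismatch marks the sample as likely (not certainly) mislabeled
    return np.flatnonzero(target_classes != original_labels)

# Toy data: 4 samples, 3 classes; the last label disagrees with its target
soft_targets = np.array([
    [0.90, 0.05, 0.05],
    [0.20, 0.70, 0.10],
    [0.10, 0.10, 0.80],
    [0.60, 0.30, 0.10],
])
original_labels = np.array([0, 1, 2, 1])

suspects = find_suspect_indices(soft_targets, original_labels)
# The remaining indices form the presumed-clean subset
clean = np.setdiff1d(np.arange(len(original_labels)), suspects)
```

Here `suspects` holds the IDs of the likely noisy instances and `clean` holds the rest. Since a mismatch is only a strong hint, it may be worth inspecting the flagged samples before discarding them.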

LayneH avatar May 11 '22 14:05 LayneH

Thank you for the clarification! By tabular data I mean non-image data, e.g. the Iris dataset.

nazaretl avatar May 16 '22 10:05 nazaretl

In my opinion, the data modality is not a crucial problem for SAT.

Say you have an input $x$ of any modality and a deep model $f(\cdot)$ that produces a prediction $p = f(x)$. SAT operates only on the prediction $p$ and the current training target to update that target, independently of the modality of $x$. As long as your model $f(\cdot)$ is able to overfit the training labels, SAT should help.
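
To illustrate why the modality of $x$ does not matter, here is a minimal sketch of a target update that touches only predictions and targets. It assumes an exponential-moving-average rule with a momentum `alpha`, which is my reading of the SAT paper and not code from the repository. The name `update_targets` and the toy numbers are hypothetical.

```python
import numpy as np

def update_targets(targets, probs, alpha=0.9):
    """Assumed EMA rule: new_target = alpha * old_target + (1 - alpha) * prediction.

    The input x never appears here; only the model output and the targets do.
    """
    return alpha * targets + (1.0 - alpha) * probs

# One-hot targets built from the (possibly noisy) labels of 2 samples, 3 classes
labels = np.array([0, 1])
targets = np.eye(3)[labels]

# Stand-in for softmax outputs p = f(x); f could be a model on images or on tabular rows
probs = np.array([
    [0.8, 0.1, 0.1],
    [0.1, 0.2, 0.7],
])

# Repeating the update drifts each target toward the model's prediction
for _ in range(20):
    targets = update_targets(targets, probs, alpha=0.9)
```

In this toy run the second sample's target ends up peaking at class 2 while its label is 1, which is the kind of mismatch that flags a possibly noisy instance.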

LayneH avatar Jun 02 '22 07:06 LayneH