
Can you provide the dataset used for training the trust and safety models?

Open utkarsh-aryan opened this issue 2 years ago • 3 comments

Is your feature request related to a problem? Please describe. The dataset used for the filters could introduce bias.

Describe the solution you'd like: The datasets should also be open-sourced, so that third parties can verify that they are free of bias.

utkarsh-aryan avatar Apr 02 '23 08:04 utkarsh-aryan

I second this.

Roblinks avatar Apr 02 '23 15:04 Roblinks

Suppose you have two datasets:

  • one ends up generating $x in advertising revenue
  • one generates $2x in advertising revenue

Do you really care if one is more or less biased?

m-a-sch avatar Apr 03 '23 06:04 m-a-sch

Highly doubt it, since no AI model of worth provides its training dataset (because it's full of legal troubles). Just think about things like the EU right to be forgotten. Do you think AI companies retrain their models from scratch on a multi-terabyte dataset each time an EU citizen asks to have his data removed from the dataset?

GabenGar avatar Apr 03 '23 16:04 GabenGar