
Support (decentralized) Normalization for Tabular Datasets

Open davidroschewitz opened this issue 4 years ago • 0 comments

For tabular datasets (popular examples: Adult Income and Titanic), normalization is critical for neural network approaches.

The most common and effective way to normalize is to subtract the mean and divide by the standard deviation (z-score normalization). However, computing these statistics in a decentralized fashion is non-trivial. For DeAI to support this, additional functionality needs to be implemented.

Examples of how this can be addressed:

  • Provide means and standard deviations for all features based on a priori knowledge. Each participant is then asked to normalize their data according to this shared standard before uploading.
  • Learn the means and standard deviations as a pre-learning task, the result of which is then automatically applied to each local dataset. This could be a full DeAI training cycle, or simply a weighted average of per-participant statistics that is communicated to all peers.
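The second option can be sketched without revealing raw data: each participant shares only local summary statistics (count, sum, sum of squares), from which the global mean and standard deviation are exactly recoverable as a weighted aggregate. A minimal illustration (function names are hypothetical, not part of the DeAI API):

```python
import math

def local_stats(values):
    """Per-participant summary for one feature: (count, sum, sum of squares)."""
    return (len(values), sum(values), sum(v * v for v in values))

def aggregate(stats):
    """Combine local summaries into the exact global mean and (population) std.

    Works because sums and sums of squares are additive across participants.
    """
    n = sum(s[0] for s in stats)
    total = sum(s[1] for s in stats)
    sq = sum(s[2] for s in stats)
    mean = total / n
    variance = sq / n - mean * mean
    return mean, math.sqrt(variance)

def normalize(values, mean, std):
    """Apply z-score normalization locally, using the agreed global statistics."""
    return [(v - mean) / std for v in values]
```

For example, two participants holding `[1, 2, 3]` and `[4, 5]` would exchange only their summary triples, yet recover the same mean and std as if the data were pooled. Note that in the federated setting these summaries would travel over the existing communication channel; the sketch above only shows the arithmetic.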

davidroschewitz · Mar 04 '21 16:03