Support (decentralized) Normalization for Tabular Datasets
For tabular datasets (popular examples: Adult Income and Titanic), normalization is critical for neural network approaches.
The most common, and a very effective, way to normalize is to subtract the mean and divide by the standard deviation, feature by feature. However, computing these statistics in a decentralized fashion is non-trivial, because no participant sees the whole dataset. For DeAI to support this, additional functionality needs to be implemented.
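To make the difficulty concrete, here is a sketch of the arithmetic for a single feature. The notation is an assumption introduced for illustration: participant $k$ holds $n_k$ samples with local mean $\mu_k$ and local (population) variance $\sigma_k^2$, and $N = \sum_k n_k$.

$$
z = \frac{x - \mu}{\sigma}, \qquad
\mu = \frac{1}{N} \sum_k n_k \mu_k, \qquad
\sigma^2 = \frac{1}{N} \sum_k n_k \left( \sigma_k^2 + (\mu_k - \mu)^2 \right)
$$

The global mean is a weighted average of the local means, but the global variance also depends on how far each local mean lies from the global one, so averaging the local standard deviations is generally not enough.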
Examples of how this can be addressed:
- Provide means and standard deviations for all features based on a priori knowledge. Each participant is then asked to normalize their data with these fixed values before uploading.
- Learn means and standard deviations as a pre-learning task, and then apply them automatically to each local dataset. This could be a full DeAI training cycle, or a simple weighted average that is communicated democratically.
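The second option could look roughly like the sketch below. It is not DeAI code: `LocalStats`, `local_stats`, `aggregate`, and `normalize` are hypothetical names, and it assumes each participant is willing to share per-feature counts, sums, and sums of squares. Privacy measures for those shared values are out of scope here.

```python
from dataclasses import dataclass
from math import sqrt
from typing import List, Sequence, Tuple


@dataclass
class LocalStats:
    """Per-participant summary (hypothetical structure, not a DeAI type)."""
    count: int
    sums: List[float]      # per-feature sum of values
    sq_sums: List[float]   # per-feature sum of squared values


def local_stats(rows: Sequence[Sequence[float]]) -> LocalStats:
    """Computed locally; only this summary leaves the participant."""
    n_features = len(rows[0])
    sums = [0.0] * n_features
    sq_sums = [0.0] * n_features
    for row in rows:
        for j, x in enumerate(row):
            sums[j] += x
            sq_sums[j] += x * x
    return LocalStats(len(rows), sums, sq_sums)


def aggregate(stats: Sequence[LocalStats]) -> Tuple[List[float], List[float]]:
    """Combine the summaries into global per-feature means and standard deviations."""
    total = sum(s.count for s in stats)
    means, stds = [], []
    for j in range(len(stats[0].sums)):
        mean = sum(s.sums[j] for s in stats) / total
        mean_sq = sum(s.sq_sums[j] for s in stats) / total
        # Population variance E[x^2] - E[x]^2, clamped against rounding error.
        var = max(mean_sq - mean * mean, 0.0)
        means.append(mean)
        stds.append(sqrt(var))
    return means, stds


def normalize(rows, means, stds, eps: float = 1e-8):
    """Apply the shared statistics to a local dataset; eps guards constant features."""
    return [[(x - m) / (s + eps) for x, m, s in zip(row, means, stds)]
            for row in rows]


# Two participants with two features each (toy data).
participant_a = [[1.0, 10.0], [2.0, 20.0]]
participant_b = [[3.0, 30.0], [6.0, 60.0]]
means, stds = aggregate([local_stats(participant_a), local_stats(participant_b)])
normalized_a = normalize(participant_a, means, stds)
```

The same `normalize` step also covers the first option: the `means` and `stds` would then come from a priori knowledge instead of `aggregate`. Note that the sum-of-squares formula can lose precision when values are large relative to their spread, so a production version would likely need a more numerically stable merge.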