ml5-library
normalizeValue returns NaN when max and min are equal
Hello, there is an issue when normalizing data whose max and min values are equal. This can be seen below:
```
function normalizeValue(value, min, max) {
  return ((value - min) / (max - min))
}
```
`normalizeValue(1, 1, 1)` yields `(1 - 1) / (1 - 1) = 0 / 0 = NaN`. (Note that `NaN === NaN` is `false`, so `assert(x === NaN)` can never pass; use `Number.isNaN(x)` to detect it.)
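A standalone sketch (not ml5 code) of how the division produces `NaN` and how it spreads across a constant column:

```javascript
// Standalone reproduction of the bug: min === max makes the denominator zero.
function normalizeValue(value, min, max) {
  return (value - min) / (max - min);
}

// A constant column turns entirely into NaN after normalization:
const column = [1, 1, 1];
const normalized = column.map(v => normalizeValue(v, 1, 1));
console.log(normalized); // [ NaN, NaN, NaN ]

// NaN must be detected with Number.isNaN, since NaN === NaN is false:
console.log(normalized.every(Number.isNaN)); // true
```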
Hey @flatstyle, thanks for raising this issue. Could you please give us a little more context? In which case does this happen? Which ml5 functionalities are you using? Thanks! Tom
It happens during the call to normalizeData.
```
async function initialize() {
  const neuralNetwork = neuralNetworkFn(options, modelLoaded);

  function modelLoaded() {
    console.log("loaded");
    const trainingOptions = {
      epochs: 30,
      batchSize: 64,
      validationSplit: 0.20
    };
    neuralNetwork.normalizeData();
    neuralNetwork.train(trainingOptions, finishedTraining);

    function finishedTraining(epoch, loss) {
      neuralNetwork.save("apt-predict-3");
    }
  }
}
```
Ah, this is indeed an issue! We could override the normalization algorithm and just set all the values to 0.5 (or some other arbitrary value). We might consider issuing a warning that the minimum and maximum range are equal. This I believe would be a sign that the data for that particular input is meaningless (since all the values are the same, there is nothing to learn!).
Yeah, I think the warning is the best solution. As a workaround I just returned the value unchanged, before ultimately removing the data point:
```
normalizeValue(value, min, max) {
  if (max === min) {
    return value;
  }
  return ((value - min) / (max - min))
}
```
Hi, I think it would be very useful to actually fix this rather than only issuing a warning.
One simple example where the current normalization will break, and the learning data is meaningful, would be a regression with RGB output values, where all blue components happen to be equal.
Maybe looking at the preprocessing.normalize function in scikit-learn (Python) could be an inspiration on how to robustly solve this?
E.g.:
```
from sklearn import preprocessing

X = [[255.0, 0.0, 0.0],
     [0.0, 255.0, 0.0]]
preprocessing.normalize(X)
```
will produce
```
array([[1., 0., 0.],
       [0., 1., 0.]])
```
I did a test by replacing `getInputMetaStats` in `NeuralNetworkData` with:
```
getInputMetaStats(dataRaw, inputOrOutputMeta, xsOrYs) {
  const inputMeta = Object.assign({}, inputOrOutputMeta);

  Object.keys(inputMeta).forEach(k => {
    if (inputMeta[k].dtype === 'string') {
      inputMeta[k].min = 0;
      inputMeta[k].max = 1;
    } else if (inputMeta[k].dtype === 'number') {
      const dataAsArray = dataRaw.map(item => item[xsOrYs][k]);
      inputMeta[k].min = nnUtils.getMin(dataAsArray);
      inputMeta[k].max = nnUtils.getMax(dataAsArray);
      // Guard against a degenerate (constant) column: widen the range so
      // (value - min) / (max - min) stays finite.
      if (inputMeta[k].max - inputMeta[k].min < Number.EPSILON * 10) {
        inputMeta[k].max = inputMeta[k].min + 1;
      }
    } else if (inputMeta[k].dtype === 'array') {
      const dataAsArray = dataRaw.map(item => item[xsOrYs][k]).flat();
      inputMeta[k].min = nnUtils.getMin(dataAsArray);
      inputMeta[k].max = nnUtils.getMax(dataAsArray);
      // Same guard for array-valued columns.
      if (inputMeta[k].max - inputMeta[k].min < Number.EPSILON * 10) {
        inputMeta[k].max = inputMeta[k].min + 1;
      }
    }
  });

  return inputMeta;
}
```
and it seems to do the trick with the current min/max normalization approach. Happy to create a pull request if useful.