
normalizeValue returns NaN when max and min are equal

Open flatstyle opened this issue 4 years ago • 6 comments

Hello, there is an issue that occurs when normalizing data that contains the same max / min values. This can be seen below


```javascript
function normalizeValue(value, min, max) {
  return (value - min) / (max - min);
}
```

```javascript
assert(Number.isNaN(normalizeValue(1, 1, 1))); // (1 - 1) / (1 - 1) = 0/0 = NaN
```
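As an aside, the check has to use `Number.isNaN`, because `NaN` never compares equal to anything, itself included. A quick standalone demonstration:

```javascript
// 0/0 produces NaN, and NaN !== NaN by definition,
// so a strict-equality check can never detect it.
const result = (1 - 1) / (1 - 1); // 0/0

console.log(result === NaN);       // false, always
console.log(Number.isNaN(result)); // true
```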

flatstyle avatar Nov 03 '21 03:11 flatstyle

Hey @flatstyle, thanks for raising this issue. Could you please give us a little more context? In which case does this happen? Which ml5 functionalities are you using? Thanks! Tom

tlsaeger avatar Nov 08 '21 09:11 tlsaeger

It happens during the call to normalizeData.

```javascript
async function initialize() {
  const neuralNetwork = neuralNetworkFn(options, modelLoaded);

  function modelLoaded() {
    console.log("loaded");

    const trainingOptions = {
      epochs: 30,
      batchSize: 64,
      validationSplit: 0.20
    };

    neuralNetwork.normalizeData(); // <-- NaN is produced here

    neuralNetwork.train(trainingOptions, finishedTraining);

    function finishedTraining(epoch, loss) {
      neuralNetwork.save("apt-predict-3");
    }
  }
}
```

flatstyle avatar Nov 08 '21 11:11 flatstyle

Ah, this is indeed an issue! We could override the normalization algorithm and just set all the values to 0.5 (or some other arbitrary value). We might also consider issuing a warning when the minimum and maximum are equal. I believe this would be a sign that the data for that particular input is meaningless (since all the values are the same, there is nothing to learn!).
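A minimal sketch of that idea (the 0.5 fallback and the warning wording are illustrative, not ml5's actual API):

```javascript
// Sketch: warn and map a constant column to an arbitrary midpoint
// (0.5 here) instead of letting 0/0 produce NaN.
function normalizeValue(value, min, max) {
  if (max === min) {
    console.warn(
      `normalizeValue: min === max (${min}); ` +
      "this input carries no information, mapping to 0.5"
    );
    return 0.5;
  }
  return (value - min) / (max - min);
}

console.log(normalizeValue(1, 1, 1));  // 0.5
console.log(normalizeValue(5, 0, 10)); // 0.5
```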

shiffman avatar Nov 11 '21 02:11 shiffman

Yeah, I think the warning is the best solution. I ended up just returning the raw value, before eventually removing the data point entirely:

```javascript
normalizeValue(value, min, max) {
  if (max === min) {
    return value;
  }

  return (value - min) / (max - min);
}
```

flatstyle avatar Nov 11 '21 07:11 flatstyle

Hi, I think it would be very useful to fix this properly rather than only issuing a warning.

One simple example where the current normalization will break, and the learning data is meaningful, would be a regression with RGB output values, where all blue components happen to be equal.
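To make that failure concrete, here is a minimal sketch (plain JavaScript, not ml5 code) of what happens to the blue column in that scenario:

```javascript
// RGB regression targets where every blue component happens to be 0:
// the blue column has min === max, so per-column min/max
// normalization divides by zero and yields NaN for every sample.
const ys = [[255, 0, 0], [0, 255, 0]];

const blue = ys.map(row => row[2]); // [0, 0]
const min = Math.min(...blue);      // 0
const max = Math.max(...blue);      // 0

console.log(blue.map(v => (v - min) / (max - min))); // logs [NaN, NaN]
```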

Maybe looking at the preprocessing.normalize function in scikit-learn (Python) could be an inspiration on how to robustly solve this?

E.g.:

```python
from sklearn import preprocessing

X = [[255.0, 0.0, 0.0],
     [0.0, 255.0, 0.0]]
preprocessing.normalize(X)
```

will produce

```python
array([[1., 0., 0.],
       [0., 1., 0.]])
```
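For reference, `preprocessing.normalize` defaults to scaling each sample (row) to unit L2 norm, which is a different scheme from ml5's per-column min/max scaling. A minimal JavaScript equivalent of that per-row behavior:

```javascript
// Per-row L2 normalization, mirroring scikit-learn's
// preprocessing.normalize default: each sample is divided by its
// Euclidean norm, so a constant *column* cannot cause a 0/0.
function normalizeRows(X) {
  return X.map(row => {
    const norm = Math.hypot(...row);
    // All-zero rows are left unchanged to avoid dividing by zero.
    return norm === 0 ? row.slice() : row.map(v => v / norm);
  });
}

console.log(normalizeRows([[255, 0, 0], [0, 255, 0]]));
// [[1, 0, 0], [0, 1, 0]]
```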

colormotor avatar Dec 09 '22 13:12 colormotor

I did a test by replacing getInputMetaStats in NeuralNetworkData with

```javascript
getInputMetaStats(dataRaw, inputOrOutputMeta, xsOrYs) {
  const inputMeta = Object.assign({}, inputOrOutputMeta);
  Object.keys(inputMeta).forEach(k => {
    if (inputMeta[k].dtype === 'string') {
      inputMeta[k].min = 0;
      inputMeta[k].max = 1;
    } else if (inputMeta[k].dtype === 'number') {
      const dataAsArray = dataRaw.map(item => item[xsOrYs][k]);
      inputMeta[k].min = nnUtils.getMin(dataAsArray);
      inputMeta[k].max = nnUtils.getMax(dataAsArray);
      // Widen a degenerate range so normalization never divides by zero
      if (inputMeta[k].max - inputMeta[k].min < Number.EPSILON * 10) {
        inputMeta[k].max = inputMeta[k].min + 1;
      }
    } else if (inputMeta[k].dtype === 'array') {
      const dataAsArray = dataRaw.map(item => item[xsOrYs][k]).flat();
      inputMeta[k].min = nnUtils.getMin(dataAsArray);
      inputMeta[k].max = nnUtils.getMax(dataAsArray);
      if (inputMeta[k].max - inputMeta[k].min < Number.EPSILON * 10) {
        inputMeta[k].max = inputMeta[k].min + 1;
      }
    }
  });

  return inputMeta;
}
```

and it seems to do the trick with the current min/max normalization approach. Happy to create a pull request if useful.
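The effect of that range-widening can be shown in isolation (the `safeRange` helper name is hypothetical, chosen for the sketch, not ml5 API):

```javascript
// Sketch of the patch's idea: if [min, max] is numerically empty,
// widen it to width 1 so (value - min) / (max - min) stays finite.
// Constant columns then normalize to 0 rather than NaN.
function safeRange(min, max) {
  if (max - min < Number.EPSILON * 10) {
    return { min, max: min + 1 };
  }
  return { min, max };
}

function normalizeValue(value, min, max) {
  ({ min, max } = safeRange(min, max));
  return (value - min) / (max - min);
}

console.log(normalizeValue(255, 255, 255)); // 0, not NaN
console.log(normalizeValue(5, 0, 10));      // 0.5
```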

colormotor avatar Dec 14 '22 16:12 colormotor