Workbench: Enhance evaluation for NN regression
1. In Workbench, create a .ega file for NN regression, using this data: http://dione.zcu.cz/~toman40/encog/data9.zip
2. Execute task-full and "Stop all commands" when task-train begins.
3. "Normalize to training" `data9_eval.csv`.
4. Double click `data9_train.eg`, click "Train" -> select `data9_train.egb` as "Training Set" and `data9_eval.egb` as "Validation Set" -> choose RPROP, ensure "Maximum Error Percent" = 3 and let the training run.
5. Close all tabs (saving the training if a popup appears), double click `data9.ega` and execute task-evaluate.
6. Check the `Output:y` column in `data9_output.csv` -> it contains all zeroes.
7. Delete `data9_output.csv` and replace `data9_eval.csv` with `data9_train.csv`.
8. Repeat step 5.
9. Check the `Output:y` column in `data9_output.csv` -> it contains almost all zeroes. However, since the training error was ~2.6%, `Output:y` should contain values very close to those in the `y` column (the training data are being used now).
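For reference, the same train/evaluate sequence can be reproduced outside the Workbench. Below is a minimal sketch against the Encog 3 API; the file names and the 3% stopping threshold come from the steps above, while the hidden-layer size and the data-loading helper are placeholders of my own:

```java
import org.encog.engine.network.activation.ActivationTANH;
import org.encog.ml.data.MLData;
import org.encog.ml.data.MLDataPair;
import org.encog.ml.data.MLDataSet;
import org.encog.neural.networks.BasicNetwork;
import org.encog.neural.networks.layers.BasicLayer;
import org.encog.neural.networks.training.propagation.resilient.ResilientPropagation;

public class Data9Train {
    public static void main(String[] args) {
        // These would be loaded from the normalized data9_train.egb /
        // data9_eval.egb files produced by the Workbench; placeholder here.
        MLDataSet trainingSet = loadNormalized("data9_train.egb");
        MLDataSet validationSet = loadNormalized("data9_eval.egb");

        BasicNetwork network = new BasicNetwork();
        network.addLayer(new BasicLayer(null, true, trainingSet.getInputSize()));
        network.addLayer(new BasicLayer(new ActivationTANH(), true, 10)); // hidden size is a guess
        network.addLayer(new BasicLayer(new ActivationTANH(), false, trainingSet.getIdealSize()));
        network.getStructure().finalizeStructure();
        network.reset();

        ResilientPropagation train = new ResilientPropagation(network, trainingSet);
        do {
            train.iteration();
        } while (train.getError() > 0.03); // "Maximum Error Percent" = 3
        train.finishTraining();

        // With the behavior reported above, compute(...) returns a
        // near-constant value for every validation row.
        for (MLDataPair pair : validationSet) {
            MLData output = network.compute(pair.getInput());
            System.out.println("ideal=" + pair.getIdeal().getData(0)
                    + " actual=" + output.getData(0));
        }
    }

    private static MLDataSet loadNormalized(String file) {
        // Placeholder: the Workbench stores these as .egb; programmatically,
        // org.encog.ml.data.buffer.BufferedMLDataSet can read the same file.
        throw new UnsupportedOperationException("load " + file);
    }
}
```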
In contrast, if the steps above are performed using SVM regression, `data9_output.csv` contains the expected values, i.e. `Output:y` contains values close to `y`.
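For comparison, the SVM path boils down to something like the sketch below. The SVM type and kernel are what I understand the Workbench defaults to be for regression, so treat them as assumptions:

```java
import org.encog.ml.data.MLDataSet;
import org.encog.ml.svm.KernelType;
import org.encog.ml.svm.SVM;
import org.encog.ml.svm.SVMType;
import org.encog.ml.svm.training.SVMTrain;

public class Data9Svm {
    /** trainingSet: the same normalized training data as in the NN sketch above. */
    public static SVM trainSvm(MLDataSet trainingSet) {
        SVM svm = new SVM(trainingSet.getInputSize(),
                SVMType.EpsilonSupportVectorRegression,
                KernelType.RadialBasisFunction);
        // SVM training in Encog completes in a single iteration call.
        new SVMTrain(svm, trainingSet).iteration();
        return svm;
    }
}
```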
I agree it looks odd, but as near as I can tell this is the data. I tested and got the same result. When I look at your data, the y column contains almost all very low values, with a few outliers that push the range much higher. As a result, most values of y normalize to around -1, and the neural network basically learns to always return -1, since that is fairly close for the vast majority of rows. I stepped through the error calculation and it looked right to me.
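To see the effect concretely, here is a tiny standalone illustration. The numbers are made up to mimic the described distribution (mostly small y values plus a few large outliers), and min-max normalization to [-1, 1] is assumed:

```java
public class OutlierNormalization {
    public static void main(String[] args) {
        // Mostly small targets plus a few large outliers (hypothetical values).
        double[] y = {0.1, 0.3, 0.2, 0.4, 0.1, 0.2, 95.0, 120.0};

        double min = Double.MAX_VALUE, max = -Double.MAX_VALUE;
        for (double v : y) { min = Math.min(min, v); max = Math.max(max, v); }

        // Min-max normalization to [-1, 1].
        for (double v : y) {
            double n = 2.0 * (v - min) / (max - min) - 1.0;
            System.out.printf("%7.2f -> %+.4f%n", v, n);
        }
        // Every non-outlier lands near -1, so a network that always
        // predicts -1 already achieves a low mean squared error.
    }
}
```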
As an independent test I used the Iris dataset with a neural network, doing regression to predict petal width; it worked fine and gave reasonable regression output.
I tried using y^y instead of y (just to make the scale larger); the training error was then ~15% and the validation error ~20%. However, `Output:y` contains all zeroes again... Do you have any tips on how to make the output more sensible?
How about adding a validation-error output to the Workbench that is computed on the original (denormalized) data? The training error based on normalized data is not very useful in a case like this: it may be low, while the output is actually completely wrong.
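As a sketch of what that could look like, the error could be computed after denormalizing the predictions back to original units, e.g. with `org.encog.util.arrayutil.NormalizedField`. This assumes the same normalization parameters (`yHigh`/`yLow`) the Workbench used, and the method names are as I recall them from Encog 3:

```java
import org.encog.ml.data.MLDataPair;
import org.encog.ml.data.MLDataSet;
import org.encog.neural.networks.BasicNetwork;
import org.encog.util.arrayutil.NormalizationAction;
import org.encog.util.arrayutil.NormalizedField;

public final class DenormalizedError {
    /**
     * Mean squared error in the ORIGINAL units of y, not the normalized ones.
     * yHigh/yLow must match what was used to normalize the y column.
     */
    public static double mse(BasicNetwork network, MLDataSet eval,
                             double yHigh, double yLow) {
        NormalizedField yField = new NormalizedField(
                NormalizationAction.Normalize, "y", yHigh, yLow, 1, -1);
        double sum = 0;
        int n = 0;
        for (MLDataPair pair : eval) {
            double predicted = yField.deNormalize(
                    network.compute(pair.getInput()).getData(0));
            double ideal = yField.deNormalize(pair.getIdeal().getData(0));
            double diff = predicted - ideal;
            sum += diff * diff;
            n++;
        }
        return sum / n;
    }
}
```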
I will have to think about that. I think the neural network needs to remain fairly agnostic of normalization. Perhaps a new error function could be created that does not simply take the delta between ideal and output; that error could then be backpropagated through the training. Encog already supports pluggable error functions (this was added to facilitate the arctan error).
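For reference, plugging in a custom error function looks roughly like this. The interface signature below is the `org.encog.neural.error.ErrorFunction` form I remember from Encog 3.2 (later versions may differ), and the weighting scheme itself is purely illustrative, not a proposal:

```java
import org.encog.neural.error.ErrorFunction;

/**
 * Sketch of a custom error function: instead of the plain delta
 * (ideal - actual), scale the gradient so that errors on rows with
 * large ideal values count for more. Hypothetical weighting.
 */
public class WeightedErrorFunction implements ErrorFunction {
    @Override
    public void calculateError(double[] ideal, double[] actual, double[] error) {
        for (int i = 0; i < actual.length; i++) {
            double weight = 1.0 + Math.abs(ideal[i]); // illustrative only
            error[i] = weight * (ideal[i] - actual[i]);
        }
    }
}

// Wiring it into training (ResilientPropagation extends Propagation,
// which is how the ATan error function is attached as well):
//   ResilientPropagation train = new ResilientPropagation(network, trainingSet);
//   train.setErrorFunction(new WeightedErrorFunction());
```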
I am renaming this issue to "enhance" NN regression and moving it to a future release, since this is not really a bug that needs to be addressed in the current release.
Good, that sounds like a plan :-)