numl
Unexplained IndexOutOfRangeException
Hi Seth,
Had an idea to do a REALLY simple attempt to learn a function that I would have ordinarily implemented as a switch statement, just for the mind-bending. :-)
The code was written to be run in LinqPad.
```csharp
void Main()
{
    Assembly.GetAssembly(typeof(Learner)).Dump();
    var gen = new numl.Supervised.NeuralNetwork.NeuralNetworkGenerator();
    gen.Descriptor = Descriptor.Create<WindDirection>();
    // Note: 16/20 is integer division in C# and evaluates to 0; 16.0/20 would give 0.8.
    var learned = Learner.Learn(WindDirection.TrainingData(), 16/20, 1, gen);
    var model = learned.Model;
    var accuracy = learned.Accuracy.Dump();
    var windDir = new WindDirection(350, null);
    model.Predict(windDir); // In LINQPad, append .Dump("Prediction") to display the result
}

// Define other methods and classes here
public class WindDirection
{
    [Feature]
    public double Degrees { get; set; }

    [StringLabel()]
    public String Direction { get; set; }

    public WindDirection(double degrees, string direction)
    {
        this.Degrees = degrees;
        this.Direction = direction;
    }

    public static WindDirection[] TrainingData()
    {
        return new[] {
            // Training Values
            new WindDirection(0, "N"),
            new WindDirection(22.5, "NNE"),
            new WindDirection(45, "NE"),
            new WindDirection(67.5, "ENE"),
            new WindDirection(90, "E"),
            new WindDirection(112.5, "ESE"),
            new WindDirection(135, "SE"),
            new WindDirection(157.5, "SSE"),
            new WindDirection(180, "S"),
            new WindDirection(202.5, "SSW"),
            new WindDirection(225, "SW"),
            new WindDirection(247.5, "WSW"),
            new WindDirection(270, "W"),
            new WindDirection(292.5, "WNW"),
            new WindDirection(315, "NW"),
            new WindDirection(337.5, "NNW"),
            // Testing Values
            new WindDirection(22.5, "NNE"),
            new WindDirection(112.5, "ESE"),
            new WindDirection(11.25, "N"),
            new WindDirection(359-11.25, "N")
        };
    }
}
```
However, running the above Main function results in the following IndexOutOfRangeException.
```
at numl.Model.StringProperty.Convert(Double val) in c:\projects\numl\numl\Model\StringProperty.cs:line 109
at numl.Learner.GenerateModel(IGenerator generator, Matrix x, Vector y, IEnumerable`1 examples, Double trainingPct) in c:\projects\numl\numl\Learner.cs:line 169
at numl.Learner.<>c__DisplayClasse.<Learn>b__d(Int32 i) in c:\projects\numl\numl\Learner.cs:line 110
at System.Threading.Tasks.Parallel.<>c__DisplayClassf`1.<ForWorker>b__c()
at System.Threading.Tasks.Task.InnerInvokeWithArg(Task childTask)
at System.Threading.Tasks.Task.<>c__DisplayClass11.<ExecuteSelfReplicating>b__10(Object param0)
```
I have looked at the relevant files here and I can't find the cause of the exception at the relevant lines.
I am using the NuGet 0.8.17.0 build when I am getting this exception.
Have I failed to follow the documentation correctly?
Thoughts?
What you are doing looks perfect. I suspect this is related to #24
So, I tried out the changes. Bad news. :-( Looks like this is still occurring.
After debugging the `private static LearningModel GenerateModel(IGenerator generator, Matrix x, Vector y, IEnumerable<object> examples, double trainingPct, int total)` method in _Learner.cs_, I found that while it was validating the model, the call to _model.Predict(features);_ returned a NaN value.
```csharp
// make prediction
var features = descriptor.Convert(o, false).ToVector();
var p = model.Predict(features);
var pred = descriptor.Label.Convert(p);
```
The NaN value being passed into the Convert method is what causes the IndexOutOfRangeException.
In the Convert method in _StringProperty.cs_, AsEnum is set to true, so the value is cast to an int and used as an index into the dictionary, and that lookup fails.
```csharp
if (AsEnum)
    return Dictionary[(int)val];
else
    return val.ToString();
```
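For illustration, here is a minimal sketch of how a conversion like this could be guarded. `SafeLabel` and `TryConvert` are hypothetical names and are not part of numl; the sketch assumes the label dictionary is a plain `string[]`. In an unchecked context C# leaves the result of casting NaN to `int` unspecified, so the value is checked before the cast rather than after it.

```csharp
using System;

// Hypothetical helper, not part of numl: converts a model output to a label
// only when the value is a usable index into the label array.
public static class SafeLabel
{
    public static bool TryConvert(double val, string[] labels, out string label)
    {
        label = null;

        // NaN and infinities have no meaningful integer index.
        if (double.IsNaN(val) || double.IsInfinity(val))
            return false;

        // Compare as doubles first so out-of-range values never reach the int cast.
        // Negative values are rejected as well.
        if (val < 0 || val >= labels.Length)
            return false;

        label = labels[(int)val];
        return true;
    }
}
```

A caller can then treat a `false` result as "no usable prediction" instead of letting the indexer throw.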
That is as far as I have time for tonight. I'm going to update my sample code above with what I did to create this issue. I expanded my example slightly.
It might also be the case that the string in question has never been seen by the classifier (this would be a problem).
Issue reopened. Proposed solution is to use a weighted feature hashing / extraction algorithm for string types to resolve this issue.
Hey @bdschrisk I just pulled master and tried it out. Looks like the NeuralNetworkGenerator winds up returning NaN values that can't be converted into an appropriate index entry. I also tried out the PerceptronGenerator and wound up with values that were WAY outside the range of possible values for the Dictionary. Since this is happening in the GenerateModel call stack, we should just trap the exception (or check for failure by extending the StringProperty) and use that value to correctly determine if the model is predicting correctly. If the values returned from the model are so off they cannot be converted back into appropriate labels, isn't that an indication of poor model fit to the data?
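To make the suggestion concrete, a rough sketch of a validation loop that scores an unconvertible prediction as a miss instead of throwing. This is not numl's `Learner` code: `Validation.Accuracy` is a hypothetical helper, `predict` is a stand-in for the model's prediction call, and the label dictionary is assumed to be a `string[]`.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical validation helper, not numl's Learner code.
public static class Validation
{
    // predict stands in for model.Predict and returns the raw numeric output.
    public static double Accuracy(
        IList<double[]> features,
        IList<string> expected,
        string[] labels,
        Func<double[], double> predict)
    {
        if (features.Count == 0) return 0.0;

        int correct = 0;
        for (int i = 0; i < features.Count; i++)
        {
            double p = predict(features[i]);

            // A NaN, infinite, or out-of-range output cannot name a label,
            // so it is counted as an incorrect prediction.
            bool usable = !double.IsNaN(p) && p >= 0 && p < labels.Length;
            if (usable && labels[(int)p] == expected[i])
                correct++;
        }
        return (double)correct / features.Count;
    }
}
```

With this shape, a model that returns NaN for every example reports an accuracy of 0 rather than failing the run.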
Thanks @normanhh3, we plan to add a default value on the Property object to cover this scenario. Without going into detail, yes, it would indicate poor performance, but the current approach doesn't really allow unknown values, whereas a feature hashing method would allow strings to be instance agnostic.
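As a rough illustration of the idea, here is a sketch of the plain (unweighted) hashing trick for strings. It is not numl's implementation: `StringHasher`, `Bucket`, and `ToVector` are hypothetical names, and the bucket count is an arbitrary choice. It uses FNV-1a because, unlike `string.GetHashCode()`, the result does not change between runs.

```csharp
using System;
using System.Text;

// Hypothetical sketch of the hashing trick; not numl's implementation.
public static class StringHasher
{
    // 32-bit FNV-1a over the UTF-8 bytes of the string.
    public static uint Fnv1a(string s)
    {
        const uint offsetBasis = 2166136261;
        const uint prime = 16777619;
        uint hash = offsetBasis;
        foreach (byte b in Encoding.UTF8.GetBytes(s))
        {
            unchecked
            {
                hash ^= b;
                hash *= prime; // wraps around on overflow by design
            }
        }
        return hash;
    }

    // Maps any string, seen before or not, to a bucket in [0, buckets).
    public static int Bucket(string s, int buckets)
    {
        if (buckets <= 0) throw new ArgumentOutOfRangeException("buckets");
        return (int)(Fnv1a(s) % (uint)buckets);
    }

    // Indicator vector with a 1 in the string's bucket.
    public static double[] ToVector(string s, int buckets)
    {
        var v = new double[buckets];
        v[Bucket(s, buckets)] = 1.0;
        return v;
    }
}
```

Because every string lands in some bucket, there is no dictionary lookup that can fail on an unseen value. The trade-off is that distinct strings can collide in the same bucket.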
Sounds like a good solution then.
Shall we close this?
If we implement stratification of the labels at training time, we can; that will resolve the issue for the most part. Once feature hashing is added, that will resolve any further issues down the track.
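A minimal sketch of what a stratified split could look like, assuming the goal is that every label present in the data also appears in the training set. `Stratified.Split` is a hypothetical helper, not numl's API. It keeps the input order and does not shuffle, which a real implementation would probably want to do.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not part of numl.
public static class Stratified
{
    public static void Split<T>(
        IEnumerable<T> examples,
        Func<T, string> labelOf,
        double trainingPct,
        out List<T> training,
        out List<T> testing)
    {
        training = new List<T>();
        testing = new List<T>();

        // Split each label's examples separately so the label proportions
        // are roughly preserved on both sides.
        foreach (var group in examples.GroupBy(labelOf))
        {
            var items = group.ToList();

            // At least one example of every label goes to training,
            // so no label is first encountered at test time.
            int take = Math.Max(1, (int)Math.Round(items.Count * trainingPct));

            training.AddRange(items.Take(take));
            testing.AddRange(items.Skip(take));
        }
    }
}
```

For the wind direction data this could be called with `e => e.Direction` as `labelOf`; labels that occur only once would then always land in the training set.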
Perhaps an example of what you mean? In the DT for example I have a model default Hint value that represents what the model should select if it gets into a confused state (it just returns the Hint). Should we codify this into the generic IModel/Model class so this issue is resolved across all models? Or are you referring to something else?