Linear Regression with NuML
I'm trying to do a really basic linear regression prediction (Z = 2 * V + 1) using NuML. Given that the data is perfectly linear, I can't understand why the predicted value is so far off, unless I am doing something wrong. I have the target class:
public class Sample
{
    public float V { get; set; }
    public float X { get; set; }
    public float Y { get; set; }
    public float Z { get; set; }

    public Func<float, float, float, float> OutputStrategy { get; set; }

    public Sample(Func<float, float, float, float> outputStrategy)
    {
        OutputStrategy = outputStrategy;
    }

    public void Seed(int i)
    {
        V = (float) i;
        X = (float) 2 * i;
        Y = (float) 3 * i;
        Z = OutputStrategy(V, X, Y);
    }
}
And I have the NuML code to set up the source values and predict an answer for an arbitrary new data point.
NB: The output strategy is simply 2 * A + 1, where A is the V property. I've also tried a multivariate output strategy, and the prediction is even further off.
public static void Main(string[] args)
{
    // Generate sample data
    int sampleSize = 1000;
    Sample[] samples = new Sample[sampleSize];
    Func<float, float, float, float> outputStrategy = (A, B, C) => 2 * A + 1;
    for (int i = 0; i < sampleSize; i++)
    {
        samples[i] = new Sample(outputStrategy);
        samples[i].Seed(i);
    }

    // calculate model
    var generator = new LinearRegressionGenerator();
    var descriptor = Descriptor.New("Samples")
        .With("V").As(typeof(float))
        .With("X").As(typeof(float))
        .With("Y").As(typeof(float))
        .Learn("Z").As(typeof(float));
    generator.Descriptor = descriptor;
    var model = Learner.Learn(samples, 0.6, 50, generator);

    // Use prediction
    var targetSample = new Sample(outputStrategy);
    targetSample.Seed(sampleSize + 1);
    var predictedSample = model.Model.Predict(targetSample);
    var predictedValue = predictedSample.Z;
    var actualValue = outputStrategy(targetSample.V, targetSample.X, targetSample.Y);
    Console.Write("Predicted Value = {0}, Actual Value = {1}, Difference = {2} {3:0.00}%",
        predictedValue,
        actualValue,
        actualValue - predictedValue,
        (decimal) (actualValue - predictedValue) / (decimal) predictedValue * 100M);
    Console.ReadKey();
}
This gives a difference of about 0.5%, which is surprising given that the data lies on a perfectly straight line. I have tried training on different percentages of the dataset and varying the number of iterations of the model, but neither makes any difference to the output.
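For comparison, here is a minimal sketch of the result I expected, using a closed-form ordinary least-squares fit on V alone. It does not use NuML at all, and the names `LeastSquaresCheck`, `Fit`, and `Predict` are made up for this illustration. Only V is used as a feature because, in the sample data, X and Y are exact multiples of V.

```csharp
using System;
using System.Linq;

public static class LeastSquaresCheck
{
    // Closed-form ordinary least squares for one feature: z = slope * v + intercept.
    public static (double Slope, double Intercept) Fit(double[] v, double[] z)
    {
        if (v.Length != z.Length || v.Length < 2)
            throw new ArgumentException("Need two or more paired observations.");

        double meanV = v.Average();
        double meanZ = z.Average();

        double covariance = 0.0; // sum of (v - meanV) * (z - meanZ)
        double variance = 0.0;   // sum of (v - meanV)^2
        for (int i = 0; i < v.Length; i++)
        {
            covariance += (v[i] - meanV) * (z[i] - meanZ);
            variance += (v[i] - meanV) * (v[i] - meanV);
        }

        if (variance == 0.0)
            throw new ArgumentException("All v values are identical; the slope is undefined.");

        double slope = covariance / variance;
        return (slope, meanZ - slope * meanV);
    }

    // Evaluate the fitted line at a new point.
    public static double Predict((double Slope, double Intercept) line, double v)
        => line.Slope * v + line.Intercept;
}
```

Fitting this to V = 0..999 with Z = 2 * V + 1 should give a slope of about 2 and an intercept of about 1, so the prediction at V = 1001 should match the actual value of 2003 to within floating-point error.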
If I use even a slightly more complicated model, the predictions get much worse. If I use logistic regression, the predicted value of Z is always 1?!
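One possible explanation for the constant 1 is sketched below. It assumes NuML's logistic regression passes its linear output through the standard logistic (sigmoid) function, which has not been checked against the library's source. The name `SigmoidCheck` is made up for this illustration. The sigmoid maps every input into the interval (0, 1), so targets such as 2 * A + 1, which are all 1 or larger here, cannot be reproduced and the output saturates at the upper bound.

```csharp
using System;

public static class SigmoidCheck
{
    // Standard logistic function: 1 / (1 + e^(-t)).
    // Mathematically the result lies strictly between 0 and 1;
    // in double precision it rounds to exactly 1.0 for large positive t.
    public static double Sigmoid(double t) => 1.0 / (1.0 + Math.Exp(-t));
}
```

For example, `SigmoidCheck.Sigmoid(0)` is 0.5, while any large positive input such as 2003 gives a value indistinguishable from 1 in double precision.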