Linear Regression with NuML

Open johnstaveley opened this issue 6 years ago • 0 comments

I'm trying to do a really basic linear regression prediction (Z = 2 * X + 1) using NuML. Given that the data is so linear, I can't understand why the predicted value is so far off, unless I am doing something wrong. I have the target class:

public class Sample
{
    public float V { get; set; }
    public float X { get; set; }
    public float Y { get; set; }
    public float Z { get; set; }

    public Func<float, float, float, float> OutputStrategy { get; set; }
    public Sample(Func<float, float, float, float> outputStrategy)
    {
        OutputStrategy = outputStrategy;
    }
    public void Seed(int i)
    {
        V = (float) i;
        X = (float) 2 * i;
        Y = (float) 3 * i;
        Z = OutputStrategy(V, X, Y);
    }
}

and I have the NuML code to set up the source values and predict an answer for an arbitrary new data point:

NB: The output strategy is a simple 2 * A + 1. I've also tried it with multivariate analysis, and the prediction is even further off.

public static void Main(string[] args)
{
    // Generate sample data
    int sampleSize = 1000;
    Sample[] samples = new Sample[sampleSize];
    Func<float, float, float, float> outputStrategy = (A, B, C) => 2 * A + 1;
    for (int i = 0; i < sampleSize; i++)
    {
        samples[i] = new Sample(outputStrategy);
        samples[i].Seed(i);
    }

    // calculate model
    var generator = new LinearRegressionGenerator();
    var descriptor = Descriptor.New("Samples")
        .With("V").As(typeof(float))
        .With("X").As(typeof(float))
        .With("Y").As(typeof(float))
        .Learn("Z").As(typeof(float));
    generator.Descriptor = descriptor;
    var model = Learner.Learn(samples, 0.6, 50, generator);

    // Use prediction
    var targetSample = new Sample(outputStrategy);
    targetSample.Seed(sampleSize + 1);
    var predictedSample = model.Model.Predict(targetSample);
    var predictedValue = predictedSample.Z;
    var actualValue = outputStrategy(targetSample.V, targetSample.X, targetSample.Y);
    Console.Write("Predicted Value = {0}, Actual Value = {1}, Difference = {2} {3:0.00}%", predictedValue, actualValue, actualValue - predictedValue, (decimal) (actualValue - predictedValue) / (decimal) predictedValue * 100M);
    Console.ReadKey();
}

This gives a difference of about 0.5%, which was surprising considering the line is completely straight. I have tried using different percentages of the dataset for training and different numbers of iterations, but it makes no difference to the output.
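For comparison, a closed-form ordinary least-squares fit on the same data gives a baseline for what an exact fit should look like. This is only a minimal sketch that does not use NuML at all: `ClosedFormCheck` and `Fit` are hypothetical names introduced here for illustration, doubles are used instead of floats, and Z is fitted against V alone because X and Y are exact multiples of V.

```csharp
using System;
using System.Linq;

public static class ClosedFormCheck
{
    // Hypothetical helper (not part of NuML): fits z = slope * v + intercept
    // using the closed-form ordinary least-squares solution.
    public static (double Slope, double Intercept) Fit(double[] v, double[] z)
    {
        double meanV = v.Average();
        double meanZ = z.Average();
        double sxy = 0.0; // sum of (v - meanV) * (z - meanZ)
        double sxx = 0.0; // sum of (v - meanV)^2
        for (int i = 0; i < v.Length; i++)
        {
            sxy += (v[i] - meanV) * (z[i] - meanZ);
            sxx += (v[i] - meanV) * (v[i] - meanV);
        }
        double slope = sxy / sxx;
        return (slope, meanZ - slope * meanV);
    }

    public static void Main()
    {
        // Same data as in the question: V = i, Z = 2 * V + 1 for i = 0..999.
        double[] v = Enumerable.Range(0, 1000).Select(i => (double)i).ToArray();
        double[] z = v.Select(a => 2 * a + 1).ToArray();

        var (slope, intercept) = Fit(v, z);

        // Predict for the same target point as the NuML code (V = sampleSize + 1).
        double predicted = slope * 1001 + intercept;
        Console.WriteLine("slope = {0}, intercept = {1}, predicted = {2}", slope, intercept, predicted);
    }
}
```

On noise-free data like this, the closed-form fit should recover a slope of 2 and an intercept of 1 up to floating-point rounding, so any larger gap in the NuML prediction would have to come from how the model is trained rather than from the data.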

If I use an even slightly more complicated model, I get much worse predictions. If I use logistic regression, the predicted output of Z is always 1?!

johnstaveley, Mar 26 '18 13:03