
DataFrame merge results in column datatype problem

Open olavt opened this issue 3 years ago • 1 comments

.NET Core 3.1, Microsoft.Data.Analysis NuGet package version 0.19.1

The last line of the following program crashes with the exception:

System.ArgumentException: 'Cannot cast column holding System.Double values to type System.Double'

    using Microsoft.Data.Analysis;
    using System;
    using System.Linq;

    namespace TestDataFRame
    {
        internal class Program
        {
            static void Main(string[] args)
            {
                DateTime?[] dates1 = { new DateTime(2022, 03, 01), new DateTime(2022, 03, 02), new DateTime(2022, 03, 03) };
                double?[] closePrices = { 10.5, 12.4, 11.3 };

                DateTime?[] dates2 = { new DateTime(2022, 03, 01), new DateTime(2022, 03, 02), new DateTime(2022, 03, 03) };
                double[] shortPercentages = { 2.34, 2.36, 3.01 };

                DataFrame dataFrame1 = new DataFrame();
                dataFrame1.Columns.Add(new PrimitiveDataFrameColumn<DateTime>("Date", dates1));
                dataFrame1.Columns.Add(new DoubleDataFrameColumn("ClosePrice", closePrices));

                var numbers1 = dataFrame1.Columns.GetDoubleColumn("ClosePrice").ToArray();

                DataFrame dataFrame2 = new DataFrame();
                dataFrame2.Columns.Add(new PrimitiveDataFrameColumn<DateTime>("Date", dates1));
                dataFrame2.Columns.Add(new DoubleDataFrameColumn("ShortPercentage", shortPercentages));

                var numbers2 = dataFrame2.Columns.GetDoubleColumn("ShortPercentage").ToArray();

                DataFrame dataFrame = dataFrame1.Merge<DateTime>(dataFrame2, "Date", "Date", joinAlgorithm: JoinAlgorithm.Left);
                var numbers = dataFrame.Columns.GetDoubleColumn("ClosePrice").ToArray();
            }
        }
    }

olavt, Mar 14 '22 15:03

Alright, the issue is that Merge calls Clone on the columns, but Clone returns a slightly different type. For example, instead of returning a DoubleDataFrameColumn it returns a PrimitiveDataFrameColumn<double>. DoubleDataFrameColumn does extend PrimitiveDataFrameColumn<double>, but they aren't the same type. The problem then is that in the call to GetDoubleColumn, the check if (column is DoubleDataFrameColumn ret) fails because the cloned column is no longer actually a DoubleDataFrameColumn.
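
To make the failure mode concrete, here is a minimal sketch that reproduces the same pattern without the library. PrimitiveColumn<T>, DoubleColumn, and the Demo helpers are hypothetical stand-ins invented for illustration, not the real Microsoft.Data.Analysis types; the only assumption carried over is that Clone constructs the base type rather than the derived one.

```csharp
using System;

// Hypothetical stand-ins for the library's column types; not the real classes.
class PrimitiveColumn<T>
{
    public T[] Values { get; }

    public PrimitiveColumn(T[] values) { Values = values; }

    // Clone always builds the base type, mirroring the behavior described above.
    public PrimitiveColumn<T> Clone() => new PrimitiveColumn<T>((T[])Values.Clone());
}

class DoubleColumn : PrimitiveColumn<double>
{
    public DoubleColumn(double[] values) : base(values) { }
}

static class Demo
{
    // Same shape as the check in GetDoubleColumn: matches only the derived type.
    public static bool IsDoubleColumn(object column) => column is DoubleColumn;

    // A looser check against the generic base type.
    public static bool IsPrimitiveDouble(object column) => column is PrimitiveColumn<double>;

    static void Main()
    {
        var original = new DoubleColumn(new[] { 10.5, 12.4, 11.3 });
        var clone = original.Clone();

        Console.WriteLine(IsDoubleColumn(original));  // True
        Console.WriteLine(IsDoubleColumn(clone));     // False: the clone is only the base type
        Console.WriteLine(IsPrimitiveDouble(clone));  // True
    }
}
```

The clone holds the same values but fails the derived-type pattern match, which is the situation the merged frame ends up in. Until the library is fixed, reading the column through the generic base type, for example Columns.GetPrimitiveColumn<double>("ClosePrice"), may work as a stopgap since that check should still match; this is untested against 0.19.1.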

I'm not sure of the best way to fix this from a code standpoint. It probably needs a little more investigation. @luisquintanilla for visibility.

michaelgsharp, Mar 18 '22 19:03