machinelearning
DataFrame merge results in column datatype problem
.NET Core 3.1, Microsoft.Data.Analysis NuGet package version: 0.19.1
The last line of the following program crashes with the exception:
System.ArgumentException: 'Cannot cast column holding System.Double values to type System.Double'
using System;
using System.Linq;
using Microsoft.Data.Analysis;

namespace TestDataFRame
{
    internal class Program
    {
        static void Main(string[] args)
        {
            DateTime?[] dates1 = { new DateTime(2022, 03, 01), new DateTime(2022, 03, 02), new DateTime(2022, 03, 03) };
            double?[] closePrices = { 10.5, 12.4, 11.3 };

            DateTime?[] dates2 = { new DateTime(2022, 03, 01), new DateTime(2022, 03, 02), new DateTime(2022, 03, 03) };
            double[] shortPercentages = { 2.34, 2.36, 3.01 };

            // Left frame: Date + ClosePrice. GetDoubleColumn works here.
            DataFrame dataFrame1 = new DataFrame();
            dataFrame1.Columns.Add(new PrimitiveDataFrameColumn<DateTime>("Date", dates1));
            dataFrame1.Columns.Add(new DoubleDataFrameColumn("ClosePrice", closePrices));
            var numbers1 = dataFrame1.Columns.GetDoubleColumn("ClosePrice").ToArray();

            // Right frame: Date + ShortPercentage. GetDoubleColumn works here too.
            DataFrame dataFrame2 = new DataFrame();
            dataFrame2.Columns.Add(new PrimitiveDataFrameColumn<DateTime>("Date", dates2));
            dataFrame2.Columns.Add(new DoubleDataFrameColumn("ShortPercentage", shortPercentages));
            var numbers2 = dataFrame2.Columns.GetDoubleColumn("ShortPercentage").ToArray();

            // Left join on Date, then read ClosePrice back from the merged frame.
            DataFrame dataFrame = dataFrame1.Merge<DateTime>(dataFrame2, "Date", "Date", joinAlgorithm: JoinAlgorithm.Left);
            var numbers = dataFrame.Columns.GetDoubleColumn("ClosePrice").ToArray(); // throws System.ArgumentException
        }
    }
}
All right, the issue is that Merge calls Clone on the columns, but Clone returns a slightly different type. For example, instead of returning a DoubleDataFrameColumn it returns a PrimitiveDataFrameColumn<double>. DoubleDataFrameColumn does extend PrimitiveDataFrameColumn<double>, but the two are not the same type. As a result, the check if (column is DoubleDataFrameColumn ret) inside GetDoubleColumn fails, because the merged column is no longer a DoubleDataFrameColumn, and GetDoubleColumn throws the ArgumentException shown above even though the column still holds System.Double values.
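The type-check failure can be reproduced without the library. The sketch below is a minimal model that assumes only what is described in this comment: BaseColumn<T> and DoubleColumn are hypothetical stand-ins for PrimitiveDataFrameColumn<T> and DoubleDataFrameColumn, and their Clone and GetDoubleColumn are simplified illustrations, not the actual Microsoft.Data.Analysis code.

```csharp
using System;

// Hypothetical stand-in for PrimitiveDataFrameColumn<T> (NOT the real library class).
class BaseColumn<T>
{
    public string Name { get; }
    public T[] Values { get; }

    public BaseColumn(string name, T[] values)
    {
        Name = name;
        Values = values;
    }

    // Models the behaviour described above: the clone is always constructed
    // as the base type, so a derived runtime type is lost.
    public virtual BaseColumn<T> Clone() => new BaseColumn<T>(Name, (T[])Values.Clone());
}

// Hypothetical stand-in for DoubleDataFrameColumn. It does not override Clone.
class DoubleColumn : BaseColumn<double>
{
    public DoubleColumn(string name, double[] values) : base(name, values) { }
}

static class Demo
{
    // Same shape of check as the one described for GetDoubleColumn.
    internal static DoubleColumn GetDoubleColumn(BaseColumn<double> column)
    {
        if (column is DoubleColumn ret)
            return ret;
        // Message modelled on the exception reported in this issue.
        throw new ArgumentException($"Cannot cast column holding {typeof(double)} values to type {typeof(double)}");
    }

    static void Main()
    {
        var original = new DoubleColumn("ClosePrice", new[] { 10.5, 12.4, 11.3 });
        BaseColumn<double> cloned = original.Clone();

        Console.WriteLine(original is DoubleColumn); // True
        Console.WriteLine(cloned is DoubleColumn);   // False
        Console.WriteLine(cloned.GetType().Name);    // BaseColumn`1

        try
        {
            GetDoubleColumn(cloned);
        }
        catch (ArgumentException e)
        {
            // Cannot cast column holding System.Double values to type System.Double
            Console.WriteLine(e.Message);
        }
    }
}
```

The values survive the clone; only the runtime type changes, which is why the error message names the same type twice. Assuming the merged column really is a PrimitiveDataFrameColumn<double> as described above, one possible workaround is to read it through the base type, e.g. dataFrame["ClosePrice"] as PrimitiveDataFrameColumn<double>, instead of calling GetDoubleColumn.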
I'm not sure of the best way to fix this from a code standpoint; it probably needs a bit more investigation. @luisquintanilla for visibility.