parquet-dotnet
Is DateTime type supported?
OS: Windows
Expected behavior
write to file
Actual behavior
Unhandled exception. System.InvalidCastException: Unable to cast object of type 'System.Nullable`1[System.DateTime][]' to type 'System.Nullable`1[System.DateTimeOffset][]'.
at Parquet.Data.BasicPrimitiveDataTypeHandler`1.PackDefinitions(Array data, Int32 maxDefinitionLevel, Int32[]& definitions, Int32& definitionsLength, Int32& nullCount)
at Parquet.Data.DataColumn.PackDefinitions(Int32 maxDefinitionLevel, Int32[]& pooledDefinitionLevels, Int32& definitionLevelCount, Int32& nullCount)
at Parquet.File.DataColumnWriter.WriteColumn(DataColumn column, SchemaElement tse, IDataTypeHandler dataTypeHandler, Int32 maxRepetitionLevel, Int32 maxDefinitionLevel)
at Parquet.File.DataColumnWriter.Write(List`1 path, DataColumn column, IDataTypeHandler dataTypeHandler)
at Parquet.ParquetRowGroupWriter.WriteColumn(DataColumn column)
at testNulls.Program.Main(String[] args) in C:\Work\Parquet\testNulls2\Program.cs:line 43
Code snippet reproducing the behavior
using System;
using Parquet;
using Parquet.Data;
using System.Data;
using System.IO;   // Stream
using System.Linq; // Select, ToArray
namespace testdt
{
    class Program
    {
        static void Main(string[] args)
        {
            var dtArray = new DateTime?[] { };
            // Build a one-row DataTable with a single DateTime column.
            System.Data.DataTable dt = new System.Data.DataTable();
            System.Data.DataColumn column = new System.Data.DataColumn("dtm", System.Type.GetType("System.DateTime"));
            dt.Columns.Add(column);
            DataRow row = dt.NewRow();
            row["dtm"] = DateTime.Today;
            dt.Rows.Add(row);
            // Map each DataTable column to a Parquet field.
            Parquet.Data.DataField[] df = new Parquet.Data.DataField[dt.Columns.Count];
            for (int i = 0; i < dt.Columns.Count; i++)
            {
                switch (System.Type.GetTypeCode(dt.Columns[i].DataType))
                {
                    case TypeCode.DateTime:
                        df[i] = new Parquet.Data.DateTimeDataField(dt.Columns[i].ColumnName, DateTimeFormat.Date, hasNulls: true);
                        break;
                }
            }
            Schema schema = new Schema(df);
            using (Stream fileStream = System.IO.File.Create("C:\\Work\\Parquet\\test.parquet"))
            {
                using (ParquetWriter parquetWriter = new ParquetWriter(schema, fileStream))
                {
                    using (ParquetRowGroupWriter groupWriter = parquetWriter.CreateRowGroup())
                    {
                        for (int i = 0; i < df.Length; i++)
                        {
                            Parquet.Data.DataColumn clmn = null;
                            switch (System.Type.GetTypeCode(dt.Columns[i].DataType))
                            {
                                case TypeCode.DateTime:
                                    dtArray = dt.AsEnumerable().Select(d => d.Field<DateTime?>(df[i].Name)).ToArray();
                                    clmn = new Parquet.Data.DataColumn(df[i], dtArray);
                                    break;
                                default:
                                    Console.WriteLine("Not found " + df[i].DataType.ToString());
                                    break;
                            }
                            // The InvalidCastException is thrown here.
                            groupWriter.WriteColumn(clmn);
                        }
                    }
                }
            }
            System.Environment.Exit(0);
        }
    }
}
Thanks
I have the same issue. It seems Parquet.Net is trying to cast DateTime to DateTimeOffset.
Minimal example to reproduce the issue:
using Parquet;
using Parquet.Data;
var dataField = new DataField<DateTime>("D");
var schema = new Schema(dataField);
using var stream = new MemoryStream();
using var parquetWriter = new ParquetWriter(schema, stream);
using var groupWriter = parquetWriter.CreateRowGroup();
groupWriter.WriteColumn(new DataColumn(dataField, new[]{DateTime.Now}));
Output:
Unhandled exception. System.InvalidCastException: Unable to cast object of type 'System.DateTime[]' to type 'System.DateTimeOffset[]'.
at Parquet.Data.ArrayView.GetValuesAndReturnArray[T](DataColumnStatistics statistics, IEqualityComparer`1 equalityComparer, IComparer`1 comparer)+MoveNext()
at Parquet.Data.Concrete.DateTimeOffsetDataTypeHandler.WriteAsInt96(BinaryWriter writer, ArrayView values, DataColumnStatistics dataColumnStatistics)
at Parquet.Data.Concrete.DateTimeOffsetDataTypeHandler.Write(SchemaElement tse, BinaryWriter writer, ArrayView values, DataColumnStatistics statistics)
at Parquet.File.DataColumnWriter.WriteColumn(DataColumn column, SchemaElement tse, IDataTypeHandler dataTypeHandler, Int32 maxRepetitionLevel, Int32 maxDefinitionLevel)
at Parquet.File.DataColumnWriter.Write(List`1 path, DataColumn column, IDataTypeHandler dataTypeHandler)
at Parquet.ParquetRowGroupWriter.WriteColumn(DataColumn column)
at Program.<Main>$(String[] args) in [...]/Program.cs:line 9
Same issue here. If you check out dataField in the code above, I think you'll see that dataField.DataType is DateTimeOffset, which is surprising to me.
You should use DateTimeOffset as a workaround or raise a PR to fix this.
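A minimal sketch of that workaround, using only the .NET base class library: convert the DateTime values to DateTimeOffset before building the Parquet column. The class and method names (DateTimeColumnConversion, ToOffset, ToOffsets) are hypothetical, and treating DateTimeKind.Unspecified values as UTC is an assumption made here so the result does not depend on the machine's time zone.

```csharp
using System;
using System.Linq;

// Hypothetical helper, not part of Parquet.Net.
public static class DateTimeColumnConversion
{
    // Assumption: values with Kind == Unspecified are treated as UTC.
    // Local and Utc values keep the offset implied by their Kind.
    public static DateTimeOffset ToOffset(DateTime value) =>
        value.Kind == DateTimeKind.Unspecified
            ? new DateTimeOffset(value, TimeSpan.Zero)
            : new DateTimeOffset(value);

    // Nullable column: nulls are passed through unchanged.
    public static DateTimeOffset?[] ToOffsets(DateTime?[] values) =>
        values.Select(v => v.HasValue ? ToOffset(v.Value) : (DateTimeOffset?)null).ToArray();

    // Non-nullable column.
    public static DateTimeOffset[] ToOffsets(DateTime[] values) =>
        values.Select(v => ToOffset(v)).ToArray();
}
```

The converted array could then be passed where the snippets above pass the DateTime array, for example new DataColumn(dataField, DateTimeColumnConversion.ToOffsets(new[] { DateTime.Now })). Whether that avoids the exception depends on the field really being typed as DateTimeOffset internally, as the comments above suggest.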
The DataColumn.DataType property can't be DateTimeOffset. We really need the DateTime type.
It really can, try it :)
using System; // Console, Type
using System.Data.SqlClient;
namespace testdt
{
    class Program
    {
        static void Main(string[] args)
        {
            string connectionString = "Data Source=(local);Integrated Security=true";
            string queryString = "SELECT GETDATE() AS dt, CAST(GETDATE() AS datetimeoffset) AS dto;";
            System.Data.DataTable dt = new System.Data.DataTable();
            using (SqlConnection connection = new SqlConnection(connectionString))
            {
                SqlDataAdapter adapter = new SqlDataAdapter();
                adapter.SelectCommand = new SqlCommand(queryString, connection);
                adapter.Fill(dt);
            }
            // Print the TypeCode of each column's CLR type.
            for (int i = 0; i < dt.Columns.Count; i++)
            {
                Console.WriteLine((Type.GetTypeCode(dt.Columns[i].DataType)).ToString());
            }
        }
    }
}
Output:
DateTime
Object
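The second line of that output is Object because DateTimeOffset has no TypeCode of its own, so a switch on Type.GetTypeCode, like the one in the original snippet, cannot single out such a column. A small sketch of one way to tell the two apart without a database connection; ColumnTypeProbe, BuildTable and Describe are hypothetical names used only for illustration.

```csharp
using System;
using System.Data;

// Hypothetical helper for illustration only.
public static class ColumnTypeProbe
{
    // Builds an in-memory table with one DateTime and one DateTimeOffset column.
    public static DataTable BuildTable()
    {
        var table = new DataTable();
        table.Columns.Add("dt", typeof(DateTime));
        table.Columns.Add("dto", typeof(DateTimeOffset));
        table.Rows.Add(DateTime.Today, DateTimeOffset.Now);
        return table;
    }

    // Type.GetTypeCode reports TypeCode.Object for DateTimeOffset,
    // so compare the column's Type directly before falling back to the TypeCode.
    public static string Describe(DataColumn column)
    {
        if (column.DataType == typeof(DateTimeOffset))
        {
            return "DateTimeOffset";
        }
        return Type.GetTypeCode(column.DataType).ToString();
    }
}
```

Calling ColumnTypeProbe.Describe on each column of ColumnTypeProbe.BuildTable() distinguishes the two columns, whereas Type.GetTypeCode alone would lump the DateTimeOffset column in with every other Object-coded type.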