
Is DateTime type supported?

Open dburtsev opened this issue 2 years ago • 2 comments

OS: Windows

Expected behavior

write to file

Actual behavior

Unhandled exception. System.InvalidCastException: Unable to cast object of type 'System.Nullable`1[System.DateTime][]' to type 'System.Nullable`1[System.DateTimeOffset][]'.
   at Parquet.Data.BasicPrimitiveDataTypeHandler`1.PackDefinitions(Array data, Int32 maxDefinitionLevel, Int32[]& definitions, Int32& definitionsLength, Int32& nullCount)
   at Parquet.Data.DataColumn.PackDefinitions(Int32 maxDefinitionLevel, Int32[]& pooledDefinitionLevels, Int32& definitionLevelCount, Int32& nullCount)
   at Parquet.File.DataColumnWriter.WriteColumn(DataColumn column, SchemaElement tse, IDataTypeHandler dataTypeHandler, Int32 maxRepetitionLevel, Int32 maxDefinitionLevel)
   at Parquet.File.DataColumnWriter.Write(List`1 path, DataColumn column, IDataTypeHandler dataTypeHandler)
   at Parquet.ParquetRowGroupWriter.WriteColumn(DataColumn column)
   at testNulls.Program.Main(String[] args) in C:\Work\Parquet\testNulls2\Program.cs:line 43

Code snippet reproducing the behavior

using System;
using System.Data;
using System.IO;   // Stream
using System.Linq; // Select, ToArray
using Parquet;
using Parquet.Data;

namespace testdt
{
    class Program
    {
        static void Main(string[] args)
        {
            var dtArray = new DateTime?[] { };
            System.Data.DataTable dt = new System.Data.DataTable();
            System.Data.DataColumn column = new System.Data.DataColumn("dtm", System.Type.GetType("System.DateTime"));
            dt.Columns.Add(column);
            DataRow row = dt.NewRow();
            row["dtm"] = DateTime.Today;
            dt.Rows.Add(row);

            Parquet.Data.DataField[] df = new Parquet.Data.DataField[dt.Columns.Count];
            for (int i = 0; i < dt.Columns.Count; i++)
            {
                switch (System.Type.GetTypeCode(dt.Columns[i].DataType))
                {
                    case TypeCode.DateTime: df[i] = new Parquet.Data.DateTimeDataField(dt.Columns[i].ColumnName, DateTimeFormat.Date, hasNulls: true); break;
                }
            }
            Schema schema = new Schema(df);
            using (Stream fileStream = System.IO.File.Create("C:\\Work\\Parquet\\test.parquet"))
            {
                using (ParquetWriter parquetWriter = new ParquetWriter(schema, fileStream))
                {
                   using (ParquetRowGroupWriter groupWriter = parquetWriter.CreateRowGroup())
                   {
                       for (int i = 0; i < df.Length; i++)
                       {
                           Parquet.Data.DataColumn clmn = null;
                           switch (System.Type.GetTypeCode(dt.Columns[i].DataType))
                           {
                               case TypeCode.DateTime: dtArray = dt.AsEnumerable().Select(d => d.Field<DateTime?>(df[i].Name)).ToArray(); clmn = new Parquet.Data.DataColumn(df[i], dtArray); break;
                               default: Console.WriteLine("Not found " + df[i].DataType.ToString()); break;
                           }
                           groupWriter.WriteColumn(clmn);
                       }
                   }
                }
            }
            System.Environment.Exit(0);
        }
    }
}

Thanks

dburtsev, Feb 28 '22 20:02

I have the same issue. It seems Parquet.Net is trying to cast DateTime to DateTimeOffset.

Minimal example to reproduce the issue:

using Parquet;
using Parquet.Data;

var dataField = new DataField<DateTime>("D");
var schema = new Schema(dataField);
using var stream = new MemoryStream();
using var parquetWriter = new ParquetWriter(schema, stream);
using var groupWriter = parquetWriter.CreateRowGroup();
groupWriter.WriteColumn(new DataColumn(dataField, new[]{DateTime.Now}));

Output:

Unhandled exception. System.InvalidCastException: Unable to cast object of type 'System.DateTime[]' to type 'System.DateTimeOffset[]'.
   at Parquet.Data.ArrayView.GetValuesAndReturnArray[T](DataColumnStatistics statistics, IEqualityComparer`1 equalityComparer, IComparer`1 comparer)+MoveNext()
   at Parquet.Data.Concrete.DateTimeOffsetDataTypeHandler.WriteAsInt96(BinaryWriter writer, ArrayView values, DataColumnStatistics dataColumnStatistics)
   at Parquet.Data.Concrete.DateTimeOffsetDataTypeHandler.Write(SchemaElement tse, BinaryWriter writer, ArrayView values, DataColumnStatistics statistics)
   at Parquet.File.DataColumnWriter.WriteColumn(DataColumn column, SchemaElement tse, IDataTypeHandler dataTypeHandler, Int32 maxRepetitionLevel, Int32 maxDefinitionLevel)
   at Parquet.File.DataColumnWriter.Write(List`1 path, DataColumn column, IDataTypeHandler dataTypeHandler)
   at Parquet.ParquetRowGroupWriter.WriteColumn(DataColumn column)
   at Program.<Main>$(String[] args) in [...]/Program.cs:line 9

anthiras, Jul 15 '22 12:07

Same issue here. If you inspect dataField in the code above, I think you'll see that dataField.DataType is DateTimeOffset, which is surprising to me.

amueller, Sep 02 '22 16:09

You should use DateTimeOffset as a workaround or raise a PR to fix this.

aloneguid, Dec 01 '22 12:12
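
A minimal sketch of the DateTimeOffset workaround suggested above, assuming the source values are a DateTime?[] like dtArray in the original snippet. ToDateTimeOffsets is a hypothetical helper, not part of Parquet.Net, and treating every value as UTC wall-clock time is an assumption made here so that the calendar date does not shift with the local time zone.

```csharp
using System;
using System.Linq;

public static class DateTimeColumnConverter
{
    // Hypothetical helper (not part of Parquet.Net): converts nullable DateTime
    // values to nullable DateTimeOffset values with a zero offset.
    public static DateTimeOffset?[] ToDateTimeOffsets(DateTime?[] values)
    {
        return values
            .Select(v => v.HasValue
                // Assumption: keep the wall-clock ticks and label them as UTC,
                // regardless of the original DateTime.Kind.
                ? new DateTimeOffset(DateTime.SpecifyKind(v.Value, DateTimeKind.Utc))
                : (DateTimeOffset?)null)
            .ToArray();
    }
}
```

The converted array could then be passed to new Parquet.Data.DataColumn(df[i], converted) in place of dtArray. Whether that writes correctly for a field declared with DateTimeFormat.Date may depend on the Parquet.Net version, so this is untested against the library.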

DataColumn.DataType property can't be DateTimeOffset. We really need DateTime type.

dburtsev, Dec 01 '22 14:12

It really can, try it :)

aloneguid, Dec 01 '22 16:12

using System; // Console, Type
using System.Data.SqlClient;
namespace testdt
{
    class Program
    {
        static void Main(string[] args)
        {
            string connectionString ="Data Source=(local);Integrated Security=true";
            string queryString = "SELECT GETDATE() AS dt, CAST(GETDATE() AS datetimeoffset) AS dto;";
            System.Data.DataTable dt = new System.Data.DataTable();            
            using (SqlConnection connection = new SqlConnection(connectionString)) {
                SqlDataAdapter adapter = new SqlDataAdapter();
                adapter.SelectCommand = new SqlCommand(queryString, connection);
                adapter.Fill(dt);
            }
            for (int i = 0; i < dt.Columns.Count; i++)
            {
                Console.WriteLine((Type.GetTypeCode(dt.Columns[i].DataType)).ToString());
            }
        }   
    }
}

Output: DateTime Object

dburtsev, Dec 01 '22 20:12
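
The "Object" in that output comes from the TypeCode enum, which has no DateTimeOffset member, so a switch on Type.GetTypeCode cannot single out a datetimeoffset column. A minimal sketch of telling the two column types apart by comparing the Type itself, where ColumnTypeProbe and Describe are hypothetical names used only for illustration:

```csharp
using System;

public static class ColumnTypeProbe
{
    // Hypothetical helper: classifies a column's CLR type.
    public static string Describe(Type columnType)
    {
        // TypeCode reports DateTimeOffset as Object, so compare the Type directly.
        if (columnType == typeof(DateTimeOffset)) return "DateTimeOffset";
        if (Type.GetTypeCode(columnType) == TypeCode.DateTime) return "DateTime";
        return "Other";
    }
}
```

In the loop above, ColumnTypeProbe.Describe(dt.Columns[i].DataType) could replace the Type.GetTypeCode call to see which columns would need converting before being written.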