ChoETL icon indicating copy to clipboard operation
ChoETL copied to clipboard

ChoParquetWriter Error with IDataReader and DBNull

Open relcodedev opened this issue 2 years ago • 12 comments

Issue when you use ChoParquetWriter and and IDataReader to a table that has two columns.
SQL Server 2019 Database Table has two columns. First column is an int and not null Second column is a date time and allow nulls. Sample: 1, null 2, null 3, null

I ran into two issues: First this fails because it doesn't understand DBNull for data type DateTime but also it does not fail on the first row, it fails on the last row and the output parquet file is blank. I assume it will have a problem with any field/data type that allows nulls. The IDataReader changes from database NULL to data type DBNull.

Error captured:

An application error occurred. Please contact the administrator with the following information:

  1. Exception Information

Exception Type: Microsoft.CSharp.RuntimeBinder.RuntimeBinderException TargetSite: System.DateTime CallSite.Target(System.Runtime.CompilerServices.Closure, System.Runtime.CompilerServices.CallSite, System.Object) Message: Cannot implicitly convert type 'System.DBNull' to 'System.DateTime' Data: System.Collections.ListDictionaryInternal HelpLink: NULL Source: Anonymously Hosted DynamicMethods Assembly HResult: -2146233088

StackTrace Information


at CallSite.Target(Closure , CallSite , Object ) at ChoETL.ChoParquetRecordWriter.GetFieldValues(String key, Type ft) at ChoETL.ChoParquetRecordWriter.WriteAllRecords(StreamWriter sw) at ChoETL.ChoParquetRecordWriter.Dispose(StreamWriter sw) at ChoETL.ChoParquetWriter1.Dispose(Boolean finalize) at ChoETL.ChoParquetWriter1.Dispose()

code:

using(var cmd = new SqlCommand("select Id, DateTimeTs from TableName (nolock) order by 1", conn)) {

        conn.Open();
        
        cmd.CommandTimeout = 300;  //commmand timeout causes a 258 exception
        
        using (var reader = cmd.ExecuteReader())
        {
            
            using (var parser = new ChoParquetWriter(outputFile)
            .Configure(c => c.CompressionMethod = compressionMethod)
            .Configure(c => c.LiteParsing = true)
            .Configure(c => c.RowGroupSize = rowGroupSize)
            .NotifyAfter(1000)
            .OnRowsWritten((o,e) => $"Rows: {e.RowsWritten} <--- {DateTime.Now}".Print()))
            {
                if (reader.HasRows) {
                    parser.Write(reader);
                }
            }
        }

    }

relcodedev avatar Sep 23 '22 13:09 relcodedev

Can u pls tell me the version of ChoETL.Parquet lib you are using?

Cinchoo avatar Sep 24 '22 13:09 Cinchoo

1.0.1.22

relcodedev avatar Sep 27 '22 13:09 relcodedev

can u take the latest version and give it try. Let me know.

Cinchoo avatar Nov 04 '22 13:11 Cinchoo

ok, it worked with two columns one id required int and second was a datetime that allowed null. That worked but my table has 20+ columns half of which are null of various data types. I reran the test with a select * and it fails with the following: It seems the if will fail if any column has NULL value. tested with version 1.0.1.24.

An application error occurred. Please contact the administrator with the following information:

  1. Exception Information

Exception Type: System.InvalidCastException TargetSite: Void CopySlow(System.Array, Int32, System.Array, Int32, Int32) Message: At least one element in the source array could not be cast down to the destination array type. Data: System.Collections.ListDictionaryInternal HelpLink: NULL Source: System.Private.CoreLib HResult: -2147467262

StackTrace Information


at System.Array.CopySlow(Array sourceArray, Int32 sourceIndex, Array destinationArray, Int32 destinationIndex, Int32 length) at System.Array.Copy(Array sourceArray, Int32 sourceIndex, Array destinationArray, Int32 destinationIndex, Int32 length, Boolean reliable) at System.Array.Copy(Array sourceArray, Array destinationArray, Int32 length) at ChoETL.ChoUtility.Cast(Array array, Type elementType) at ChoETL.ChoParquetRecordWriter.WriteAllRecords(StreamWriter sw) at ChoETL.ChoParquetRecordWriter.Dispose(StreamWriter sw) at ChoETL.ChoParquetWriter1.Dispose(Boolean finalize) at ChoETL.ChoParquetWriter1.Dispose() at Program.TestParquet() at Program.Main(String[] args)

relcodedev avatar Nov 04 '22 18:11 relcodedev

Do you need more information on this issue I can provide?

relcodedev avatar Dec 15 '22 14:12 relcodedev

The field that is causing the is a bit data type. It looks like any field/data type with null value gets this error message.

relcodedev avatar Dec 15 '22 15:12 relcodedev

pls take https://www.nuget.org/packages/ChoETL.Parquet/1.0.1.25-beta2 and give it try.

Let me know.

Cinchoo avatar Dec 21 '22 20:12 Cinchoo

Thanks. I will try it in a few days.

relcodedev avatar Dec 22 '22 01:12 relcodedev

Ok I tried 1.0.1.25-beta2 and it failed in the same way as before. I tried 1.0.1.25-beta1 and it worked for the subset of columns but when I select all columns in my table it fails. I will pinpoint column by column next. At least the bit field seem to work with beta1.

relcodedev avatar Dec 23 '22 22:12 relcodedev

pls download both packages, try it. Let me know.

https://www.nuget.org/packages/ChoETL.NETStandard/1.2.1.51-beta5 https://www.nuget.org/packages/ChoETL.Parquet/1.0.1.25-beta2

Cinchoo avatar Dec 24 '22 15:12 Cinchoo

I removed previous packages and added both of these packages. I still get the same error message: It is failing on a string column that has values in the first 2 rows but null on the 3rd row. I am only processing 3 rows of data. The column data type is nvarchar(max).

An application error occurred. Please contact the administrator with the following information:

  1. Exception Information

Exception Type: System.InvalidCastException TargetSite: Void InternalSetValue(System.Object, IntPtr) Message: Object cannot be stored in an array of this type. Data: System.Collections.ListDictionaryInternal HelpLink: NULL Source: System.Private.CoreLib HResult: -2147467262

StackTrace Information


at System.Array.InternalSetValue(Object value, IntPtr flattenedIndex) at System.Array.SetValue(Object value, Int32 index) at System.Array.System.Collections.IList.set_Item(Int32 index, Object value) at ChoETL.ChoUtility.Cast(Array array, Type elementType) at ChoETL.ChoParquetRecordWriter.WriteAllRecords(StreamWriter sw) at ChoETL.ChoParquetRecordWriter.Dispose(StreamWriter sw) at ChoETL.ChoParquetWriter1.Dispose(Boolean finalize) at ChoETL.ChoParquetWriter1.Dispose() at Program.TestParquet() in /home/user/Program.cs:line 66 at Program.Main(String[] args) in /home/user/Program.cs:line 18

relcodedev avatar Jan 03 '23 13:01 relcodedev

I changed the .Configure(c => c.LiteParsing = false) and it worked without an error. If I set to true it fails. Interesting.

relcodedev avatar Jan 03 '23 20:01 relcodedev