ChoETL
ChoETL copied to clipboard
ChoParquetWriter Error with IDataReader and DBNull
Issue when you use ChoParquetWriter and and IDataReader to a table that has two columns.
SQL Server 2019 Database
Table has two columns.
First column is an int and not null
Second column is a date time and allow nulls.
Sample:
1, null
2, null
3, null
I ran into two issues: First this fails because it doesn't understand DBNull for data type DateTime but also it does not fail on the first row, it fails on the last row and the output parquet file is blank. I assume it will have a problem with any field/data type that allows nulls. The IDataReader changes from database NULL to data type DBNull.
Error captured:
An application error occurred. Please contact the administrator with the following information:
- Exception Information
Exception Type: Microsoft.CSharp.RuntimeBinder.RuntimeBinderException TargetSite: System.DateTime CallSite.Target(System.Runtime.CompilerServices.Closure, System.Runtime.CompilerServices.CallSite, System.Object) Message: Cannot implicitly convert type 'System.DBNull' to 'System.DateTime' Data: System.Collections.ListDictionaryInternal HelpLink: NULL Source: Anonymously Hosted DynamicMethods Assembly HResult: -2146233088
StackTrace Information
at CallSite.Target(Closure , CallSite , Object )
at ChoETL.ChoParquetRecordWriter.GetFieldValues(String key, Type ft)
at ChoETL.ChoParquetRecordWriter.WriteAllRecords(StreamWriter sw)
at ChoETL.ChoParquetRecordWriter.Dispose(StreamWriter sw)
at ChoETL.ChoParquetWriter1.Dispose(Boolean finalize) at ChoETL.ChoParquetWriter
1.Dispose()
code:
using(var cmd = new SqlCommand("select Id, DateTimeTs from TableName (nolock) order by 1", conn)) {
conn.Open();
cmd.CommandTimeout = 300; //commmand timeout causes a 258 exception
using (var reader = cmd.ExecuteReader())
{
using (var parser = new ChoParquetWriter(outputFile)
.Configure(c => c.CompressionMethod = compressionMethod)
.Configure(c => c.LiteParsing = true)
.Configure(c => c.RowGroupSize = rowGroupSize)
.NotifyAfter(1000)
.OnRowsWritten((o,e) => $"Rows: {e.RowsWritten} <--- {DateTime.Now}".Print()))
{
if (reader.HasRows) {
parser.Write(reader);
}
}
}
}
Can u pls tell me the version of ChoETL.Parquet lib you are using?
1.0.1.22
can u take the latest version and give it try. Let me know.
ok, it worked with two columns one id required int and second was a datetime that allowed null. That worked but my table has 20+ columns half of which are null of various data types. I reran the test with a select * and it fails with the following: It seems the if will fail if any column has NULL value. tested with version 1.0.1.24.
An application error occurred. Please contact the administrator with the following information:
- Exception Information
Exception Type: System.InvalidCastException TargetSite: Void CopySlow(System.Array, Int32, System.Array, Int32, Int32) Message: At least one element in the source array could not be cast down to the destination array type. Data: System.Collections.ListDictionaryInternal HelpLink: NULL Source: System.Private.CoreLib HResult: -2147467262
StackTrace Information
at System.Array.CopySlow(Array sourceArray, Int32 sourceIndex, Array destinationArray, Int32 destinationIndex, Int32 length)
at System.Array.Copy(Array sourceArray, Int32 sourceIndex, Array destinationArray, Int32 destinationIndex, Int32 length, Boolean reliable)
at System.Array.Copy(Array sourceArray, Array destinationArray, Int32 length)
at ChoETL.ChoUtility.Cast(Array array, Type elementType)
at ChoETL.ChoParquetRecordWriter.WriteAllRecords(StreamWriter sw)
at ChoETL.ChoParquetRecordWriter.Dispose(StreamWriter sw)
at ChoETL.ChoParquetWriter1.Dispose(Boolean finalize) at ChoETL.ChoParquetWriter
1.Dispose()
at Program.TestParquet()
at Program.Main(String[] args)
Do you need more information on this issue I can provide?
The field that is causing the is a bit data type. It looks like any field/data type with null value gets this error message.
pls take https://www.nuget.org/packages/ChoETL.Parquet/1.0.1.25-beta2 and give it try.
Let me know.
Thanks. I will try it in a few days.
Ok I tried 1.0.1.25-beta2 and it failed in the same way as before. I tried 1.0.1.25-beta1 and it worked for the subset of columns but when I select all columns in my table it fails. I will pinpoint column by column next. At least the bit field seem to work with beta1.
pls download both packages, try it. Let me know.
https://www.nuget.org/packages/ChoETL.NETStandard/1.2.1.51-beta5 https://www.nuget.org/packages/ChoETL.Parquet/1.0.1.25-beta2
I removed previous packages and added both of these packages. I still get the same error message: It is failing on a string column that has values in the first 2 rows but null on the 3rd row. I am only processing 3 rows of data. The column data type is nvarchar(max).
An application error occurred. Please contact the administrator with the following information:
- Exception Information
Exception Type: System.InvalidCastException TargetSite: Void InternalSetValue(System.Object, IntPtr) Message: Object cannot be stored in an array of this type. Data: System.Collections.ListDictionaryInternal HelpLink: NULL Source: System.Private.CoreLib HResult: -2147467262
StackTrace Information
at System.Array.InternalSetValue(Object value, IntPtr flattenedIndex)
at System.Array.SetValue(Object value, Int32 index)
at System.Array.System.Collections.IList.set_Item(Int32 index, Object value)
at ChoETL.ChoUtility.Cast(Array array, Type elementType)
at ChoETL.ChoParquetRecordWriter.WriteAllRecords(StreamWriter sw)
at ChoETL.ChoParquetRecordWriter.Dispose(StreamWriter sw)
at ChoETL.ChoParquetWriter1.Dispose(Boolean finalize) at ChoETL.ChoParquetWriter
1.Dispose()
at Program.TestParquet() in /home/user/Program.cs:line 66
at Program.Main(String[] args) in /home/user/Program.cs:line 18
I changed the .Configure(c => c.LiteParsing = false) and it worked without an error. If I set to true it fails. Interesting.