AutoIncrementDuplicateColumnNames not being respected in ChoCSVReader
I think there is a bug somewhere with AutoIncrementDuplicateColumnNames not being respected:
```csharp
using (var r = ChoCSVReader.LoadText(CSVText)
    .WithDelimiter("$$")
    .WithFirstLineHeader()
    .MayHaveQuotedFields()
    .AutoIncrementDuplicateColumnNames(0, true)
    .IgnoreCase(true)
    .WithEOLDelimiter("$EOL$")
    )
{
    using (var w = new ChoParquetWriter(entryStream))
    {
        w.Write(r);
    }
}
```
This appears to let the reader be configured without a duplicate-column error. However, as soon as anything enumerates the ChoCSVReader (here, the Parquet writer), it throws an exception, presumably while building a new dictionary keyed by column name:
```
at System.ThrowHelper.ThrowAddingDuplicateWithKeyArgumentException[T](T key)
at System.Collections.Generic.Dictionary`2.TryInsert(TKey key, TValue value, InsertionBehavior behavior)
at System.Linq.Enumerable.ToDictionary[TSource,TKey](IEnumerable`1 source, Func`2 keySelector, IEqualityComparer`1 comparer)
at ChoETL.ChoCSVRecordConfiguration.Validate(Object state)
at ChoETL.ChoCSVRecordReader.<>c__DisplayClass24_0.<AsEnumerable>b__0(Tuple`2 pairElement)
at ChoETL.ChoPeekEnumerator`1.MoveToNext()
at ChoETL.ChoPeekEnumerator`1.TryFetchPeek()
at ChoETL.ChoPeekEnumerator`1.get_Peek()
at ChoETL.ChoCSVRecordReader.<AsEnumerable>d__24.MoveNext()
at ChoETL.ChoCSVReader`1.<>c__DisplayClass59_0.<GetEnumerator>b__0()
at ChoETL.ChoEnumeratorWrapper.<BuildEnumerable>d__0`1.MoveNext()
at System.Linq.Enumerable.<OfTypeIterator>d__62`1.MoveNext()
at ChoETL.ChoParquetRecordWriter.GetFirstNotNullRecord(IEnumerator`1 recEnum)
at ChoETL.ChoParquetRecordWriter.<WriteTo>d__37.MoveNext()
at ChoETL.ChoUtility.Loop(IEnumerable e, Action preActionCallback, Action`1 postActionCallback)
at ChoETL.ChoParquetWriter`1.Write(IEnumerable`1 records)
```
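For anyone reading the trace: the top frames are the standard failure of `Enumerable.ToDictionary` when two source items map to the same key. A minimal sketch of that behaviour, independent of ChoETL, is below. The class name, method name, and sample column names are made up for illustration, and the case-insensitive comparer is an assumption (the trace only shows that some comparer is passed).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DuplicateKeyDemo
{
    // Returns true when building a dictionary keyed by column name fails
    // because two names collide under a case-insensitive comparer.
    // (The comparer choice is an assumption, not taken from ChoETL.)
    public static bool ThrowsOnDuplicate(IEnumerable<string> columnNames)
    {
        try
        {
            columnNames.ToDictionary(n => n, StringComparer.OrdinalIgnoreCase);
            return false;
        }
        catch (ArgumentException)
        {
            // ToDictionary adds each key in turn and rejects a repeat.
            return true;
        }
    }

    public static void Main()
    {
        // "Id" and "id" collide when case is ignored.
        Console.WriteLine(ThrowsOnDuplicate(new[] { "Id", "Name", "id" }));   // True
        // Distinct names build the dictionary without error.
        Console.WriteLine(ThrowsOnDuplicate(new[] { "Id", "Name", "id_1" })); // False
    }
}
```

So if the header names still contain duplicates at the point `Validate` runs, this is the exception you would expect to see.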
Any thoughts, or ways around this? Yes, duplicate columns are not ideal, but my understanding is that AutoIncrementDuplicateColumnNames should handle this until we can address the root cause.
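One possible stopgap, sketched here under assumptions, is to make the header names unique yourself before the text reaches the reader. `HeaderDeduper.MakeUnique` below is a hypothetical helper, not part of ChoETL, and its `_<n>` suffix scheme is made up and not necessarily what AutoIncrementDuplicateColumnNames produces. Splitting the header line on your delimiter and writing the renamed header back is left out.

```csharp
using System;
using System.Collections.Generic;

public static class HeaderDeduper
{
    // Hypothetical helper: appends "_<n>" to any name that has already
    // been seen (ignoring case), so every returned name is unique.
    public static List<string> MakeUnique(IEnumerable<string> names, int startIndex = 1)
    {
        var seen = new HashSet<string>(StringComparer.OrdinalIgnoreCase);
        var result = new List<string>();
        foreach (var name in names)
        {
            var candidate = name;
            var suffix = startIndex;
            // HashSet.Add returns false if the name is already taken;
            // keep bumping the suffix until a free name is found.
            while (!seen.Add(candidate))
            {
                candidate = $"{name}_{suffix++}";
            }
            result.Add(candidate);
        }
        return result;
    }

    public static void Main()
    {
        var unique = MakeUnique(new[] { "Id", "Name", "id", "ID" });
        Console.WriteLine(string.Join(", ", unique)); // Id, Name, id_1, ID_2
    }
}
```

This only sidesteps the problem on the input side; it does not change how the library validates its configuration.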
Thanks for reporting it. I have fixed the issue. Please take v1.2.1.51-beta4 and give it a try.
I took package 1.0.1.25-beta1 and am still seeing the issue:
```csharp
using (var r = ChoCSVReader.LoadText(Info, new ChoCSVRecordConfiguration { MaxLineSize = _options.ChoReaderMaxLineSize })
    .WithDelimiter(columnDelimeter)
    .WithFirstLineHeader()
    .MayHaveQuotedFields()
    .AutoIncrementDuplicateColumnNames(0, true)
    .IgnoreCase(true)
    .WithEOLDelimiter(endOfLineDelimeter)
    )
{
    using (var w = new ChoParquetWriter(patientStream.Stream))
    {
        w.Write(r);
    }
}
```
```
at System.ThrowHelper.ThrowAddingDuplicateWithKeyArgumentException[T](T key)
at System.Collections.Generic.Dictionary`2.TryInsert(TKey key, TValue value, InsertionBehavior behavior)
at System.Collections.Generic.Dictionary`2.Add(TKey key, TValue value)
at System.Linq.Enumerable.ToDictionary[TSource,TKey](IEnumerable`1 source, Func`2 keySelector, IEqualityComparer`1 comparer)
at ChoETL.ChoCSVRecordConfiguration.Validate(Object state)
at ChoETL.ChoCSVRecordReader.<>c__DisplayClass24_0.<AsEnumerable>b__0(Tuple`2 pairElement)
at ChoETL.ChoPeekEnumerator`1.MoveToNext()
at ChoETL.ChoCSVRecordReader.<AsEnumerable>d__24.MoveNext()
at ChoETL.ChoEnumeratorWrapper.<BuildEnumerable>d__0`1.MoveNext()
at System.Linq.Enumerable.<OfTypeIterator>d__61`1.MoveNext()
at ChoETL.ChoParquetRecordWriter.GetFirstNotNullRecord(IEnumerator`1 recEnum)
at ChoETL.ChoParquetRecordWriter.<WriteTo>d__37.MoveNext()
at ChoETL.ChoParquetWriter`1.Write(IEnumerable`1 records)
```
You must take v1.2.1.51-beta4 for this fix to take effect.
I did not see this version published! 1.0.1.25-beta1 was the latest. Additionally, in this version all Parquet file cell values became null.
Well, you need to update the base library to 1.2.1.51-beta4 at https://www.nuget.org/packages/ChoETL.NETStandard/1.2.1.51-beta4 (the CSV parser lives in this library).
You can pick up the latest Parquet library at https://www.nuget.org/packages/ChoETL.Parquet