
AutoGenerateDuplicateColumnNames not being Respected in CsvReader

Open xtens-digital opened this issue 3 years ago • 5 comments

I think there is a bug somewhere with AutoGenerateDuplicateColumnNames not being respected:

    using (var r = ChoCSVReader.LoadText(CSVText)
        .WithDelimiter("$$")
        .WithFirstLineHeader()
        .MayHaveQuotedFields()
        .AutoIncrementDuplicateColumnNames(0, true)
        .IgnoreCase(true)
        .WithEOLDelimiter("$EOL$")
        )
    {
        using (var w = new ChoParquetWriter(entryStream))
        {
            w.Write(r);
        }
    }

This appears to allow the data to be read without throwing a duplicate-column error. However, trying to then do anything with the ChoCSVReader throws an exception, presumably raised while creating a new dictionary.

    at System.ThrowHelper.ThrowAddingDuplicateWithKeyArgumentException[T](T key)
    at System.Collections.Generic.Dictionary`2.TryInsert(TKey key, TValue value, InsertionBehavior behavior)
    at System.Linq.Enumerable.ToDictionary[TSource,TKey](IEnumerable`1 source, Func`2 keySelector, IEqualityComparer`1 comparer)
    at ChoETL.ChoCSVRecordConfiguration.Validate(Object state)
    at ChoETL.ChoCSVRecordReader.<>c__DisplayClass24_0.<AsEnumerable>b__0(Tuple`2 pairElement)
    at ChoETL.ChoPeekEnumerator`1.MoveToNext()
    at ChoETL.ChoPeekEnumerator`1.TryFetchPeek()
    at ChoETL.ChoPeekEnumerator`1.get_Peek()
    at ChoETL.ChoCSVRecordReader.<AsEnumerable>d__24.MoveNext()
    at ChoETL.ChoCSVReader`1.<>c__DisplayClass59_0.<GetEnumerator>b__0()
    at ChoETL.ChoEnumeratorWrapper.<BuildEnumerable>d__0`1.MoveNext()
    at System.Linq.Enumerable.<OfTypeIterator>d__62`1.MoveNext()
    at ChoETL.ChoParquetRecordWriter.GetFirstNotNullRecord(IEnumerator`1 recEnum)
    at ChoETL.ChoParquetRecordWriter.<WriteTo>d__37.MoveNext()
    at ChoETL.ChoUtility.Loop(IEnumerable e, Action preActionCallback, Action`1 postActionCallback)
    at ChoETL.ChoParquetWriter`1.Write(IEnumerable`1 records)

Any thoughts or ways around this? Yes, duplicate columns are not ideal, but my understanding is that AutoIncrementDuplicateColumnNames should work around this issue until we can address the root cause.
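
For anyone hitting the same trace: the `ToDictionary` frame suggests the header names are being used as dictionary keys, and with `IgnoreCase(true)` names that differ only by case would presumably collide as well. The following is a minimal sketch of that failure and of one possible stopgap, renaming repeated headers before they reach the reader. It uses only the .NET base class library; `HeaderNames.MakeUnique` and its `_2`, `_3` suffix scheme are hypothetical and are not ChoETL's implementation of AutoIncrementDuplicateColumnNames.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class HeaderNames
{
    // Hypothetical helper (not part of ChoETL): appends _2, _3, ... to
    // header names that repeat, optionally treating case variants as equal.
    public static List<string> MakeUnique(IEnumerable<string> headers, bool ignoreCase = true)
    {
        var comparer = ignoreCase ? StringComparer.OrdinalIgnoreCase : StringComparer.Ordinal;
        var used = new HashSet<string>(comparer);
        var result = new List<string>();

        foreach (var header in headers)
        {
            var candidate = header;
            var suffix = 2;
            // HashSet.Add returns false when the name is already taken,
            // so keep bumping the suffix until a free name is found.
            while (!used.Add(candidate))
            {
                candidate = $"{header}_{suffix++}";
            }
            result.Add(candidate);
        }
        return result;
    }
}

public static class DuplicateHeaderDemo
{
    public static void Main()
    {
        var headers = new[] { "Id", "Name", "name", "Id" };

        try
        {
            // Same call shape as the ToDictionary frame in the stack trace:
            // a key selector plus an equality comparer.
            headers.ToDictionary(h => h, StringComparer.OrdinalIgnoreCase);
        }
        catch (ArgumentException ex)
        {
            Console.WriteLine($"Duplicate key: {ex.Message}");
        }

        // After renaming, the same ToDictionary call no longer throws.
        var unique = HeaderNames.MakeUnique(headers);
        var lookup = unique.ToDictionary(h => h, StringComparer.OrdinalIgnoreCase);
        Console.WriteLine(string.Join(", ", unique)); // prints "Id, Name, name_2, Id_2"
    }
}
```

Applying something like this means rewriting the header line of the CSV text before loading it, so it is only a stopgap until the reader handles duplicates itself.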

xtens-digital, Nov 21 '22 13:11

Thanks for reporting it; the issue is fixed. Please take v1.2.1.51-beta4 and give it a try.

Cinchoo, Nov 24 '22 19:11

I took package 1.0.1.25-beta1 and am still seeing the issue:

    using (var r = ChoCSVReader.LoadText(Info, new ChoCSVRecordConfiguration { MaxLineSize = _options.ChoReaderMaxLineSize })
        .WithDelimiter(columnDelimeter)
        .WithFirstLineHeader()
        .MayHaveQuotedFields()
        .AutoIncrementDuplicateColumnNames(0, true)
        .IgnoreCase(true)
        .WithEOLDelimiter(endOfLineDelimeter)
        )
    {
        using (var w = new ChoParquetWriter(patientStream.Stream))
        {
            w.Write(r);
        }
    }

    at System.ThrowHelper.ThrowAddingDuplicateWithKeyArgumentException[T](T key)
    at System.Collections.Generic.Dictionary`2.TryInsert(TKey key, TValue value, InsertionBehavior behavior)
    at System.Collections.Generic.Dictionary`2.Add(TKey key, TValue value)
    at System.Linq.Enumerable.ToDictionary[TSource,TKey](IEnumerable`1 source, Func`2 keySelector, IEqualityComparer`1 comparer)
    at ChoETL.ChoCSVRecordConfiguration.Validate(Object state)
    at ChoETL.ChoCSVRecordReader.<>c__DisplayClass24_0.<AsEnumerable>b__0(Tuple`2 pairElement)
    at ChoETL.ChoPeekEnumerator`1.MoveToNext()
    at ChoETL.ChoCSVRecordReader.<AsEnumerable>d__24.MoveNext()
    at ChoETL.ChoEnumeratorWrapper.<BuildEnumerable>d__0`1.MoveNext()
    at System.Linq.Enumerable.<OfTypeIterator>d__61`1.MoveNext()
    at ChoETL.ChoParquetRecordWriter.GetFirstNotNullRecord(IEnumerator`1 recEnum)
    at ChoETL.ChoParquetRecordWriter.<WriteTo>d__37.MoveNext()
    at ChoETL.ChoParquetWriter`1.Write(IEnumerable`1 records)

xtens-digital, Dec 16 '22 09:12

You must take v1.2.1.51-beta4 for this fix to work.

Cinchoo, Dec 18 '22 14:12

I did not see this version published! 1.0.1.25-beta1 was the latest. Additionally, in this version all Parquet file cell values became null.

xtens-digital, Dec 20 '22 09:12

Well, you need to update the base library to 1.2.1.51-beta4 at https://www.nuget.org/packages/ChoETL.NETStandard/1.2.1.51-beta4 (the CSV parser is in this library).

You can pick up the latest Parquet library at https://www.nuget.org/packages/ChoETL.Parquet

Cinchoo, Dec 20 '22 12:12
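
Since the thread turns on two packages with separate version numbers (ChoETL.NETStandard for the CSV parser and ChoETL.Parquet for the writer), it can help to confirm which assemblies are actually loaded at run time. The following is a minimal sketch using standard .NET reflection; `AssemblyInfo.Describe` is a hypothetical helper, and whether a prerelease tag such as `-beta4` shows up in the informational version depends on how the package was built.

```csharp
using System;
using System.Reflection;

public static class AssemblyInfo
{
    // Hypothetical helper: reports the name and version of the assembly
    // that defines the given type.
    public static string Describe(Type type)
    {
        var assembly = type.Assembly;
        var name = assembly.GetName();

        // The informational version is optional metadata; it may be absent.
        var informational = assembly
            .GetCustomAttribute<AssemblyInformationalVersionAttribute>()?
            .InformationalVersion;

        return $"{name.Name} {name.Version} (informational: {informational ?? "n/a"})";
    }
}

public static class VersionCheckDemo
{
    public static void Main()
    {
        // typeof(string) keeps this sketch self-contained. In the project
        // from this thread, one would pass typeof(ChoCSVReader) and
        // typeof(ChoParquetWriter) instead.
        Console.WriteLine(AssemblyInfo.Describe(typeof(string)));
    }
}
```

If the line printed for the CSV reader's assembly does not correspond to the 1.2.1.51-beta4 package, the project is most likely still resolving an older ChoETL.NETStandard, for example as a transitive dependency of the Parquet package.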