Why is CsvReader forward only?

Open Thierry-S opened this issue 8 years ago • 6 comments

This is more of a question; I initially thought it was a bug.

I need to know how many lines are in my CSV file, so I called myCsvReader.Count() (LINQ) to get it. Of course, this moves the cursor forward to the end, since Count() has to iterate the whole sequence. But then I called myCsvReader.GetEnumerator().Reset() and even myCsvReader.MoveTo(-1), and neither has any effect.

I found this line of code in MoveTo: if (record < _currentRecordIndex) return false; which confirms my suspicion that this is a forward-only reader.

This MSDN page (https://msdn.microsoft.com/en-us/library/65zzykke(v=vs.100).aspx) says:

Iterators do not support the IEnumerator.Reset method. To re-iterate from the beginning, you must obtain a new iterator.

But I tried var en2 = myCsvReader.GetEnumerator(); and that doesn't return a new iterator. It returns the existing iterator, which points to the end.

Why? Why can't I go back to the beginning?

Thierry-S avatar Dec 08 '16 16:12 Thierry-S

I didn't do the original design, but I would guess it's about two things:

  1. Parsing forwards is relatively easy; backwards, less so.
  2. Keeping all the data around on the off chance you need it is expensive (think files in the 1 GB+ range).

Also, the typical processing loop for CSV files is...

  1. Read a row
  2. Process a row
  3. Next

Bear in mind there is an included class, CachedCsvReader, which does allow you to go backwards, but you have to take the memory hit.
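For reference, a minimal sketch of that approach (assuming the LumenWorks-style API; the file path and ProcessRow helper are hypothetical):

```csharp
using System.IO;
using LumenWorks.Framework.IO.Csv;

// CachedCsvReader keeps every record it has read in memory, which is
// what makes rewinding possible -- and what costs you on big files.
using (var csv = new CachedCsvReader(new StreamReader("data.csv"), true))
{
    long count = 0;
    while (csv.ReadNextRecord())
        count++;                 // first pass: count the rows

    csv.MoveToStart();           // rewind; only the cached reader allows this
    while (csv.ReadNextRecord())
        ProcessRow(csv);         // second pass: the actual work
}
```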

phatcher avatar Dec 08 '16 18:12 phatcher

Just to give a real-world scenario for this: in my case I don't need to process the CSV line by line, as all I'm doing is loading a CSV file into a SQL Server table using SqlBulkCopy.

For large CSV files with millions of rows I really need the row count in advance so I can update a progress bar as the upload progresses. For larger files an upload might take 30 minutes, so having a progress bar is really essential IMO. But because it's a large file I would also like to minimize the memory footprint, so I'd prefer not to use CachedCsvReader if possible.

So it's frustrating that the basic CsvReader, which would otherwise have been ideal, doesn't have a record count.
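For what it's worth, SqlBulkCopy can report progress without knowing the total up front: its NotifyAfter / SqlRowsCopied callback fires every N rows, so the progress bar can at least show rows copied so far. A sketch, assuming the LumenWorks CsvReader (which implements IDataReader, so it streams straight into WriteToServer); the connection string and table name are hypothetical:

```csharp
using System;
using System.Data.SqlClient;
using System.IO;
using LumenWorks.Framework.IO.Csv;

using (var csv = new CsvReader(new StreamReader("data.csv"), true))
using (var bulk = new SqlBulkCopy("Server=.;Database=Demo;Integrated Security=true"))
{
    bulk.DestinationTableName = "dbo.Import";  // hypothetical target table
    bulk.NotifyAfter = 10000;                  // raise the event every 10k rows
    bulk.SqlRowsCopied += (sender, e) =>
        Console.WriteLine($"{e.RowsCopied} rows copied so far");

    // CsvReader is an IDataReader, so the whole file streams in one pass.
    bulk.WriteToServer(csv);
}
```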

hugobyrne avatar Jun 03 '17 20:06 hugobyrne

@hugobyrne Same reasons as before: parsing is (potentially) tough. But you could do a couple of things:

  1. Do a quick first pass that counts "\r\n" which would give you a row count if the data is not complex
  2. Re-work this idea as part of CsvReader so we can have it as an option, e.g. NaiveCount or EstimatedCount
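The first-pass count from point 1 needs nothing beyond the BCL. A sketch (the helper name is mine; as noted, it miscounts whenever a quoted field contains an embedded line break, so it only works for "not complex" data):

```csharp
using System;
using System.IO;

static class NaiveCount
{
    // Counts line breaks in one streaming pass over the raw text.
    // This equals the row count only when no quoted field spans lines.
    public static long Count(TextReader reader)
    {
        long rows = 0;
        int ch, last = -1;
        while ((ch = reader.Read()) != -1)
        {
            if (ch == '\n') rows++;   // handles both "\r\n" and bare "\n"
            last = ch;
        }
        if (last != -1 && last != '\n')
            rows++;                   // final record with no trailing newline
        return rows;
    }
}
```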

phatcher avatar Jun 04 '17 14:06 phatcher

Slightly related point: have you looked at this? http://sqlblog.com/blogs/alberto_ferrari/archive/2009/11/30/sqlbulkcopy-performance-analysis.aspx It's relatively old, but it gives a lot of performance analysis/ideas on SqlBulkCopy.

phatcher avatar Jun 04 '17 14:06 phatcher

Thank you very much Paul for the fast reply. I'll see what I can work out based on your advice. And that SqlBulkCopy performance analysis is indeed very useful; there are some excellent tips in there which I'll be following up on.

hugobyrne avatar Jun 05 '17 09:06 hugobyrne

Just use a second reader instance to go through the file a second time.

BobAmmerman avatar Feb 23 '18 16:02 BobAmmerman