Allow `GetRecords` to skip invalid rows/records and report on those during enumeration

Open IanKemp opened this issue 4 years ago • 1 comment

Is your feature request related to a problem? Please describe.

For some context, see #803.

If you want to read a CSV file that may contain invalid rows, without throwing an exception when a bad line is encountered, you currently have to construct a CsvConfiguration and populate all of the *Exception properties with a delegate that flags the current line as problematic. Then you need to write a while loop that iterates over every record in the file, and inside that loop check the aforementioned flag: if it is set, skip the record and reset the flag; otherwise (i.e. the record was read successfully), add the record to a collection, or yield return it, or whatever.

Yikes, that's a lot of code for something that IMO should be simple.

Describe the solution you'd like

Allow the CsvReader.GetRecords methods to accept an optional delegate that receives a CsvContext object and returns a bool. If supplied, said delegate is invoked when an invalid CSV row of any type is encountered. If the delegate returns true, the reader will behave as currently, i.e. bubble up relevant exceptions and abort reading. But if the delegate returns false, the reader will simply ignore that row and continue reading to the end of the file.

Signature:

public class CsvReader
{
    public delegate bool HandleGetRecordError(CsvContext context);

    public virtual IEnumerable<object> GetRecords(Type type, HandleGetRecordError handler)
    {
        ... implementation elided...
    }
}

This would allow for code like the following:

string invalidCsv = ...

using (var reader = new StringReader(invalidCsv))
using (var csv = new CsvReader(reader))
{
    // behaves exactly as GetRecords does currently
    return csv.GetRecords(recordType, context => true);
    
    // does NOT throw if an invalid record is encountered
    return csv.GetRecords(recordType, context => false);

    // allows for some very useful error tracking, e.g. which lines were bad
    var errorLineNumbers = new List<int>();

    return csv.GetRecords(recordType, context =>
    {
        errorLineNumbers.Add(context.Parser.RawRow);

        return false;
    });
}

IanKemp, Jul 19 '21 09:07

I think ReadingExceptionOccurred does what you're looking for. It gives you the CsvHelperException that occurred, which contains the context. You return true for the given exception to be thrown, or false to continue reading, ignoring that exception.
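
For anyone landing here, a minimal sketch of what that could look like, recording the bad line numbers as in the example from the original post. It assumes a CsvHelper version in which the ReadingExceptionOccurred callback receives an args object exposing Exception (I believe older releases pass the exception directly). The Person type, the class and method names, and the -1 placeholder are hypothetical and only there for illustration.

```csharp
using System.Collections.Generic;
using System.Globalization;
using System.IO;
using System.Linq;
using CsvHelper;
using CsvHelper.Configuration;

// Hypothetical record type, used only for this illustration.
public class Person
{
    public int Id { get; set; }
    public string Name { get; set; } = "";
}

public static class SkipInvalidRowsExample
{
    // Returns every record that could be read and appends the raw row number
    // of each row that raised a CsvHelperException to errorRows.
    public static List<Person> ReadSkippingInvalid(string csvText, List<int> errorRows)
    {
        var config = new CsvConfiguration(CultureInfo.InvariantCulture)
        {
            ReadingExceptionOccurred = args =>
            {
                // -1 is an arbitrary placeholder for "row unknown".
                errorRows.Add(args.Exception.Context?.Parser?.RawRow ?? -1);

                // false: do not throw, skip this row and keep reading.
                return false;
            },
        };

        using var reader = new StringReader(csvText);
        using var csv = new CsvReader(reader, config);

        // GetRecords is lazy, so materialize the results before the reader is disposed.
        return csv.GetRecords<Person>().ToList();
    }
}
```

Calling ReadSkippingInvalid with a file whose second data row has a non-numeric Id should return the remaining rows and leave one entry in errorRows. Returning true from the callback instead gives the current throwing behaviour.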

JoshClose, Jul 19 '21 21:07