CsvHelper icon indicating copy to clipboard operation
CsvHelper copied to clipboard

LineBreakInQuotedFieldIsBadData Not Working since 28.0.0

Open soligen2010 opened this issue 2 years ago • 5 comments

I am trying to upgrade from version 27.2.1 to the latest and am running into an issue that was introduced in 28.0.0. I think it may be a bug.

I am reading using reader.GetRecordsAsync

My configuration is

        var config = new CsvConfiguration(CultureInfo.InvariantCulture)

        {

            Delimiter = ",",

            HasHeaderRecord = true,

            TrimOptions = TrimOptions.Trim,

            MissingFieldFound = ConfigurationFunctions.MissingFieldFound,

            BadDataFound = ConfigurationFunctions.BadDataFound,

            LineBreakInQuotedFieldIsBadData = true,

        };

The case is if a record is cut off in the middle of a quotes field, such as:

Field1, Field2, Field3, Field4, Field5

1,2,3, “Text 1”,”Text

1,2,3, “Text 3”,”Text 4”

In 27.2.1, this gets a BadDataException.

In 28.0.0, no exception is thrown. On the first record Field5 gets set to mull, then on the second record all fields are set to null (all that data is lost)

I would like an exception thrown instead of succeeding with lost data. I can’t simply check for nulls values as that is a valid use case.

Can this be fixed back to 27.2.1 behavior? Or is there another option for detecting this?

Thanks

soligen2010 avatar Jan 05 '23 21:01 soligen2010

In your example, those characters aren't quotes. Looks like left and right quotes or something.

If I change them to quotes it throws a BadDataException for me.

void Main()
{
	var s = new StringBuilder();
	s.Append("Field1, Field2, Field3, Field4, Field5\r\n");
	s.Append("1,2,3, \"Text 1\",\"Text\r\n");
	s.Append("1,2,3, \"Text 3\",\"Text 4\"\r\n");
	var config = new CsvConfiguration(CultureInfo.InvariantCulture)
	{
		TrimOptions = TrimOptions.Trim,
	};
	using (var reader = new StringReader(s.ToString()))
	using (var csv = new CsvParser(reader, config))
	{
		while (csv.Read())
		{
			csv.Record.Dump();
		}
	}
}

JoshClose avatar Jan 05 '23 22:01 JoshClose

Thanks for the quick reply. Sorry for the issue with the quotes, but that is not the issue. It must have gotten mangled as I moved the example to the machine I could use to post here.

Here is a Console app version of a complete example that illustrates the issue. I am using CsvReader, whereas you used CsvParser. The only differences from my actual code are I substituted a MemoryStream, and save the records to a List instead of doing a yield return.

using CsvHelper;
using CsvHelper.Configuration;
using System.Globalization;
using System.Text;

public class Program
{
    static async Task Main()
    {
        Console.WriteLine("Starting");

        var s = new StringBuilder();
        s.Append("Field1, Field2, Field3, Field4, Field5\r\n");
        s.Append("1,2,3, \"Text 1\",\"Text\r\n");
        s.Append("1,2,3, \"Text 3\",\"Text 4\"\r\n");

        var config = new CsvConfiguration(CultureInfo.InvariantCulture)
        {
            Delimiter = ",",
            HasHeaderRecord = true,
            TrimOptions = TrimOptions.Trim,
            MissingFieldFound = ConfigurationFunctions.MissingFieldFound,
            BadDataFound = ConfigurationFunctions.BadDataFound,
            LineBreakInQuotedFieldIsBadData = true,
        };

        using var dataStream = new MemoryStream(Encoding.UTF8.GetBytes(s.ToString()));
        using var textReader = (TextReader)new StreamReader(dataStream, new UTF8Encoding());
        using var reader = new CsvReader(textReader, config);
        reader.Context.TypeConverterOptionsCache.GetOptions<string>().NullValues.Add("NULL");
        reader.Context.TypeConverterOptionsCache.GetOptions<string>().NullValues.Add("null");
        reader.Context.TypeConverterOptionsCache.GetOptions<string>().NullValues.Add("Null");
        reader.Context.TypeConverterOptionsCache.GetOptions<string>().NullValues.Add(string.Empty);

        var records = new List<IDictionary<string, object>>();
        try
        {
            await foreach (var record in reader.GetRecordsAsync<dynamic>().ConfigureAwait(false))
            {
                records.Add((IDictionary<string, object>)record);
            }
        }
        catch
        {
            Console.WriteLine("Exception thrown in version 27.2.1.  This is the behavior I expect");
            throw;
        }

        Console.WriteLine("Exception not thrown in version 28.0.0.  I would expect this would throw an exception");
    }
}

soligen2010 avatar Jan 06 '23 16:01 soligen2010

This regressed in 542e301a. The reason is twofold:

  1. DynamicRecordCreator uses Reader.TryGetField when there is a header record, which swallows all exceptions (in particular, the BadDataException) from GetField https://github.com/JoshClose/CsvHelper/blob/9848510ed0fc0980bda687882c5ea123b7d7d3ce/src/CsvHelper/Expressions/DynamicRecordCreator.cs#L43
  2. The BadDataException was previously (also) throwing from the invocation of parser.Record in https://github.com/JoshClose/CsvHelper/blob/ee3f8f20f4450032e78b3fee26eb2d6ba873b372/src/CsvHelper/CsvReader.cs#L245 which no longer gets called

Possible solutions:

  1. Use Reader.GetField in DynamicRecordCreator instead of Reader.TryGetField (breaking change, would require setting config.MissingFieldFound to not throw in order to maintain behaviour)
  2. Use Reader.Parser[] directly in DynamicRecordCreator instead of Reader.TryGetField (breaking change, MissingFieldFound would no longer be fired)
  3. Add an (internal) GetField which has a catch-all around the MissingFieldFound invocation but allows the CsvParser exceptions to bubble up

(1) makes the most sense to me but is probably the most breaking. (3) is probably the least breaking but the most fudge-y

Rob-Hague avatar Jan 10 '23 07:01 Rob-Hague

Please fix this bug, or show me how to work around, hic

quangnd412 avatar Sep 07 '23 06:09 quangnd412