machinelearning icon indicating copy to clipboard operation
machinelearning copied to clipboard

Fix problems with DataFrame WriteCsv

Open dakersnar opened this issue 3 years ago • 1 comments

https://github.com/dotnet/machinelearning/issues/5647 is no longer present, but while investigating that issue I discovered some bugs with WriteCsv. This PR does the following:

  • Fixes bugs where text qualifiers (quotation marks) were not properly added to the CSV output by WriteCsv in the following cases:
    • There are separators in a row entry
    • There are separators in a header name
    • There are newlines in a row entry
    • There are newlines in a header name
  • Adds test cases both confirming https://github.com/dotnet/machinelearning/issues/5647 is solved and validating that the above bugs are fixed.

This is my first contribution to this repo so style criticism appreciated.

dakersnar avatar Aug 26 '22 00:08 dakersnar

Codecov Report

Merging #6303 (fe0adcd) into main (f117cd8) will increase coverage by 0.24%. The diff coverage is 100.00%.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #6303      +/-   ##
==========================================
+ Coverage   68.39%   68.63%   +0.24%     
==========================================
  Files        1144     1171      +27     
  Lines      244991   247916    +2925     
  Branches    25411    25742     +331     
==========================================
+ Hits       167558   170157    +2599     
- Misses      70775    71015     +240     
- Partials     6658     6744      +86     
Flag Coverage Δ
Debug 68.63% <100.00%> (+0.24%) :arrow_up:
production 63.04% <100.00%> (+0.20%) :arrow_up:
test 89.09% <100.00%> (+0.10%) :arrow_up:

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
src/Microsoft.Data.Analysis/DataFrame.IO.cs 81.15% <100.00%> (+1.92%) :arrow_up:
...Microsoft.Data.Analysis.Tests/DataFrame.IOTests.cs 98.80% <100.00%> (+0.22%) :arrow_up:
...Microsoft.ML.TorchSharp/Utils/DefaultDictionary.cs 0.00% <0.00%> (-30.62%) :arrow_down:
...luators/Metrics/MulticlassClassificationMetrics.cs 89.47% <0.00%> (-10.53%) :arrow_down:
...Microsoft.ML.AutoML.Tests/AutoMLExperimentTests.cs 92.38% <0.00%> (-7.62%) :arrow_down:
test/Microsoft.ML.Tests/TextClassificationTests.cs 93.75% <0.00%> (-6.25%) :arrow_down:
test/Microsoft.ML.AutoML.Tests/AutoFitTests.cs 79.56% <0.00%> (-6.19%) :arrow_down:
src/Microsoft.ML.Maml/MAML.cs 24.36% <0.00%> (-2.54%) :arrow_down:
...eColumn.BinaryOperationImplementations.Exploded.cs 47.65% <0.00%> (-1.38%) :arrow_down:
test/Microsoft.ML.AutoML.Tests/DatasetUtil.cs 97.84% <0.00%> (-1.15%) :arrow_down:
... and 80 more

codecov[bot] avatar Aug 26 '22 02:08 codecov[bot]

@eerhardt Thanks for the feedback. Everything should be addressed now.

dakersnar avatar Sep 26 '22 17:09 dakersnar