simdcsv

Look at the competition: csv-parsers-comparison

Open lemire opened this issue 5 years ago • 6 comments

https://github.com/uniVocity/csv-parsers-comparison

lemire, Jul 24 '19 21:07

The following link might be interesting as well: https://bitbucket.org/ewanhiggs/csv-game/src/master/

karimhm, May 11 '20 03:05

Are there any benchmark numbers for this project? This is very interesting. I will definitely consider adapting it to FishStore when this library becomes a bit more stable.

dongx-psu, Feb 04 '21 00:02

Hello, I'll submit my own library for comparison: Sylvan.Data.Csv. I believe it is currently the fastest CSV parser in the .NET ecosystem. I recently added a SIMD fast-path that processes unquoted fields and falls back to the single data path when a quoted field is encountered. This was my first exposure to SIMD, so I'm sure there's room for improvement in that logic, but it was a pretty significant improvement over the non-SIMD code. The library is encoding-agnostic, so I'm sure it could be made even faster if it had a code path specialized for processing UTF-8 bytes instead of .NET chars, but I don't want to compromise the ergonomics of the API to do so. On my machine it processes ~1GB/sec of UTF-8 encoded CSV data when just counting rows/fields.

MarkPflug, Jul 27 '21 16:07
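
For readers unfamiliar with the fast-path/fallback structure described in the comment above, here is a minimal sketch in Python. It is not Sylvan.Data.Csv's code and it is not real SIMD: the bulk `bytes.find` and `bytes.count` calls only stand in for a vectorized scan, and the names `count_rows_and_fields` and `count_fields_scalar` are hypothetical. It assumes `,` as the delimiter, `"` as the quote character with `""` as the escape, and `\n` line endings (CRLF is not handled).

```python
QUOTE = ord('"')
COMMA = ord(",")
NEWLINE = ord("\n")


def count_fields_scalar(data: bytes, start: int) -> tuple[int, int]:
    """Quote-aware fallback: walk one record byte by byte.

    Returns (field_count, position_after_record).
    """
    fields = 1
    in_quotes = False
    i = start
    n = len(data)
    while i < n:
        c = data[i]
        if in_quotes:
            if c == QUOTE:
                # A doubled quote is an escaped quote; a single one closes the field.
                if i + 1 < n and data[i + 1] == QUOTE:
                    i += 1
                else:
                    in_quotes = False
        elif c == QUOTE:
            in_quotes = True
        elif c == COMMA:
            fields += 1
        elif c == NEWLINE:
            return fields, i + 1
        i += 1
    return fields, n


def count_rows_and_fields(data: bytes) -> tuple[int, int]:
    """Count records and fields, using a bulk scan when no quotes are present."""
    rows = fields = 0
    pos = 0
    n = len(data)
    while pos < n:
        end = data.find(b"\n", pos)
        line_end = n if end == -1 else end
        if data.find(b'"', pos, line_end) == -1:
            # Fast path: no quote before the next newline, so that newline
            # really ends the record and every comma is a delimiter.
            fields += data.count(b",", pos, line_end) + 1
            pos = line_end + 1
        else:
            # Fallback: a quote appeared, so commas and newlines may be data.
            f, pos = count_fields_scalar(data, pos)
            fields += f
        rows += 1
    return rows, fields
```

Called on `b'a,b,c\n1,"x,y",3\n'`, the first record takes the fast path and the second falls back to the quote-aware loop. A real SIMD implementation would compare a whole vector of bytes against the delimiter, quote, and newline characters at once rather than calling library search routines.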

Thanks @MarkPflug

lemire, Jul 27 '21 17:07

I'll add one I maintain: https://github.com/liquidaty/zsv

I'm sure with the intellectual firepower on this repo, you can beat zsv, or perhaps someone already has, but I haven't yet seen that, other than from parsers that are unable to handle real-world CSV variants in the same manner as Excel, which as a practical matter is (imho) the best de facto standard for CSV parsing.

liquidaty, Nov 08 '23 19:11

https://github.com/nietras/Sep is my project; it has detailed benchmarks and uses csFastFloat. Sep does not support all the features that some CSV parsers do, but it has an API tailored to machine learning use cases. I'm working on unquoting/unescaping, which is all that is missing.

nietras, Nov 08 '23 20:11