hippie_csv
:v: Tolerant, liberal CSV parsing
HippieCSV
Ruby's CSV is great. It complies with the proposed CSV spec
pretty well. If you pass its methods bad or non-compliant CSVs, it’ll rightfully
and loudly complain. It’s great 👍
Except…if you want to be able to deal with files from the real world. At Intercom, we’ve seen lots of problematic CSVs from customers importing data to our system. You may want to support such cases. You may not always know the delimiter, nor the chosen quote character, in advance.
HippieCSV is a ridiculously tolerant and liberal parser which aims to yield as much usable data as possible out of such real-world CSVs.
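To illustrate the strictness described above, the sketch below feeds Ruby's standard CSV library a row with a stray quote inside an unquoted field. The sample string is made up for this example. The point is only that the stock parser raises `CSV::MalformedCSVError` instead of returning partial data.

```ruby
require "csv"

# Made-up sample: the second line has quotes inside an unquoted field,
# which the standard parser treats as illegal quoting.
malformed = %(name,note\nAda,said "hi" loudly\n)

begin
  CSV.parse(malformed)
rescue CSV::MalformedCSVError => error
  # The strict parser refuses the input rather than guessing.
  puts "strict parser rejected the input: #{error.message}"
end
```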
Installation
Add this line to your application's Gemfile:
gem 'hippie_csv'
And then execute:
$ bundle
Or install it yourself as:
$ gem install hippie_csv
Usage
Exposes three public methods:
- .read a file path to an array. Reads from the file all at once, building the whole CSV object in memory.
- .parse an in-memory string to an array.
- .stream from a file path and parse line by line, calling a given block on each row.
Note: Processing large files with read or parse is memory-intensive. Use stream to parse a CSV file line by line and save memory. It uses less memory but takes longer, because each line is run through parse.
require 'hippie_csv'
HippieCSV.read("path/to/data.csv")
HippieCSV.stream("path/to/data.csv") do |row|
  # use row here...
end
HippieCSV.parse(csv_string)
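The memory trade-off between reading everything at once and streaming can be sketched with Ruby's standard library alone. This is not HippieCSV's implementation. The helper names `read_all_at_once` and `each_row_streamed` are hypothetical, and the streaming helper assumes that no quoted field spans more than one line.

```ruby
require "csv"
require "tempfile"

# Hypothetical helper (not HippieCSV's code): load the whole file, then parse.
# Memory use grows with file size because every row is held at once.
def read_all_at_once(path)
  CSV.parse(File.read(path))
end

# Hypothetical helper (not HippieCSV's code): parse one physical line at a
# time and hand each row to the block, so only one row is held at a time.
# Assumes no quoted field spans multiple lines.
def each_row_streamed(path)
  File.foreach(path) do |line|
    yield CSV.parse_line(line)
  end
end

Tempfile.create(["sample", ".csv"]) do |file|
  file.write("name,email\nAda,ada@example.com\n")
  file.flush

  puts read_all_at_once(file.path).length
  each_row_streamed(file.path) { |row| puts row.inspect }
end
```

Parsing each line separately is what makes the streaming variant slower per row, but it keeps memory use roughly constant regardless of file size.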
Features
- Deduces the delimiter (supports `,`, `;`, and `\t`)
- Deduces the quote character (supports `'`, `"`, and `|`)
- Forgives backslash escaped quotes in quoted CSVs
- Forgives invalid newlines in quoted CSVs
- Heals many encoding issues (and aggressively forces UTF-8)
- Deals with many miscellaneous malformed types of CSVs
- Works when a byte order mark is present
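Two of the features above, delimiter deduction and encoding cleanup, can be approximated in a few lines of plain Ruby. The sketch below is an assumption about how such heuristics might look, not a description of what HippieCSV does internally. `guess_delimiter` and `to_clean_utf8` are hypothetical names. The first picks whichever candidate delimiter occurs most often in the first few lines. The second forces UTF-8, drops invalid bytes, and strips a leading byte order mark.

```ruby
# Candidate delimiters taken from the feature list above.
CANDIDATE_DELIMITERS = [",", ";", "\t"].freeze

# Hypothetical heuristic: choose the candidate that appears most often
# in the first few lines. Ties go to the earliest candidate in the list.
def guess_delimiter(text, candidates = CANDIDATE_DELIMITERS)
  sample = text.lines.first(5).join
  candidates.max_by { |delimiter| sample.scan(delimiter).length }
end

# Hypothetical cleanup: treat the bytes as UTF-8, remove any byte sequences
# that are not valid UTF-8, and strip a leading byte order mark (U+FEFF).
def to_clean_utf8(raw)
  text = raw.dup.force_encoding(Encoding::UTF_8).scrub("")
  text.delete_prefix("\uFEFF")
end

# Made-up input: a BOM followed by semicolon-separated data.
raw = "\xEF\xBB\xBFname;email\nAda;ada@example.com\n".b
text = to_clean_utf8(raw)
puts guess_delimiter(text).inspect
```

A frequency count is easy to fool, for example by free-text fields full of commas, so a real parser would likely need additional checks.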
Contributing
- Check out the latest master to make sure the feature hasn't been implemented or the bug hasn't been fixed yet.
- Check out the issue tracker to make sure someone hasn't already requested it and/or contributed it.
- Fork the project.
- Start a feature/bugfix branch.
- Commit and push until you are happy with your contribution.
- Make sure to add tests for it. This is important so we don't break it in a future version unintentionally.
- Please try not to mess with the Rakefile, version, or history. If you want to have your own version, or it is otherwise necessary, that is fine, but please isolate the change in its own commit so we can cherry-pick around it.