Squib.csv strange behavior with Unicode BOM

Open vtbassmatt opened this issue 5 years ago • 2 comments

(I'm not a Ruby person, so please forgive me if this is an expected behavior or otherwise widely known.)

Short version

Squib.csv exhibits really strange behavior when the CSV data contains a BOM (byte order mark) and/or CRLF (carriage return + linefeed) line endings. This matters because Excel writes CSVs with these characters by default. Rows of data appear or fail to appear depending on which DataFrame methods you call!

Longer version, or how I got here

I used the --advanced project layout and immediately wanted to switch from XLSX to CSV-based data. Using Excel for Mac, I saved the default XLSX as CSV using whatever Excel's default was -- UTF-8 I think. I didn't realize it was going to use CRLF line endings + a Unicode BOM. The generated deck.rb immediately started giving me errors like this: NoMethodError: undefined method `name' for #<Squib::DataFrame:0x00007fd64ebdbce0>

Poking around in irb turned up something curious. Sometimes the DataFrame thought it contained data, while other times it didn't.

Here's a slightly cleaned-up version of my session:

irb(main):001:0> require 'squib'
=> true

# data will be our original Excel file, data2 is from the CSV
irb(main):002:0> data = Squib.xlsx file: 'data/game.xlsx', sheet: 0
=> #<Squib::DataFrame:0x00007fbcf6b514a8 @hash={"Name"=>["Elf", "Dwarf"], "...
irb(main):003:0> data2 = Squib.csv file: 'data/game.csv'
=> #<Squib::DataFrame:0x00007fbcf7a73b98 @hash={"Name"=>["Elf", "Dwarf"], ...

# Both have 2 rows of data
irb(main):004:0> data.nrows
=> 2
irb(main):005:0> data2.nrows
=> 2

# not shown - the Excel-based version happily responds to .name and ['Name']
irb(main):006:0> data2.name
Traceback (most recent call last):
        4: from /usr/local/opt/ruby/bin/irb:23:in `<main>'
        3: from /usr/local/opt/ruby/bin/irb:23:in `load'
        2: from /usr/local/Cellar/ruby/2.7.2/lib/ruby/gems/2.7.0/gems/irb-1.2.6/exe/irb:11:in `<top (required)>'
        1: from (irb):6
NoMethodError (undefined method `name' for #<Squib::DataFrame:0x00007fbcf7a73b98>)
Did you mean?  name
irb(main):007:0> data2['Name']
=> nil

# column doesn't exist?
irb(main):008:0> data2.col? 'name'
=> false

# but the data's in the JSON output...
irb(main):009:0> data2.to_json
=> "{\"Name\":[\"Elf\",\"Dwarf\"],\"ATK\":[3,2],\"DEF\":[2,3]}"

On a hunch, I replaced the CRLFs with LFs and removed the BOM. Everything worked after that.
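
For anyone hitting the same thing, here is a minimal sketch of why the BOM breaks the column lookup and two ways to drop it before parsing. It uses only Ruby's standard library (csv, tempfile), not Squib itself. The sample data is a hypothetical stand-in for an Excel export, and how Squib.csv reads the file internally is not assumed here.

```ruby
require 'csv'
require 'tempfile'

BOM = "\uFEFF" # U+FEFF, stored in UTF-8 as the bytes EF BB BF

# Hypothetical stand-in for an Excel CSV export: leading BOM + CRLF line endings.
raw = "#{BOM}Name,ATK,DEF\r\nElf,3,2\r\nDwarf,2,3\r\n"

# The BOM is glued to the first header, so it no longer equals "Name",
# even though it looks identical when printed.
first_header = raw.lines.first.split(',').first
puts first_header == 'Name' # false
puts first_header.length    # 5

# Option 1: strip a leading BOM from a string that is already in memory.
clean = raw.delete_prefix(BOM)
table = CSV.parse(clean, headers: true)
p table.headers # ["Name", "ATK", "DEF"]
p table['Name'] # ["Elf", "Dwarf"]

# Option 2: ask Ruby to skip the BOM while reading the file.
from_file = Tempfile.create(['game', '.csv']) do |f|
  f.binmode      # write the bytes exactly as given
  f.write(raw)
  f.flush
  File.read(f.path, mode: 'r:bom|utf-8')
end
puts from_file.start_with?('Name') # true
```

The CRLF line endings are left alone in both options, since the CSV parser detects the row separator on its own.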

vtbassmatt, Oct 22 '20 14:10

So... on a Mac, Excel saved it with CRLF? Interesting. And I'll look into how a BOM would get handled here.

andymeneely, Oct 26 '20 14:10

Yep - I was surprised too. The CRLFs don't seem to matter, as it turns out. And if you have Excel read a file that has no BOM, it doesn't appear to add one. It only gave me trouble when I converted an XLSX to CSV.

vtbassmatt, Oct 26 '20 15:10