csv reader should have a strings only option

Open mccanne opened this issue 4 years ago • 1 comments

From zio/csvio/reader.go:

// XXX This is a placeholder for an option that will allow one to convert
// all csv fields to strings and defer any type coercion presumably to 
// Z shapers.  Currently, this causes an import cycle because the csvio 
// Writer depends on fuse.  We should refactor this so whatever logic wants 
// to tack on a fuse operator happens outside of zio.  See issue #2315
//type ReaderOpts struct {
//	StringsOnly bool
//}

mccanne, Mar 09 '21 23:03

I think I see a use case for this in the super era. Some SQL tutorials show examples involving zip codes, which might have leading zeroes. Consider test data zipcodes.csv:

zipcode
94107
01220

At the moment those values all get recognized as numbers, so the leading zero disappears.

$ super -version
Version: v1.18.0-92-g59e8fc0d

$ super -z -i csv zipcodes.csv 
{zipcode:94107.}
{zipcode:1220.}

I don't see a way at the moment to get past this initial loss of detail. That led me to remember this issue.

FWIW, DuckDB's schema inference handles this case by turning it into a VARCHAR when the value with leading zero is present.

$ duckdb -c "SELECT * FROM 'zipcodes.csv';"
┌─────────┐
│ zipcode │
│ varchar │
├─────────┤
│ 94107   │
│ 01220   │
└─────────┘

Though if none of the values have a leading zero, it treats them as numbers.

$ cat zipcodes_no_leading.csv 
zipcode
94107
12020

$ duckdb -c "SELECT * FROM 'zipcodes_no_leading.csv'"
┌─────────┐
│ zipcode │
│  int64  │
├─────────┤
│   94107 │
│   12020 │
└─────────┘
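For comparison only, the behavior observed in these two outputs could be roughly approximated by a per-column check like the one below. This is a sketch based solely on what the outputs above show, not DuckDB's actual inference algorithm, and `hasLeadingZero` and `columnIsNumeric` are hypothetical names.

```go
package main

import (
	"fmt"
	"strconv"
)

// hasLeadingZero reports whether s starts with '0' followed by another
// digit, e.g. "01220". A bare "0" or a decimal like "0.5" does not count.
func hasLeadingZero(s string) bool {
	return len(s) > 1 && s[0] == '0' && s[1] >= '0' && s[1] <= '9'
}

// columnIsNumeric returns true only if the column is non-empty, every
// value parses as a number, and no value has a leading zero. Otherwise
// the whole column would be kept as strings.
func columnIsNumeric(values []string) bool {
	for _, v := range values {
		if hasLeadingZero(v) {
			return false
		}
		if _, err := strconv.ParseFloat(v, 64); err != nil {
			return false
		}
	}
	return len(values) > 0
}

func main() {
	fmt.Println(columnIsNumeric([]string{"94107", "01220"}))
	fmt.Println(columnIsNumeric([]string{"94107", "12020"}))
}
```

The first column contains "01220" and so would stay as strings; the second has no leading zeros and would be treated as numeric.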

I'm not advocating we mimic their behavior, but just pointing it out for comparison.

I see Google Sheets has a toggle that covers this as well.

[Image: screenshot of the Google Sheets toggle]

philrz, Oct 31 '24 22:10