`from csv`: Disable type inference for quoted values
Basics
- [x] I have done a basic search through the issue tracker to find similar or related issues.
- [x] I have made myself familiar with the available features of Nushell for the particular area this enhancement request touches.
Related problem
Numeric values in CSV files are always converted by type inference, even when they are quoted. This causes problems for numeric identifiers that should stay strings because their leading zeros matter. The only way I found to keep them as strings is to pass --no-infer, but that disables type inference for the whole file, so the other columns then have to be converted back to numbers manually, which is not very convenient.
It would be really nice to be able to keep type inference except for quoted values, either as an option or by default.
Describe the solution you'd like
Either:
- Add a --no-infer-quoted flag to disable type inference only for quoted values
Or:
- Disable type inference for quoted values by default (if you're quoting numbers, you probably want them to be strings). But in that case you'd probably want an --infer-quoted flag to force type inference in quoted values if you use CSV files that quote everything by default.
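To make the requested behavior concrete, here is a minimal sketch in plain Rust of inference that skips quoted fields. It is not Nushell's implementation and does not use the csv crate: `Cell`, `split_record`, and `infer_unquoted` are hypothetical names, the parser is hand-rolled, and it assumes comma-delimited single-line records with integer inference only.

```rust
/// A parsed CSV cell: either an inferred integer or the original text.
/// (Hypothetical type for illustration; not a Nushell value type.)
#[derive(Debug, PartialEq)]
enum Cell {
    Int(i64),
    Text(String),
}

/// Split one CSV record into (text, was_quoted) pairs.
/// Handles quoted fields and doubled quotes ("") but not embedded newlines.
fn split_record(line: &str) -> Vec<(String, bool)> {
    let mut fields = Vec::new();
    let mut current = String::new();
    let mut quoted = false; // did the current field contain a quoted section?
    let mut in_quotes = false; // are we inside quotes right now?
    let mut chars = line.chars().peekable();

    while let Some(c) = chars.next() {
        match c {
            '"' if in_quotes && chars.peek() == Some(&'"') => {
                // Escaped quote inside a quoted field.
                chars.next();
                current.push('"');
            }
            '"' => {
                in_quotes = !in_quotes;
                quoted = true;
            }
            ',' if !in_quotes => {
                fields.push((std::mem::take(&mut current), quoted));
                quoted = false;
            }
            other => current.push(other),
        }
    }
    fields.push((current, quoted));
    fields
}

/// Infer integers for unquoted fields only; quoted fields stay text.
fn infer_unquoted(line: &str) -> Vec<Cell> {
    split_record(line)
        .into_iter()
        .map(|(text, was_quoted)| match text.parse::<i64>() {
            Ok(n) if !was_quoted => Cell::Int(n),
            _ => Cell::Text(text),
        })
        .collect()
}

fn main() {
    // The quoted identifier keeps its leading zeros; the bare 42 becomes a number.
    let row = infer_unquoted(r#""00123",42,"Doe, Jane""#);
    println!("{:?}", row);
}
```

The key design point is that the parser has to report whether each field was quoted. A reader that only returns the unquoted text cannot support this distinction afterwards.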
Describe alternatives you've considered
No response
Additional context and details
Use case: Files with both numeric values that should be numbers, and numeric values that should be strings like barcodes and other identifiers. Especially when all the identifiers have leading zeros and are not of the same length.
What you're wanting to do may be possible but I wouldn't call the flag --no-infer-quoted. Maybe something like --keep-leading-zeros or something to that effect. Also disabling types for quoted values seems wrong since it's pretty common for csv files to have quoted fields.
This is where this all happens in the code if someone wanted to investigate further. https://github.com/nushell/nushell/blob/f1265c5828d8a12863fefb98479c133255fed5bf/crates/nu-command/src/formats/from/delimited.rs#L62-L73
The real question is whether the csv crate will even allow you to do this.
Thanks for the answer!
> What you're wanting to do may be possible but I wouldn't call the flag --no-infer-quoted. Maybe something like --keep-leading-zeros or something to that effect.
If type inference stays, and the identifiers are converted to integers without trimming the leading zeroes, that still leaves the problem of the sort order, which will be numerical instead of lexicographical. That doesn't matter if the identifiers all have the same number of characters, but sometimes they don't... Which is why I was looking for a way to force a field to be a string, since that's how barcodes and similar identifiers are usually handled.
However, if the leading zeroes are kept, the column could be converted back into a string with `into string` without losing part of the identifier, so it would work as a solution.
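The sort-order concern can be illustrated with a small Rust sketch. The identifiers are made-up sample values of different lengths, and `sort_as_text` and `sort_as_numbers` are hypothetical helper names, not Nushell functions.

```rust
/// Sort identifiers as text (lexicographic order), preserving leading zeros.
fn sort_as_text(ids: &[&str]) -> Vec<String> {
    let mut out: Vec<String> = ids.iter().map(|s| s.to_string()).collect();
    out.sort();
    out
}

/// Sort identifiers after converting them to integers (numeric order).
/// Leading zeros are lost in the conversion; unparsable entries are skipped.
fn sort_as_numbers(ids: &[&str]) -> Vec<i64> {
    let mut out: Vec<i64> = ids
        .iter()
        .filter_map(|s| s.parse::<i64>().ok())
        .collect();
    out.sort();
    out
}

fn main() {
    // Made-up identifiers with leading zeros and differing lengths.
    let ids = ["0100", "02", "1"];
    println!("{:?}", sort_as_text(&ids)); // ["0100", "02", "1"]
    println!("{:?}", sort_as_numbers(&ids)); // [1, 2, 100]
}
```

The two orderings disagree as soon as the identifiers differ in length, which is why keeping them as strings matters for this use case.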
> Also disabling types for quoted values seems wrong since it's pretty common for csv files to have quoted fields.
I thought you either quoted everything, or quoted only fields you want to force to be strings? Isn't the latter widespread use which would justify an option to support it?
> I thought you either quoted everything, or quoted only fields you want to force to be strings? Isn't the latter widespread use which would justify an option to support it?
From my experience in business, quotes can be on every field regardless of whether they're needed or not. However, they're usually used when a particular field has a comma so that the comma won't be used as a field delimiter.
Feel free to prove me wrong by analyzing the csv crate and providing your evidence.
I'm not saying you're wrong, that was an honest question because I haven't been using CSV for very long so I might have wrong ideas about what's commonly done.
> However, they're usually used when a particular field has a comma so that the comma won't be used as a field delimiter.
I see your point... If fields that contain commas are quoted, then disabling type inference for quoted values would cause problems whenever those fields contain anything other than text, such as floats that use a comma as the decimal separator, as in French and other locales. So making it the default behavior does seem like a really bad idea in that light.
So that would leave either an optional flag to disable type inference on quoted fields, or a flag to keep leading zeros, which can be used to convert the fields back to strings.
Can we scope the description and title to the `from csv` command? Otherwise the title of this issue is completely misleading.