`from csv`: Disable type inference for quoted values

Status: Open • Phantomwise opened this issue 1 month ago • 6 comments

Basics

  • [x] I have done a basic search through the issue tracker to find similar or related issues.
  • [x] I have made myself familiar with the available features of Nushell for the particular area this enhancement request touches.

Related problem

Numeric values in CSV files are always converted by type inference, even when they are quoted. This causes problems for numeric identifiers that should stay strings because their leading zeros matter. The only way I found to keep them as strings is `--no-infer`, but that disables type inference for the whole file, so every other numeric column then has to be converted back to numbers by hand, which is not very convenient.

It would be really nice to be able to keep type inference except for quoted values, either as an option or by default.

Describe the solution you'd like

Either:

  • Add a `--no-infer-quoted` flag to disable type inference only for quoted values

Or:

  • Disable type inference for quoted values by default (if you're quoting numbers, you probably want them to be strings). In that case, you'd probably also want an `--infer-quoted` flag to force type inference on quoted values, for CSV files that quote every field.
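
To make the proposed behavior concrete, here is a minimal Rust sketch (Rust because nushell is written in it). It assumes the parser already reports, for each field, whether that field was quoted, and it uses a deliberately simplified inference (integer, then float, then string) rather than nushell's actual rules. `Cell`, `infer`, and the `infer_quoted` parameter are hypothetical names: `infer_quoted = false` models the proposed `--no-infer-quoted` flag, and `true` models inferring quoted values as well.

```rust
/// Simplified cell value for illustration; nushell's real value type is much richer.
#[derive(Debug, PartialEq)]
enum Cell {
    Int(i64),
    Float(f64),
    Str(String),
}

/// Hypothetical inference step. `quoted` says whether the field was wrapped in
/// double quotes in the file; `infer_quoted = false` keeps quoted fields as strings.
fn infer(raw: &str, quoted: bool, infer_quoted: bool) -> Cell {
    if quoted && !infer_quoted {
        // Quoted and inference disabled for quoted fields: keep the text verbatim.
        return Cell::Str(raw.to_string());
    }
    // Simplified inference: try an integer, then a float, else fall back to a string.
    if let Ok(i) = raw.parse::<i64>() {
        Cell::Int(i)
    } else if let Ok(f) = raw.parse::<f64>() {
        Cell::Float(f)
    } else {
        Cell::Str(raw.to_string())
    }
}

fn main() {
    // Fields from a row like: "00123",42  (identifier quoted, quantity unquoted)
    let row = [("00123", true), ("42", false)];
    let cells: Vec<Cell> = row.iter().map(|&(raw, q)| infer(raw, q, false)).collect();
    println!("{:?}", cells); // [Str("00123"), Int(42)]
}
```

With `infer_quoted` set to `true`, the same row yields `[Int(123), Int(42)]`, which is the loss of leading zeros the issue describes.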

Describe alternatives you've considered

No response

Additional context and details

Use case: files that contain both numeric values that should be numbers and numeric values that should be strings, such as barcodes and other identifiers. This matters especially when all the identifiers have leading zeros but are not all the same length.

Phantomwise • Nov 06 '25 11:11

What you're wanting to do may be possible, but I wouldn't call the flag `--no-infer-quoted`. Maybe something like `--keep-leading-zeros` or something to that effect. Also, disabling type inference for quoted values seems wrong, since it's pretty common for CSV files to have quoted fields.

This is where it all happens in the code, if someone wants to investigate further: https://github.com/nushell/nushell/blob/f1265c5828d8a12863fefb98479c133255fed5bf/crates/nu-command/src/formats/from/delimited.rs#L62-L73

The real question is whether the `csv` crate will even allow you to do this.
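
Whether the `csv` crate exposes this is exactly the open question, and this sketch does not answer it. What any parser would need to retain is, for each field, whether it was quoted. The std-only Rust sketch below shows that bookkeeping; `split_with_quotes` is a hypothetical helper, not part of nushell or the `csv` crate, and it assumes single-line records (no line breaks inside quoted fields).

```rust
/// Split one CSV line into (field, was_quoted) pairs.
/// Follows RFC 4180 quoting: inside a quoted field, "" stands for one literal quote.
/// Assumes the record has no embedded line breaks.
fn split_with_quotes(line: &str) -> Vec<(String, bool)> {
    let mut fields = Vec::new();
    let mut field = String::new();
    let mut quoted = false; // the current field opened with a double quote
    let mut in_quotes = false; // we are currently inside the quoted section
    let mut chars = line.chars().peekable();

    while let Some(c) = chars.next() {
        if in_quotes {
            if c == '"' {
                if chars.peek() == Some(&'"') {
                    // Escaped quote: emit one '"' and skip its partner.
                    field.push('"');
                    chars.next();
                } else {
                    // Closing quote of the field.
                    in_quotes = false;
                }
            } else {
                // Commas inside quotes are data, not delimiters.
                field.push(c);
            }
        } else if c == '"' && field.is_empty() && !quoted {
            // Opening quote at the start of a field.
            quoted = true;
            in_quotes = true;
        } else if c == ',' {
            fields.push((std::mem::take(&mut field), quoted));
            quoted = false;
        } else {
            field.push(c);
        }
    }
    fields.push((field, quoted));
    fields
}

fn main() {
    for (field, quoted) in split_with_quotes(r#""00123",42,"Smith, J.""#) {
        println!("{field:?} quoted={quoted}");
    }
}
```

A type-inference step could then consult the `bool` for each field, which is the information a `--no-infer-quoted` style option would depend on.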

fdncred • Nov 06 '25 12:11

Thanks for the answer!

> What you're wanting to do may be possible, but I wouldn't call the flag `--no-infer-quoted`. Maybe something like `--keep-leading-zeros` or something to that effect.

If type inference stays and the identifiers are converted to integers without trimming the leading zeros, that still leaves the problem of sort order, which will be numerical instead of lexicographical. That doesn't matter if the identifiers all have the same number of characters, but sometimes they don't, which is why I was looking for a way to force a field to be a string, since that's how barcodes and similar identifiers are usually handled.
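
The sort-order point can be checked directly. This Rust sketch (the identifiers and the helper names `sort_as_strings` and `sort_as_numbers` are made up for illustration) sorts the same identifiers once as text and once after parsing them as integers.

```rust
/// Sort identifiers as text (lexicographic order), the usual order for barcodes.
fn sort_as_strings<'a>(ids: &[&'a str]) -> Vec<&'a str> {
    let mut v = ids.to_vec();
    v.sort();
    v
}

/// Sort identifiers after parsing them as integers (numeric order).
fn sort_as_numbers(ids: &[&str]) -> Vec<i64> {
    let mut v: Vec<i64> = ids
        .iter()
        .map(|s| s.parse::<i64>().expect("numeric identifier"))
        .collect();
    v.sort();
    v
}

fn main() {
    // Identifiers with leading zeros and different lengths.
    let ids = ["0042", "00123", "7"];
    println!("{:?}", sort_as_strings(&ids)); // ["00123", "0042", "7"]
    println!("{:?}", sort_as_numbers(&ids)); // [7, 42, 123]
}
```

When identifiers differ in length, the two orders disagree, so keeping them as integers changes how the column sorts even if the digits are otherwise preserved.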

However, if the leading zeros were kept, the column could be converted back into strings with `into string` without losing part of the identifier, so that would work as a solution.
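
Why the zeros have to be kept at parse time: once a field is stored as a plain integer, converting it back to text can only reproduce the number. The Rust sketch below shows that loss and a hypothetical alternative, `KeptInt`, that also stores the original text, roughly what a `--keep-leading-zeros` option would need to preserve. Neither type nor either function exists in nushell; they are illustrations only.

```rust
/// Models the problem: parse an identifier as an integer, then turn it back into text.
fn roundtrip_through_int(raw: &str) -> Option<String> {
    raw.parse::<i64>().ok().map(|n| n.to_string())
}

/// Hypothetical: an integer that also remembers the exact text it was parsed from.
#[derive(Debug)]
struct KeptInt {
    value: i64,
    text: String,
}

/// Hypothetical parse that keeps the original text alongside the number.
fn parse_keeping_text(raw: &str) -> Option<KeptInt> {
    raw.parse::<i64>().ok().map(|value| KeptInt {
        value,
        text: raw.to_string(),
    })
}

fn main() {
    println!("{:?}", roundtrip_through_int("00123")); // Some("123")
    let kept = parse_keeping_text("00123").expect("numeric");
    println!("{} {}", kept.value, kept.text); // 123 00123
}
```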

> Also, disabling type inference for quoted values seems wrong, since it's pretty common for CSV files to have quoted fields.

I thought you either quoted everything or quoted only the fields you want to force to be strings? Isn't the latter widespread enough to justify an option supporting it?

Phantomwise • Nov 07 '25 09:11

> I thought you either quoted everything or quoted only the fields you want to force to be strings? Isn't the latter widespread enough to justify an option supporting it?

In my experience in business, quotes can be on every field, whether they're needed or not. However, they're usually used when a particular field contains a comma, so that the comma isn't treated as a field delimiter.

Feel free to prove me wrong by analyzing the `csv` crate and providing your evidence.

fdncred • Nov 07 '25 12:11

I'm not saying you're wrong; that was an honest question. I haven't been using CSV for very long, so I might have wrong ideas about what's commonly done.

> However, they're usually used when a particular field contains a comma, so that the comma isn't treated as a field delimiter.

I see your point. If fields are quoted because they contain commas, then disabling type inference for quoted values would cause problems whenever those fields contain anything but text, such as floats that use a comma as the decimal separator, as in French and other locales. In that light, making it the default behavior does seem like a really bad idea.

That leaves two options: an optional flag to disable type inference on quoted fields, or a flag to keep leading zeros so that the fields can be converted back to strings.

Phantomwise • Nov 07 '25 12:11

According to the RFC, this is what double quotes are used for.

[Image: RFC excerpt on the use of double quotes]
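
For reference, RFC 4180 says each field may or may not be enclosed in double quotes, that fields containing line breaks, double quotes, or commas should be enclosed in double quotes, and that a double quote inside a quoted field is escaped by doubling it. The Rust sketch below writes fields with that minimal quoting; `quote_if_needed` is a hypothetical helper, not part of nushell or the `csv` crate. Under this convention an identifier like `00123` is not quoted at all, while a comma-decimal value like `3,14` is, so quotes signal "contains a delimiter" rather than "is a string".

```rust
/// Quote a field only when it contains a comma, a double quote, or a line break,
/// escaping embedded double quotes by doubling them (RFC 4180 style).
fn quote_if_needed(field: &str) -> String {
    if field.contains(|c: char| matches!(c, ',' | '"' | '\r' | '\n')) {
        format!("\"{}\"", field.replace('"', "\"\""))
    } else {
        field.to_string()
    }
}

fn main() {
    // A French-locale decimal, an identifier, and a name containing a comma.
    let row = ["3,14", "00123", "Smith, J."];
    let fields: Vec<String> = row.iter().map(|f| quote_if_needed(f)).collect();
    println!("{}", fields.join(",")); // "3,14",00123,"Smith, J."
}
```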

fdncred • Nov 07 '25 13:11

Can we scope the description and title to the `from csv` command? Otherwise the title of this issue is completely misleading.

sholderbach • Nov 11 '25 15:11