CSV.jl

Prettier (or custom) `normalizenames` behaviour?

Open nickrobinson251 opened this issue 3 years ago • 1 comments

(originally posted in #990)

the results [of normalizenames] are sometimes unexpected, like "Profit (net)" turning into "Profit_net_" with a trailing underscore

I agree that this normalization results in something ugly, and I'd been meaning to open an issue about it for ages.

I see this column naming pattern in quite a lot of data, e.g. "Quantity (unit)" names like "Weight (kg)". It'd be nice to normalize these column names to valid identifiers so we can use getproperty syntax like file.Weight_kg, while also making the names a bit prettier / more intuitive for users, e.g. Weight_kg rather than Weight_kg_.

One option would be to change the CSV.normalizename function, e.g. to just strip trailing _ (which is what we map anything that's not Base.is_id_char to). I suppose this would be breaking. It'd also mean more names would map to the same thing, but I'm not sure that's an issue in practice.
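To make the first option concrete, here is a minimal sketch of a normalization that strips trailing underscores. `pretty_normalizename` is a hypothetical name used for illustration; it is not CSV.jl's actual implementation and it skips things the real function may handle (reserved words, Unicode normalization).

```julia
# Hypothetical helper, NOT part of CSV.jl: a sketch of "strip trailing _".
function pretty_normalizename(name::AbstractString)::Symbol
    # Map every character that cannot appear in an identifier to '_'.
    id = map(c -> Base.is_id_char(c) ? c : '_', String(strip(name)))
    # Collapse runs of underscores, then drop any trailing ones.
    id = rstrip(replace(id, r"_+" => "_"), '_')
    # Keep the result a valid identifier if it is empty or starts with a digit.
    if isempty(id) || !Base.is_id_start_char(first(id))
        id = "_" * id
    end
    return Symbol(id)
end

pretty_normalizename("Weight (kg)")   # :Weight_kg
pretty_normalizename("Profit (net)")  # :Profit_net
```

Note that with this rule "Weight (kg)" and "Weight_kg" both map to `:Weight_kg`, which is the extra collision risk mentioned above.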

Another option could be allowing users to pass normalizenames = my_func with signature my_func(::String) -> Symbol
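As a rough illustration of the second option, a user-supplied function with the proposed `my_func(::String) -> Symbol` signature might look like the sketch below. Passing a function as `normalizenames` is the proposal in this issue, not a confirmed feature, so the example only applies the function to a plain vector of header strings. The regex and the fallback behaviour are assumptions made for the example.

```julia
# Hypothetical user function matching the proposed signature String -> Symbol.
function my_func(name::String)::Symbol
    # Match names of the form "Quantity (unit)", e.g. "Weight (kg)".
    m = match(r"^\s*(\w+)\s*\((\w+)\)\s*$", name)
    # Fallback (an assumption): leave anything else unchanged.
    m === nothing && return Symbol(name)
    return Symbol(m.captures[1], "_", m.captures[2])
end

# Apply it to some example header names.
map(my_func, ["Weight (kg)", "Profit (net)", "id"])
# [:Weight_kg, :Profit_net, :id]
```

A function like this only handles the one pattern the user cares about; names it does not recognise pass through as-is and may still not be valid identifiers.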

nickrobinson251 avatar Mar 17 '22 18:03 nickrobinson251

Yeah, I like the idea of doing more "prettifying" in our normalization pass for now; we could special-case characters like ( and just remove them. We also have checks to ensure that if two names get normalized to the same name, we append an id number.
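For reference, the "append an id number" check could be sketched roughly as follows. `make_unique` is a hypothetical helper and the `_1`, `_2` suffix format is an assumption; CSV.jl's own check may differ in detail.

```julia
# Hypothetical helper, NOT CSV.jl's implementation: the suffix format is assumed.
function make_unique(cols::Vector{Symbol})
    seen = Set{Symbol}()
    out = Symbol[]
    for nm in cols
        candidate = nm
        k = 1
        # Keep appending an increasing id until the name is unused.
        while candidate in seen
            candidate = Symbol(nm, "_", k)
            k += 1
        end
        push!(seen, candidate)
        push!(out, candidate)
    end
    return out
end

make_unique([:Weight_kg, :Weight_kg, :id])
# [:Weight_kg, :Weight_kg_1, :id]
```

With a check like this in place, a prettier normalization that maps more names to the same identifier would still produce distinct column names.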

quinnj avatar Mar 18 '22 20:03 quinnj