
Select behaviour with normalizenames

Open nilshg opened this issue 2 years ago • 0 comments

As shown on Slack:

 julia> using CSV, DataFrames
 
 julia> CSV.write("test.csv", DataFrame("x1" => rand(2), "x 2" => rand(2), "x 3" => rand(2)))
 "test.csv"
 
 julia> CSV.read("test.csv", DataFrame; select = ["x1", "x 2"], normalizenames = true)
 2×1 DataFrame
  Row │ x1       
      │ Float64  
 ─────┼──────────
    1 │ 0.613157
    2 │ 0.859708

It looks like when normalizenames is used together with select, the selection is applied to the names after normalization, so "x 2" is not matched and only x1 is returned (i.e. doing select = ["x1", "x_2"] will work).
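
One way to sidestep the name question for now is to select by position instead of by name, since column indices do not depend on how the names are normalized. A minimal sketch, assuming the same three-column file as above (the temporary path is only for illustration):

```julia
using CSV, DataFrames

# Write the same example file to a temporary location.
path = joinpath(mktempdir(), "test.csv")
CSV.write(path, DataFrame("x1" => rand(2), "x 2" => rand(2), "x 3" => rand(2)))

# Select the first two columns by position; normalization then only
# affects how the selected columns are named in the result.
df = CSV.read(path, DataFrame; select = [1, 2], normalizenames = true)
names(df)
```

Here both columns should come back, with the second one named x_2. This only helps when the column order is known in advance, so it is a workaround rather than a fix.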

That's maybe not a bug, but I think it's less than ideal, as users need to anticipate the result of normalizenames to select the columns they want. Those results are sometimes unexpected, like "Profit (net)" turning into "Profit_net_" with a trailing underscore (maybe a separate issue?).

Maybe the solution is to just select from the union of normalised and unnormalised names, although that might have side effects I'm not considering?
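
To make the idea concrete, here is a rough sketch of what selecting from the union could look like. Both `approx_normalize` and `union_select` are hypothetical helpers written for this issue, not CSV.jl functions, and `approx_normalize` is a simplified ASCII-only stand-in for the real normalization (it only reproduces the examples mentioned above):

```julia
# Simplified, ASCII-only stand-in for name normalization.
# Assumption: this is NOT CSV.jl's actual implementation.
approx_normalize(name::AbstractString) =
    replace(replace(strip(name), r"[^A-Za-z0-9_]" => "_"), r"_+" => "_")

# Hypothetical helper: indices of the columns whose raw name OR
# normalized name appears in the requested list.
function union_select(header::Vector{String}, wanted::Vector{String})
    requested = Set(wanted)
    return [i for (i, name) in enumerate(header)
            if name in requested || approx_normalize(name) in requested]
end

header = ["x1", "x 2", "x 3"]
raw_idx  = union_select(header, ["x1", "x 2"])   # raw names
norm_idx = union_select(header, ["x1", "x_2"])   # normalized names

# One possible side effect: two distinct columns can match the same request.
clash_idx = union_select(["x 2", "x_2"], ["x_2"])
```

With this sketch both spellings pick columns 1 and 2. The last call shows one side effect: if a file contains both "x 2" and "x_2", asking for "x_2" matches both columns, so a real implementation would need a rule for that case.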

nilshg, Mar 09 '22 18:03