CSV.jl
Select behaviour with normalizenames
As shown on Slack:
julia> using CSV, DataFrames
julia> CSV.write("test.csv", DataFrame("x1" => rand(2), "x 2" => rand(2), "x 3" => rand(2)))
"test.csv"
julia> CSV.read("test.csv", DataFrame; select = ["x1", "x 2"], normalizenames = true)
2×1 DataFrame
 Row │ x1
     │ Float64
─────┼──────────
   1 │ 0.613157
   2 │ 0.859708
It looks like when normalizenames is used with select, columns are selected after normalization (i.e. select = ["x1", "x_2"] works, while the original name "x 2" is silently not matched).
That may not be a bug, but I think it is less than ideal: users need to anticipate the result of normalizenames to select the columns they want, and those results are sometimes unexpected, e.g. "Profit (net)" turning into "Profit_net_" with a trailing underscore (maybe a separate issue?).
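For anyone hitting this in the meantime, two workarounds seem to apply. This is a minimal sketch, assuming a CSV.jl version that shows the behaviour above; the temporary file path and the variable names by_name and by_position are only for illustration.

```julia
using CSV, DataFrames

# Write the same three-column file as in the report, in a temporary directory.
path = joinpath(mktempdir(), "test.csv")
CSV.write(path, DataFrame("x1" => rand(2), "x 2" => rand(2), "x 3" => rand(2)))

# Workaround 1: select using the names as they look after normalization.
by_name = CSV.read(path, DataFrame; select = ["x1", "x_2"], normalizenames = true)

# Workaround 2: select by column position, which does not depend on names at all.
by_position = CSV.read(path, DataFrame; select = [1, 2], normalizenames = true)
```

Selecting by position avoids having to guess the normalized names, at the cost of breaking if the column order in the file changes.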
Maybe the solution is to just select from the union of normalised and unnormalised names, although that might have side effects I'm not considering?
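To make the union idea concrete, here is a rough sketch of the matching logic in plain Julia. approx_normalize and resolve_selection are hypothetical helpers written for this illustration; approx_normalize is only a stand-in that reproduces the two examples in this issue and is not CSV.jl's actual normalization code.

```julia
# Hypothetical stand-in for name normalization (NOT CSV.jl's implementation):
# every run of characters that cannot appear in an identifier becomes one "_".
approx_normalize(name::AbstractString) = replace(name, r"[^A-Za-z0-9_]+" => "_")

# Return the positions of the header entries that match a requested name
# either in their raw form or in their normalized form.
function resolve_selection(header::Vector{String}, requested::Vector{String})
    wanted = Set(requested)
    return [i for (i, name) in enumerate(header)
            if name in wanted || approx_normalize(name) in wanted]
end

header = ["x1", "x 2", "x 3", "Profit (net)"]
cols = resolve_selection(header, ["x1", "x 2", "Profit_net_"])
```

One side effect shows up directly in this sketch: if a file contains both "x 2" and "x_2" as raw headers, requesting "x_2" matches both columns, so a union-based select would need a rule for such collisions.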