DataFrames.jl
Autogenerated suffixes can clash
I was bitten by this issue in a case that looked like this:
julia> df = DataFrame(x=1:3);
julia> select(df, :x => (x->2x), :x => (x->3x))
ERROR: ArgumentError: duplicate output column name: :x_function
I guess this works as documented:
the generated name is created by concatenating source column name and function name by default (see examples below).
But it would be nice if name auto-generation were smarter, so that unique names are guaranteed.
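Until then, one way to sidestep the clash is to name each output column explicitly with the `source => function => target` form. This is only a minimal sketch; the target names `:x2` and `:x3` are arbitrary choices for illustration.

```julia
using DataFrames

df = DataFrame(x=1:3)

# Give every output an explicit name so the autogenerated
# :x_function name is never used for either anonymous function.
# :x2 and :x3 are arbitrary names picked for this example.
res = select(df,
             :x => (x -> 2x) => :x2,
             :x => (x -> 3x) => :x3)
```

Naming the anonymous functions (for example `double(x) = 2x`) would also give distinct autogenerated names, but explicit targets do not depend on the naming rule at all.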
Another case that is maybe worse as it overwrites data:
julia> f(x) = 2x;
julia> df = DataFrame(x=1:3, x_f=0)
3×2 DataFrame
 Row │ x      x_f
     │ Int64  Int64
─────┼──────────────
   1 │     1      0
   2 │     2      0
   3 │     3      0
julia> transform!(df, :x => f)
3×2 DataFrame
 Row │ x      x_f
     │ Int64  Int64
─────┼──────────────
   1 │     1      2
   2 │     2      4
   3 │     3      6
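A caller who wants to avoid the silent overwrite can check for the default name before transforming. The sketch below rebuilds that name by hand from the rule quoted above (source column name, underscore, function name); the fallback name `:x_f_new` is a hypothetical choice, not something DataFrames.jl generates.

```julia
using DataFrames

f(x) = 2x
df = DataFrame(x=1:3, x_f=0)

# Name that `:x => f` would produce under the documented rule:
# source column name, underscore, function name.
target = string("x", "_", nameof(f))

if target in names(df)
    # The default name would overwrite an existing column,
    # so write to an explicit (hypothetical) destination instead.
    transform!(df, :x => f => :x_f_new)
else
    transform!(df, :x => f)
end
```

With this guard the original `x_f` column keeps its values and the result of `f` lands in a separate column.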
Finally, a case that is a bit contrived:
julia> g(x) = 3x;
julia> f_g(x) = 1;
julia> df = DataFrame(x=1:3, x_f=0);
julia> transform(df, :x => f_g, :x_f => g)
ERROR: ArgumentError: duplicate output column name: :x_f_g
All of this works as expected. Still, we may reconsider adding a makeunique
kwarg to these functions, which is why I am keeping this issue open.
The general idea is that in production code it is safer to throw an error than to silently modify a column name, whether it was generated or passed explicitly by the user.
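For illustration only, here is a sketch of what opt-in deduplication of output names could look like. The helper `dedupe_names` is hypothetical and not part of DataFrames.jl; the numeric-suffix scheme is an assumption modeled on how `makeunique=true` behaves in the `DataFrame` constructor.

```julia
# Hypothetical helper, not a DataFrames.jl function: make a list of
# output column names unique by appending _1, _2, ... to repeats.
function dedupe_names(cols::Vector{Symbol})
    seen = Set{Symbol}()
    out = Symbol[]
    for n in cols
        candidate = n
        k = 0
        # Bump the suffix until the candidate has not been used yet.
        while candidate in seen
            k += 1
            candidate = Symbol(n, "_", k)
        end
        push!(seen, candidate)
        push!(out, candidate)
    end
    return out
end

unique_cols = dedupe_names([:x_function, :x_function])
```

Because the renaming is silent, such behavior would only make sense behind an explicit keyword, with throwing an error remaining the default.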