DataFrames.jl

Add "Assigning a named function to a column as its format"

Open · opened by ufechner7 · 4 comments

Assigning a named function to a column as its format

I find the following feature from InMemoryDatasets very useful:

  • By default, formatted values are used for operations such as displaying, sorting, grouping, joining, etc.
  • Format evaluation is lazy
  • Formats don't change the actual values

Please add it to this package, too. Usage example: displaying a column in hex format.
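DataFrames.jl has no format feature, but two of the behaviours listed above (sorting and grouping by formatted values while the stored values stay unchanged) can be approximated with existing functions. The sketch below is only an illustration, not InMemoryDatasets' API: `sizefmt` is a hypothetical format function and the data is made up.

```julia
using DataFrames

# Hypothetical "format": bin a numeric value into a label.
sizefmt(x) = x < 100 ? "small" : "large"

df = DataFrame(id = 1:4, value = [5, 250, 42, 900])

# Sorting by formatted values: `by` is only used for comparisons,
# so the stored `value` column is not modified.
sorted = sort(df, :value; by = sizefmt)

# Grouping by formatted values: compute the key in a temporary column
# of a new data frame, then group on it.
gdf = groupby(transform(df, :value => ByRow(sizefmt) => :value_fmt), :value_fmt)
counts = combine(gdf, nrow => :n)
```

Unlike InMemoryDatasets formats, this is not lazy for grouping: the key column is materialized, although `df` itself is left untouched.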

ufechner7 · Aug 09 '22 04:08

I considered adding this feature, but it would add another level of complexity to the API. Could you please give more context about the use cases of formats that you have in mind?

Note that custom display formatting is currently meant to be handled by PrettyTables.jl formatters: https://ronisbr.github.io/PrettyTables.jl/stable/man/formatters/#Formatters.
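For the hex display use case, a PrettyTables.jl formatter might be sketched as follows. This assumes the PrettyTables.jl v2 API, where a formatter is a function `(value, row, column) -> new_value` passed through the `formatters` keyword; other major versions may differ, so check the linked documentation. The data and the choice of columns 2 and 3 are hypothetical.

```julia
using DataFrames, PrettyTables

df = DataFrame(time = [0.1, 0.25], d1 = [255, 16], d2 = [10, 1])

# Assumed PrettyTables v2 formatter signature: (value, row index, column index).
# Columns 2 and 3 (d1, d2) are shown as two-digit hex; everything else is unchanged.
hex_formatter(v, i, j) = (j in (2, 3) && v isa Integer) ? string(v; base = 16, pad = 2) : v

# Only the printed output changes; `df` keeps its integer values.
pretty_table(df; formatters = hex_formatter)
```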

bkamins · Aug 09 '22 07:08

It's nice that the PrettyTables.jl formatters might already solve one part of the problem! But how would you write formatted columns to a .csv file? With InMemoryDatasets this is easy:

See: https://ufechner7.github.io/2022/08/07/exporting-formatted-datasets.html

At the moment I have to recommend InMemoryDatasets to my colleagues, and I am hesitant to do that because the documentation of DataFrames is much better than that of InMemoryDatasets. But if DataFrames does not have the features we need, I cannot recommend using it.

ufechner7 · Aug 09 '22 16:08

To save your data to a CSV file you can do, e.g.:

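using CSV, DataFrames, Printf

# CSV.write calls `transform(col, value)` for every cell; `col` is the column index.
function csv_formatter(col, value)
    col == 1 && return @sprintf("%12.6f", value)                     # first column: fixed-point float
    return ismissing(value) ? "--" : string(value, base=16, pad=2)  # other columns: two-digit hex
end
CSV.write("output.csv", df, transform=csv_formatter)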
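To see what the `transform` approach produces, here is a run on a small made-up DataFrame, written to an in-memory buffer instead of a file. The formatter is the one above; the column names `time`, `d1` and `d2` are assumptions borrowed from the second snippet. No formatted copy of the table is built: each cell is converted as it is written, which is the closest this gets to lazy formatting.

```julia
using CSV, DataFrames, Printf

df = DataFrame(time = [0.5, 1.25], d1 = [255, missing], d2 = [10, 16])

# Same formatter as above: `col` is the column index passed by CSV.write.
function csv_formatter(col, value)
    col == 1 && return @sprintf("%12.6f", value)
    return ismissing(value) ? "--" : string(value, base=16, pad=2)
end

io = IOBuffer()
CSV.write(io, df; transform = csv_formatter)  # header is written as-is, cells are transformed
out = String(take!(io))
print(out)
```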

If the data is not very large, however, what I would typically do is the following (I understand this may not be feasible for you because of the size of your data, and you want a lazy solution?):

CSV.write("output.csv", select(df, :time => ByRow(round6), [:d1, :d2] .=> ByRow(hex), renamecols=false))
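The one-liner above relies on two helpers, `round6` and `hex`, that are not defined in the comment. A self-contained version, assuming `round6` rounds to six decimal places and `hex` renders a two-digit hexadecimal string (both definitions are guesses), could look like this:

```julia
using CSV, DataFrames

# Hypothetical helpers (not defined in the original snippet):
round6(x) = round(x; digits = 6)                                # keep 6 decimal places
hex(x) = ismissing(x) ? "--" : string(x; base = 16, pad = 2)  # two-digit hex, "--" for missing

df = DataFrame(time = [0.1234567, 2.0], d1 = [255, 7], d2 = [16, missing])

# Build a formatted copy; `df` itself keeps its original values.
out = select(df, :time => ByRow(round6), [:d1, :d2] .=> ByRow(hex), renamecols = false)
CSV.write("output.csv", out)
```

The trade-off versus the `transform` keyword is memory: `select` materializes a full formatted copy before writing.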

Of course, I am aware that these solutions are not as convenient for your use case as the InMemoryDatasets.jl format feature; I just want to show how this can be achieved.

bkamins · Aug 09 '22 22:08

Thanks a lot for explaining how I can achieve the same result with DataFrames!

ufechner7 · Aug 10 '22 19:08