DataFrames.jl
Assigning a named function to a column as its format
I find the following feature of InMemoryDatasets very useful: assigning a named function to a column as its format.
- By default, formatted values are used for operations like: displaying, sorting, grouping, joining,...
- Format evaluation is lazy
- Formats don't change the actual values
Please add it to this package, too. Usage example: displaying a column in hex format.
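To make the requested behaviour concrete, here is a minimal sketch in plain Julia of what a lazy format means. It is not a DataFrames.jl or InMemoryDatasets.jl API. It defines a read-only vector wrapper, here called `FormattedColumn` (a hypothetical name), that keeps the raw values untouched and applies a named formatting function, the hypothetical `hexfmt`, only when an element is read.

```julia
# Hypothetical sketch: a read-only vector that formats values lazily.
# The wrapped data is never modified; `fmt` runs only when an element is accessed.
struct FormattedColumn{T,F} <: AbstractVector{String}
    data::Vector{T}   # actual values, left untouched
    fmt::F            # named formatting function, applied on access
end

# Minimal AbstractVector interface
Base.size(c::FormattedColumn) = size(c.data)
Base.IndexStyle(::Type{<:FormattedColumn}) = IndexLinear()
Base.getindex(c::FormattedColumn, i::Int) = c.fmt(c.data[i])

# Example format (hypothetical): two-digit lowercase hex, "--" for missing values
hexfmt(x) = ismissing(x) ? "--" : string(x, base=16, pad=2)

raw = Union{Int,Missing}[10, 255, missing, 3]
col = FormattedColumn(raw, hexfmt)

println(collect(col))  # ["0a", "ff", "--", "03"]
println(raw)           # the raw values are unchanged
```

Because `FormattedColumn` is an `AbstractVector{String}`, generic functions such as `sort` or `sortperm` would operate on the formatted strings, loosely mirroring "formatted values are used for sorting". Wiring this into DataFrames.jl operations such as grouping or joining is outside the scope of this sketch.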
I considered adding this feature, but it would add another level of complication to the API. Could you please add more context about the use cases of formats that you have in mind?
Note that custom display formatting is currently meant to be handled by PrettyTables.jl formatters: https://ronisbr.github.io/PrettyTables.jl/stable/man/formatters/#Formatters.
> Could you please add more context about the use cases of format that you want?
Nice that the PrettyTables.jl formatters might already solve one part of the problem! But how would you write formatted columns to a .csv file? That is easy with InMemoryDatasets:
See: https://ufechner7.github.io/2022/08/07/exporting-formatted-datasets.html
At the moment I have to recommend InMemoryDatasets to my colleagues, and I am hesitant to do that because the documentation of DataFrames is much better than the documentation of InMemoryDatasets. But if DataFrames does not have the features we need, I cannot recommend using it.
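For saving your data to a CSV file you can, for example, do:
```julia
using CSV, DataFrames, Printf

# Called by CSV.write for every cell; `col` is the column index.
function csv_formatter(col, value)
    ismissing(value) && return "--"               # placeholder for missing values
    col == 1 && return @sprintf("%12.6f", value)  # first column: fixed-point, 6 decimals
    return string(value, base=16, pad=2)          # other columns: 2-digit hex (integers only)
end

CSV.write("output.csv", df, transform=csv_formatter)
```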
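If the data is not very large, however, what I would typically do is the following (I understand this may not be feasible for you because of the size of your data, and that you want a lazy solution):
```julia
using CSV, DataFrames

round6(x) = round(x, digits=6)      # assumed helper: round to 6 decimals
hex(n) = string(n, base=16, pad=2)  # assumed helper: 2-digit hex string

CSV.write("output.csv", select(df, :time => ByRow(round6), [:d1, :d2] .=> ByRow(hex), renamecols=false))
```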
Of course, I am aware that these solutions are not as convenient for your use case as the format feature of InMemoryDatasets.jl, but I just wanted to show how this can be achieved.
Thanks a lot for explaining how I can achieve the same result with DataFrames!