
DimArray or DimStack as sink for CSV.read

Open tiemvanderdeure opened this issue 4 months ago • 7 comments

With https://github.com/rafaqz/DimensionalData.jl/pull/739 it is possible to provide DimStack or DimArray as a sink argument for CSV.read.

However, CSV doesn't pass any arguments to the sink - it just does Tables.CopiedColumns(CSV.File(source; kwargs...)) |> sink. So right now this isn't very useful - it's not possible to pass dimensions.
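To illustrate the limitation described above, here is a minimal sketch using only CSV.jl and Tables.jl. The inline CSV text and the recording closure are made up for illustration. It shows that the sink is called with the table alone, while keywords such as `limit` go to `CSV.File`.

```julia
using CSV, Tables

# Made-up data for illustration.
csv = "X,Y,data\n1,1,0.1\n1,2,0.2\n2,1,0.3\n2,2,0.4\n"

# A sink that records how many extra positional and keyword arguments it received.
received = Ref{Any}(nothing)
sink = function (table, args...; kw...)
    received[] = (nargs = length(args), nkw = length(kw))
    return Tables.columntable(table)
end

# `limit` is consumed by CSV.File; nothing besides the table reaches the sink.
cols = CSV.read(IOBuffer(csv), sink; limit = 2)
```

After this call `received[]` reports zero extra arguments, so a sink such as `DimArray` has no way to learn which columns are dimensions.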

I think this could be pretty neat to have - maybe we can add a small extension to add this functionality. But I'm not sure what the API should look like (CSV.read is already very extensive). Thoughts?

tiemvanderdeure avatar Jul 04 '25 15:07 tiemvanderdeure

Yes, need to think about the best syntax. And how to get it. What are you thinking?

We could always add our own type like Sink and do CSV.read(table, Sink(DimStack; kw...))
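A rough sketch of what such a wrapper could look like, assuming it simply stores the extra arguments and replays them when CSV.jl calls it with the table. `Sink` here is hypothetical and not part of CSV.jl or DimensionalData.jl, and `pick_column` is a plain-Julia stand-in for a table constructor so that the sketch needs no extra dependency.

```julia
using CSV, Tables

# Hypothetical wrapper: holds a constructor plus the arguments to forward to it.
struct Sink{F,A<:Tuple,K}
    f::F
    args::A
    kw::K
    Sink(f, args...; kw...) = new{typeof(f),typeof(args),typeof(kw)}(f, args, kw)
end

# CSV.read calls the sink with the table only; replay the stored arguments here.
(s::Sink)(table) = s.f(table, s.args...; s.kw...)

# Stand-in for a constructor that takes (table, dims; name = ...).
function pick_column(table, dimcols; name)
    cols = Tables.columntable(table)
    return (dims = map(d -> unique(cols[d]), dimcols), data = cols[name])
end

csv = "X,Y,data\n1,1,0.1\n1,2,0.2\n2,1,0.3\n2,2,0.4\n"
result = CSV.read(IOBuffer(csv), Sink(pick_column, (:X, :Y); name = :data))
```

With DimensionalData loaded, the same wrapper could presumably be used as `Sink(DimArray, (X, Y); name = :data)`, assuming the table constructor accepts those arguments.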

rafaqz avatar Jul 05 '25 00:07 rafaqz

I was thinking that with an extension we could dispatch on CSV.read(file, sink::AbstractDimArray; kw...) and then split the keywords, similar to how we split keywords when you call Raster on a data source from RasterDataSources. Then we could make something like CSV.read(file, DimArray; dims = (X,Y), name = :col1) work.
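The keyword splitting could be sketched roughly as follows. The names `CONSTRUCTOR_KEYS`, `split_keywords`, and `read_split` are hypothetical, and the sketch uses a standalone function instead of a new `CSV.read` method so that it runs without an extension. Which keywords belong to the constructor is an assumption based on the example call above.

```julia
using CSV, Tables

# Assumed set of keywords meant for the array constructor; everything else goes to CSV.File.
const CONSTRUCTOR_KEYS = (:dims, :name)

function split_keywords(; kw...)
    nt = values(kw)  # keyword arguments as a NamedTuple
    ours = filter(in(CONSTRUCTOR_KEYS), keys(nt))
    rest = filter(!in(CONSTRUCTOR_KEYS), keys(nt))
    return NamedTuple{ours}(nt), NamedTuple{rest}(nt)
end

# Hypothetical reader: parse with the CSV keywords, then construct with the remaining ones.
function read_split(source, constructor; kw...)
    ours, rest = split_keywords(; kw...)
    return constructor(CSV.File(source; rest...); ours...)
end

csv = "X,Y,data\n1,1,0.1\n1,2,0.2\n2,1,0.3\n2,2,0.4\n"

# Stand-in constructor that only understands `name`.
column = (table; name) -> Tables.getcolumn(Tables.columns(table), name)
data = read_split(IOBuffer(csv), column; name = :data, limit = 2)
```

One drawback of this design is that the list of constructor keywords has to be kept in sync by hand, and a name clash with a `CSV.File` keyword would be resolved silently in favour of the constructor.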

tiemvanderdeure avatar Jul 05 '25 07:07 tiemvanderdeure

Yeah, that's probably best from a user perspective. It's just slightly annoying to need an extension specifically for CSV.jl rather than something that works for any table reader.

rafaqz avatar Jul 05 '25 10:07 rafaqz

We could also opt for something similar to Rasters: DimArray("mystack.csv", (X,Y)). That would also require a CSV extension.

So far we don't have I/O in this package, though. But maybe we could include it for simple formats such as CSV.

tiemvanderdeure avatar Jul 05 '25 10:07 tiemvanderdeure

Yeah, the sink syntax is better as it matches DataFrames.jl

rafaqz avatar Jul 05 '25 10:07 rafaqz

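I just realized another option (maybe my favourite) is just to direct users to CSV.File. All keywords of CSV.read are forwarded to it anyway, and it is a table. So

using CSV, DimensionalData
# Parse the file first; any CSV.read keyword (here limit) can be passed to CSV.File
file = CSV.File("dimstack.csv"; limit = 100)
# Then build the array from the table, naming the dimension columns and the data column
DimArray(file, (X,Y,:category); name = :data2)

just works (I found some bugs while testing this, but they are all fixed now).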

tiemvanderdeure avatar Jul 05 '25 10:07 tiemvanderdeure

Seems easier for us at least :)

rafaqz avatar Jul 05 '25 11:07 rafaqz