NamedArrays.jl

Reading from file to NamedArray

Open Mastomaki opened this issue 3 years ago • 4 comments

Documentation should be added on the best ways to read a text file into a NamedArray. My current plan is to first read the file into a DataFrame with CSV.jl and then convert it using the function provided by dietercastel.
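
For a file that is already laid out as a matrix (row names in the first column, column names in the header), a minimal sketch of the CSV.jl route could look like the following. The sample data, the `IOBuffer` standing in for a file, and the dimension names "gene" and "sample" are all made up for illustration; this is not an API provided by NamedArrays.jl.

```julia
using CSV, DataFrames, NamedArrays

# Stand-in for a text file on disk (assumption: made-up data).
# With a real file, pass its path to CSV.read instead of the IOBuffer.
text = """
gene,sample1,sample2
a,1.0,2.0
b,3.0,4.0
"""
df = CSV.read(IOBuffer(text), DataFrame)

# The first column supplies the row names; the remaining header entries
# label the columns. String.(...) turns CSV.jl's inline strings into plain Strings.
rownames = String.(df[:, 1])
colnames = names(df)[2:end]

# Wrap the numeric part of the table, keeping both sets of names.
na = NamedArray(Matrix(df[:, 2:end]), (rownames, colnames), ("gene", "sample"))
```

With this in place, `na["b", "sample1"]` looks up a cell by name. A long-format file (one row per cell) needs a conversion step instead of a direct wrap.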

Mastomaki, May 11 '21 07:05

To handle missing values, I modified dietercastel's original function as follows:

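using DataFrames, NamedArrays

# Build a NamedArray from a long-format DataFrame: every column except
# `valueCol` becomes a dimension, and absent combinations stay `missing`.
# Extends Base.convert explicitly; a bare `convert` would define a new function.
function Base.convert(t::Type{NamedArray}, df::DataFrame; valueCol = :Values)
    # Dimension columns: all columns except the value column.
    newdimnames = filter(!=(valueCol), propertynames(df))
    names = map(dn -> unique(df[!, dn]), newdimnames)
    lengths = map(length, names)

    newna = NamedArray(Array{Union{Missing, Float64}}(missing, lengths...), tuple(names...), tuple(newdimnames...))
    for row in eachrow(df)
        # Index by name along every dimension and store the value.
        a = [row[col] for col in newdimnames]
        newna[a...] = row[valueCol]
    end
    return newna
end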

However, the element type of the NamedArray should be taken from the original DataFrame rather than hard-coded as Float64.
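
One possible way to do that, as a rough sketch: take `eltype` of the value column and widen it with `Missing`. The helper name `namedarray_from_long` and the sample data are hypothetical and only for illustration; the rest mirrors the function above.

```julia
using DataFrames, NamedArrays

# Hypothetical helper (not part of NamedArrays.jl): same idea as the function
# above, but the element type follows the value column of the DataFrame.
function namedarray_from_long(df::DataFrame; valueCol::Symbol = :Values)
    dimcols = filter(!=(valueCol), propertynames(df))
    dimvalues = map(c -> unique(df[!, c]), dimcols)
    # Element type of the value column, widened so absent combinations can stay missing.
    T = Union{Missing, eltype(df[!, valueCol])}
    data = Array{T}(missing, map(length, dimvalues)...)
    na = NamedArray(data, tuple(dimvalues...), tuple(dimcols...))
    for row in eachrow(df)
        key = [row[c] for c in dimcols]
        na[key...] = row[valueCol]
    end
    return na
end

# Made-up example data: integer values, one combination (south, 2021) absent.
df = DataFrame(Region = ["north", "north", "south"],
               Year = ["2020", "2021", "2020"],
               Values = [1, 2, 3])
na = namedarray_from_long(df)
```

Here the result has element type `Union{Missing, Int}` instead of `Union{Missing, Float64}`. The years are kept as strings on purpose: as far as I know, integer indices are treated as positions rather than names when indexing a NamedArray, so integer-valued dimension columns would need extra care.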

Mastomaki, May 11 '21 08:05

Yes, documentation. I have to study how that works. Do you know of a recommended and hosted platform for that?

davidavdav, May 24 '21 09:05

Not really. I believe the documentation of registered packages appears on https://juliapackages.com/ if it is present in the GitHub repository, and Documenter.jl can be used to build the documentation.

Mastomaki, May 24 '21 09:05

> Yes, documentation. I have to study how that works. Do you know of a recommended and hosted platform for that?

I don't think it is necessary to master Documenter.jl and write formal, polished documentation. If usage examples for converting between NamedArray and DataFrame were added to the README of this repository, that would be good enough for now for people to learn it.

I think your package is very important for Julia to attract data science users from Python's pandas and from R, where data frames and matrices can easily be converted to each other and transposed without losing row or column names. Thanks a lot for your work.

sciencepeak, Aug 12 '21 06:08