FileIO.jl
Save / Load with multiple data inputs / outputs
I'm frequently dealing with a really weird tabular data structure called a PCL file - basically, the first n rows contain various forms of metadata (first column for these rows is metadata name), then below that it has a sample x feature matrix. So for example, I might have my_data.pcl:
| age | 12 | 14 | 18 |
|---|---|---|---|
| gender | m | f | f |
| sampleID | a | b | c |
| feature1 | 0.2 | 0.4 | 0.3 |
| feature2 | 0.3 | 0.2 | 0.3 |
| feature3 | 0.1 | 0.6 | 0.7 |
Most of the operations occur on the numerical part of the table, and storing everything in a single dataframe (or whatever) would generally lead to columns of type Any. What I'd like is a pair of load/save functions that make/take two iterable tables that share the sampleID row, e.g. the two tables would be:
metadatadf:
| sampleID | a | b | c |
|---|---|---|---|
| age | 12 | 14 | 18 |
| gender | m | f | f |
featuredf:
| sampleID | a | b | c |
|---|---|---|---|
| feature1 | 0.2 | 0.4 | 0.3 |
| feature2 | 0.3 | 0.2 | 0.3 |
| feature3 | 0.1 | 0.6 | 0.7 |
And I'd like to be able to do something like:
(x, y) = load("my_data.pcl", id_row="sampleID")
metadatadf = DataFrame(x)
featuredf = DataFrame(y)
save("new_table.pcl", metadatadf, featuredf)
Can I get some guidance on whether this is possible / makes sense to use the FileIO framework for this?
If I'm not mistaken, arbitrary signatures should be supported in FileIO - just make sure that you accept those signatures in your IO library! If FileIO fails to pass down the keyword args or additional arguments, please open an issue!
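To make that concrete, here is a rough sketch of what the IO library side could look like in plain Julia, with no dependencies. The names `read_pcl` and `write_pcl` and the `delim` keyword are made up for illustration and are not part of FileIO.jl or CSVFiles.jl. It assumes a tab-delimited file in which the `id_row` row is the last metadata row and everything below it is numeric. Hooking this into FileIO would then presumably mean having the library's `load`/`save` methods for the PCL format forward their keyword and extra arguments to functions like these.

```julia
# Hypothetical helpers for illustration only; not an existing package API.

# Read a PCL file and split it into two tables that share the sample IDs.
# Assumption: rows above `id_row` are metadata, rows below it are numeric features.
function read_pcl(path::AbstractString; id_row::AbstractString="sampleID", delim::Char='\t')
    rows = [split(line, delim) for line in eachline(path) if !isempty(strip(line))]
    idx = findfirst(r -> r[1] == id_row, rows)
    idx === nothing && error("no row named $id_row in $path")
    ids = String.(rows[idx][2:end])
    # Metadata values are kept as strings; feature values are parsed as Float64.
    metadata = [String(r[1]) => String.(r[2:end]) for r in rows[1:idx-1]]
    features = [String(r[1]) => parse.(Float64, r[2:end]) for r in rows[idx+1:end]]
    return (ids=ids, rows=metadata), (ids=ids, rows=features)
end

# Write the two tables back out as a single PCL file.
function write_pcl(path::AbstractString, metadata, features;
                   id_row::AbstractString="sampleID", delim::Char='\t')
    metadata.ids == features.ids || error("tables do not share the same sample IDs")
    open(path, "w") do io
        for (name, vals) in metadata.rows
            println(io, join([name; vals], delim))
        end
        println(io, join([id_row; metadata.ids], delim))
        for (name, vals) in features.rows
            println(io, join([name; string.(vals)], delim))
        end
    end
    return path
end
```

With something like this in place, `read_pcl("my_data.pcl", id_row="sampleID")` would return the two pieces, and each could be converted to whatever table type is convenient before being passed back to `write_pcl`.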
Great! An orthogonal question that just occurred to me - should I / can I piggyback off of the functions already present in CSVFiles.jl? This is a special case of a csv/tsv file type, and I'll probably want to make use of all of the keyword stuff available there.