FCSFiles.jl
Redesign storage of data in FlowSample type
One shortcoming of the current design is that it's annoying to take subsets of the data, i.e. to gate the population on various parameters while making sure that the attributes are handled properly. I'm proposing to replace the current dictionary-of-arrays approach with a relational storage type like TypedTables.jl. The hope is that this would then allow the following
flowrun[[1,2,7,10]]
to get the data corresponding to the 1st, 2nd, 7th, and 10th cells, where flowrun is of type FlowSample. Since this will be a breaking change (especially with regard to column access), I'm interested in what people think before tackling this.
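To make the proposal concrete, here is a minimal sketch of the row-subset indexing using only Base Julia. `ToySample` is a hypothetical stand-in, not the actual `FlowSample` type, and the channel names and metadata values are invented for illustration. A real implementation would presumably back the columns with TypedTables.jl or a similar table type.

```julia
# Hypothetical stand-in for FlowSample; NOT the FCSFiles.jl type.
struct ToySample{T<:NamedTuple}
    params::Dict{String,String}  # metadata shared by every event
    columns::T                   # one vector per channel, all of equal length
end

# Number of events (cells) in the sample.
Base.length(s::ToySample) = length(first(s.columns))

# Row-subset indexing: apply the same row selection to every column and
# carry the metadata along unchanged. Because Bool <: Integer, this one
# method also covers gating with a boolean mask.
Base.getindex(s::ToySample, rows::AbstractVector{<:Integer}) =
    ToySample(s.params, map(col -> col[rows], s.columns))

# Made-up data: 10 events, 2 channels.
flowrun = ToySample(
    Dict("CYT" => "hypothetical cytometer"),
    (FSC_A = collect(1.0:10.0), SSC_A = collect(10.0:10.0:100.0)),
)

subset = flowrun[[1, 2, 7, 10]]               # 1st, 2nd, 7th, and 10th cells
gated  = flowrun[flowrun.columns.FSC_A .> 5]  # gate on one parameter
```

With this layout, gating reduces to indexing with a boolean mask, and the attributes travel with the subset without any extra bookkeeping.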
EDIT: I'm particularly interested in whether there are limitations in using a relational model for flow data.
Thoughts, GigaSOM crowd? @laurentheirendt @exaexa @oHunewald, please tag anyone else I might've missed.
I think this would work for us, at least as long as FCS loading doesn't get significantly slower than what we currently have. GigaSOM.jl internally works with plain Matrix{Float64}s; metadata has to be managed externally for simplicity and efficiency reasons.
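For consumers that need a plain `Matrix{Float64}`, a column-oriented table can be flattened in one step while the channel names are kept separately. The sketch below uses Base Julia only; the `columns` named tuple and its channel names are assumptions made for illustration, not the FCSFiles.jl or GigaSOM.jl API.

```julia
# Hypothetical column store: one vector per channel (made-up values).
columns = (FSC_A = [1.0, 2.0, 3.0], SSC_A = [10.0, 20.0, 30.0])

# Events become rows and channels become columns.
data = hcat(values(columns)...)

# Channel names are tracked outside the matrix, in column order.
channels = collect(String.(keys(columns)))
```

The copy made by `hcat` is the main cost here, so whether loading stays fast enough would need to be measured on real FCS files.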
Another possibly good choice would be DataFrames.jl (we originally used them in GigaSOM).
> I think this would work for us
Great! ~~TypedTables.jl should be substantially faster than DataFrames.jl since it's not dynamic and has several performance-focused optimizations.~~ I'm not 100% sure about this any more given the lack of development on the TypedTables.jl side of things.