EcoBase.jl
Synchronizing Table-like stuff
Purpose
In an effort to improve cross-ecosystem compatibility, it would be nice to make table-like data structures more interoperable. My view of the ecosystem is quite narrow: I'm really only aware of ComMatrix from SpatialEcology.jl and my own CommunityProfile from Microbiome.jl, which took quite a bit of inspiration from the former. I also haven't used ComMatrix in the last year or so, as I was trying to iterate quickly in Microbiome.jl.
cc @mkborregaard
Current advantages of CommunityProfile
- Tables.jl interface makes it easy to convert to DataFrame or write to CSV
- Rows (features) and columns (samples) can be indexed with numbers, strings, or regex
- Row and column indexes are types (e.g. Taxon or GeneFunction for features, MicrobiomeSample for samples). This enables storing additional information (including metadata) inside the community table type
julia> using Microbiome
julia> s1 = MicrobiomeSample("sample1")
MicrobiomeSample("sample1", {})
julia> s2 = MicrobiomeSample("sample2");
julia> set!(s1, :type, "stool")
MicrobiomeSample("sample1", {:type = "stool"})
julia> set!(s1, :age, 37)
MicrobiomeSample("sample1", {:type = "stool", :age = 37})
julia> sp1 = Taxon("Bifidobacterium_longum", :species)
Taxon("Bifidobacterium_longum", :species)
julia> sp2 = taxon("s__Echerichia_coli")
Taxon("Echerichia_coli", :species)
julia> cm = CommunityProfile([0 1; 3 4], [sp1, sp2], [s1, s2])
CommunityProfile{Int64, Taxon, MicrobiomeSample} with 2 features in 2 samples
Feature names:
Bifidobacterium_longum, Echerichia_coli
Sample names:
sample1, sample2
julia> cm[r"Bifido", :]
CommunityProfile{Int64, Taxon, MicrobiomeSample} with 1 features in 2 samples
Feature names:
Bifidobacterium_longum
Sample names:
sample1, sample2
julia> metadata(cm)
2-element Vector{NamedTuple{(:sample, :type, :age), T} where T<:Tuple}:
(sample = "sample1", type = "stool", age = 37)
(sample = "sample2", type = missing, age = missing)
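For discussion, here is a minimal sketch of how the number / string / regex indexing shown above could be layered on top of a plain matrix. It is not the Microbiome.jl implementation: `ToyProfile` and `resolve` are hypothetical names made up for this example, and the real type carries typed features and samples rather than bare strings.

```julia
# Toy stand-in for a community table (hypothetical, not CommunityProfile).
struct ToyProfile{T}
    abundances::Matrix{T}
    features::Vector{String}   # row names
    samples::Vector{String}    # column names
end

# Translate each supported kind of index into a vector of integer positions.
resolve(names, i::Integer) = [i]
resolve(names, ::Colon) = collect(eachindex(names))
resolve(names, s::AbstractString) = findall(==(s), names)
resolve(names, r::Regex) = findall(n -> occursin(r, n), names)

# Indexing always returns a new ToyProfile with matching names.
function Base.getindex(p::ToyProfile, rows, cols)
    r = resolve(p.features, rows)
    c = resolve(p.samples, cols)
    return ToyProfile(p.abundances[r, c], p.features[r], p.samples[c])
end

p = ToyProfile([0 1; 3 4],
               ["Bifidobacterium_longum", "Escherichia_coli"],
               ["sample1", "sample2"])

sub = p[r"Bifido", :]                      # regex on rows, all columns
one = p["Escherichia_coli", "sample2"]     # exact string match on both axes
```

Because everything is funneled through one `resolve` function, supporting another index type only requires adding one more method.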
Current advantages of ComMatrix (that I'm aware of)
- view machinery for cheap subsetting
- integration with spatial types
- plot recipes
- others?
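To make the "cheap subsetting" point concrete, here is a small sketch of view-based subsetting using only Base Julia and the SparseArrays standard library. It does not use SpatialEcology.jl and says nothing about how ComMatrix implements its views; the matrix and its orientation are invented for illustration.

```julia
using SparseArrays

# A small sparse abundance matrix (made-up data).
abundances = sparse([0 1 0; 3 4 0; 0 0 5])

# A view references the parent's storage instead of copying it.
sub = @view abundances[1:2, :]

# Changes to the parent are visible through the view.
abundances[1, 2] = 7
sub[1, 2]
```

By contrast, the non-view form `abundances[1:2, :]` allocates a new matrix, which is the cost that views avoid.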
Current incompatibilities
- Names for columns / rows and for the matrix data. This shouldn't matter too much, since both fall back to the EcoBase thing* and place* methods. If this is done right, one should be able to call featurenames on a ComMatrix or speciesnames on a CommunityProfile and get the same thing.
- Internal representation. I think ComMatrix is a simple wrapper around a sparse matrix, while I'm using AxisArrays and NamedDims for a few things. But I actually think that this might be overkill: I mostly used them to take advantage of indexing, but then rewrote the indexing in a way that doesn't rely on them so much. With a few tweaks to SpatialEcology's views, I think I could drop that dependency.
- Others?
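A rough sketch of the naming idea, assuming each package only implements the generic thing* / place* functions and the domain-specific names are thin aliases. Nothing here imports EcoBase: `thingnames` and `placenames` are redefined locally as stand-ins, the two table types are hypothetical, and `samplenames` / `sitenames` are included only to round out the illustration.

```julia
# Local stand-ins for the shared generic functions (not imported from EcoBase).
function thingnames end
function placenames end

# Domain-specific names forward to the generics, so they work on any type
# that implements thingnames / placenames.
featurenames(x) = thingnames(x)
samplenames(x)  = placenames(x)
speciesnames(x) = thingnames(x)
sitenames(x)    = placenames(x)

# Two hypothetical table types with different internal field names.
struct ToyMicrobiomeTable
    features::Vector{String}
    samples::Vector{String}
end

struct ToySpatialTable
    species::Vector{String}
    sites::Vector{String}
end

# Each type implements only the generic interface.
thingnames(t::ToyMicrobiomeTable) = t.features
placenames(t::ToyMicrobiomeTable) = t.samples
thingnames(t::ToySpatialTable) = t.species
placenames(t::ToySpatialTable) = t.sites

m = ToyMicrobiomeTable(["taxon_a", "taxon_b"], ["sample1", "sample2"])
s = ToySpatialTable(["taxon_a", "taxon_b"], ["site1", "site2"])

speciesnames(m) == featurenames(s)   # either name works on either type
```

With this layout neither package needs to know about the other's vocabulary; both depend only on the shared generic functions.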