EcoBase.jl

Synchronizing Table-like stuff

Open kescobo opened this issue 2 years ago • 4 comments

Purpose

In an effort to improve cross-ecosystem compatibility, it would be nice to make table-like data structures more interoperable. My view of the ecosystem is quite narrow - I'm really only aware of ComMatrix from SpatialEcology.jl and my own CommunityProfile from Microbiome.jl which took quite a bit of inspiration from the former. I also haven't used ComMatrix in the last year or so as I was trying to iterate quickly in Microbiome.jl.

cc @mkborregaard

Current advantages of CommunityProfile

  • Tables.jl interface makes it easy to convert to DataFrame or write to CSV
  • Rows (features) and columns (samples) can be indexed with numbers, strings, or regex
  • Row and column indices are typed (e.g. Taxon or GeneFunction for features, MicrobiomeSample for samples). This makes it possible to store additional information (including metadata) inside the community table type
julia> using Microbiome

julia> s1 = MicrobiomeSample("sample1")
MicrobiomeSample("sample1", {})

julia> s2 = MicrobiomeSample("sample2");

julia> set!(s1, :type, "stool")
MicrobiomeSample("sample1", {:type = "stool"})

julia> set!(s1, :age, 37)
MicrobiomeSample("sample1", {:type = "stool", :age = 37})

julia> sp1 = Taxon("Bifidobacterium_longum", :species)
Taxon("Bifidobacterium_longum", :species)

julia> sp2 = taxon("s__Echerichia_coli")
Taxon("Echerichia_coli", :species)

julia> cm = CommunityProfile([0 1; 3 4], [sp1, sp2], [s1, s2])
CommunityProfile{Int64, Taxon, MicrobiomeSample} with 2 features in 2 samples

Feature names:
Bifidobacterium_longum, Echerichia_coli

Sample names:
sample1, sample2



julia> cm[r"Bifido", :]
CommunityProfile{Int64, Taxon, MicrobiomeSample} with 1 features in 2 samples

Feature names:
Bifidobacterium_longum

Sample names:
sample1, sample2



julia> metadata(cm)
2-element Vector{NamedTuple{(:sample, :type, :age), T} where T<:Tuple}:
 (sample = "sample1", type = "stool", age = 37)
 (sample = "sample2", type = missing, age = missing)
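
The number / string / regex row indexing shown in the session above can be sketched in a few lines. This is only an illustration of the idea, not Microbiome.jl's actual implementation: ToyProfile and rowindices are hypothetical names invented for this example, and the feature and sample names are made up.

```julia
# Hypothetical toy type for illustration; NOT Microbiome.jl's CommunityProfile.
struct ToyProfile
    abundances::Matrix{Int}     # features × samples
    features::Vector{String}
    samples::Vector{String}
end

# Resolve a row selector (integer, exact name, regex, or colon) to row indices.
rowindices(p::ToyProfile, i::Integer) = [i]
rowindices(p::ToyProfile, name::AbstractString) = findall(==(name), p.features)
rowindices(p::ToyProfile, rx::Regex) = findall(f -> occursin(rx, f), p.features)
rowindices(p::ToyProfile, ::Colon) = collect(eachindex(p.features))

# Subset rows while keeping every sample; the result is again a ToyProfile.
function Base.getindex(p::ToyProfile, rows, ::Colon)
    idx = rowindices(p, rows)
    return ToyProfile(p.abundances[idx, :], p.features[idx], p.samples)
end

toy = ToyProfile([0 1; 3 4],
                 ["Bifidobacterium_longum", "Escherichia_coli"],
                 ["sample1", "sample2"])

by_regex = toy[r"Bifido", :]            # rows whose name matches the regex
by_name  = toy["Escherichia_coli", :]   # row with exactly this name
by_index = toy[2, :]                    # second row
```

Dispatching on the selector type keeps getindex itself small, and adding another selector kind only needs one more rowindices method.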

Current advantages of ComMatrix (that I'm aware of)

  • view machinery for cheap subsetting
  • integration with spatial types
  • plot recipes
  • others?
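
To make "cheap subsetting" concrete, here is a minimal sketch of the difference between copying a slice and taking a view, using only Base and the SparseArrays standard library. It does not show how ComMatrix implements its views (I have not checked); the variable names are made up for the example.

```julia
using SparseArrays

# A small sparse "species × sites" matrix (made-up numbers).
occ = sparse([0 1 0; 3 0 4])

copied = occ[:, 2:3]        # indexing with ranges allocates a new matrix
viewed = view(occ, :, 2:3)  # a SubArray that refers back to `occ`

# Changes to the parent are visible through the view, but not in the copy.
occ[1, 2] = 7

seen_through_view = viewed[1, 1]   # reads occ[1, 2]
seen_in_copy      = copied[1, 1]   # still the value at the time of copying
```

A view only stores the parent and the index ranges, so subsetting a large table this way avoids duplicating the data.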

Current incompatibilities

  • Names for columns, rows, and the matrix data. This shouldn't matter too much, since we both fall back to the EcoBase thing* and place* methods. If this is done right, one should be able to call featurenames on a ComMatrix or speciesnames on a CommunityProfile and get the same thing.
  • Internal representation. I think ComMatrix is a simple wrapper around a sparse matrix, whereas I'm using AxisArrays and NamedDims for a few things. I actually think this might be overkill: I mostly adopted them to take advantage of their indexing, but then rewrote the indexing in a way that doesn't rely on them so much. With a few tweaks to SpatialEcology's views, I think I could drop that dependency.
  • Others?
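
A rough sketch of how the naming incompatibility could disappear if both types implement the same generic thing* / place* functions. Everything here is a stand-in: thingnames and placenames are declared locally rather than imported from EcoBase, ToyComMatrix and ToyCommunityProfile are hypothetical types, and sitenames / samplenames are aliases assumed for the example alongside the speciesnames and featurenames mentioned above.

```julia
# Local stand-ins for the shared generic functions. In practice these would be
# owned by the common base package and extended by each downstream package.
function thingnames end
function placenames end

# Hypothetical stand-in for a SpatialEcology-style type (species × sites).
struct ToyComMatrix
    occurrences::Matrix{Int}
    species::Vector{String}
    sites::Vector{String}
end

# Hypothetical stand-in for a Microbiome-style type (features × samples).
struct ToyCommunityProfile
    abundances::Matrix{Int}
    features::Vector{String}
    samples::Vector{String}
end

# Each package implements only the generic layer for its own type.
thingnames(c::ToyComMatrix) = c.species
placenames(c::ToyComMatrix) = c.sites
thingnames(p::ToyCommunityProfile) = p.features
placenames(p::ToyCommunityProfile) = p.samples

# Domain-specific aliases are defined once, on top of the generic layer,
# so they work for any type that implements thingnames / placenames.
speciesnames(x) = thingnames(x)
featurenames(x) = thingnames(x)
sitenames(x)    = placenames(x)
samplenames(x)  = placenames(x)

com  = ToyComMatrix([1 0; 0 1], ["sp_a", "sp_b"], ["site1", "site2"])
prof = ToyCommunityProfile([0 1; 3 4], ["taxon_a", "taxon_b"], ["sample1", "sample2"])
```

With this layering, featurenames(com) and speciesnames(com) return the same vector, and likewise for prof, regardless of which package defined the type.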

kescobo, Oct 20 '21 19:10