MultiAssayExperiment
Using intersectRows when different names are used for the same entity
I have one dataset of 16S sequencing of intestinal biopsies and another from stool samples, and the two end up with different OTUs. I can find which taxon each OTU belongs to. In phylogenetic analyses the taxonomy is usually merged into a single object (phyloseq, metagenomeSeq), extending the rowData (I assume); it could also be stored in rowData, because the OTU names themselves (I have OTU_1, OTU_2, ...) aren't really meaningful. What is meaningful is the taxonomy, which I have as a matrix inside those objects (phylo-class, MRexperiment-class).
See example output:
MR_i ## And MR_s is a similar object
## MRexperiment (storageMode: environment)
## assayData: 499 features, 103 samples
## element names: counts
## protocolData: none
## phenoData
## sampleNames: 5.B009 4.B008 ... 103.B104 (103 total)
## varLabels: Sample_Code Patient_ID ... ID (12 total)
## varMetadata: labelDescription
## featureData
## featureNames: OTU_1 OTU_10 ... OTU_998 (499 total)
## fvarLabels: Domain Phylum ... Species (7 total)
## fvarMetadata: labelDescription
## experimentData: use 'experimentData(object)'
## Annotation:
(MAE <- MultiAssayExperiment(experiments = list("intestinal" = MR_i, "stools" = MR_s), colData = meta))
## A MultiAssayExperiment object of 2 listed
## experiments with user-defined names and respective classes.
## Containing an ExperimentList class object of length 2:
## [1] intestinal: MRexperiment with 499 rows and 103 columns
## [2] stools: MRexperiment with 535 rows and 103 columns
## Features:
## experiments() - obtain the ExperimentList instance
## colData() - the primary/phenotype DataFrame
## sampleMap() - the sample availability DataFrame
## `$`, `[`, `[[` - extract colData columns, subset, or experiment
## *Format() - convert into a long or wide DataFrame
## assays() - convert ExperimentList to a SimpleList of matrices
When I build an MAE object from them and use intersectRows, I end up with the OTUs that share a name, even though their taxonomic classifications differ.
(b <- intersectRows(MAE))
## A MultiAssayExperiment object of 2 listed
## experiments with user-defined names and respective classes.
## Containing an ExperimentList class object of length 2:
## [1] intestinal: MRexperiment with 235 rows and 103 columns
## [2] stools: MRexperiment with 235 rows and 103 columns
## Features:
## experiments() - obtain the ExperimentList instance
## colData() - the primary/phenotype DataFrame
## sampleMap() - the sample availability DataFrame
## `$`, `[`, `[[` - extract colData columns, subset, or experiment
## *Format() - convert into a long or wide DataFrame
## assays() - convert ExperimentList to a SimpleList of matrices
c(head(rownames(b)[[1]]), tail(rownames(b)[[1]]))
## [1] "OTU_1" "OTU_10" "OTU_100" "OTU_101" "OTU_102" "OTU_103" "OTU_94" "OTU_95" "OTU_96" "OTU_97" "OTU_98" "OTU_99"
However, OTU_1073 from the intestinal assay and OTU_1037 from the stools assay are the same species.
Could intersectRows use the rowData (or fvarLabels) of each experiment, when available, to reorder (?) and select the rows of each experiment?
Also, if I have metagenomics and RNA-seq assays in the same object, I would like to tell intersectRows which experiments to subset by row. For example, I might be interested in just one phylum and want to relate it to the other assays in the experiment.
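A minimal sketch of the matching asked for above, written in base R so it runs without Bioconductor. It assumes each experiment's taxonomy is available as a data frame or matrix with OTU IDs as row names (like the fvarLabels shown earlier); `match_rows_by_taxonomy()` is a hypothetical helper, not part of MultiAssayExperiment, and the toy taxonomy is made up. OTUs are paired when their selected ranks agree, and the result is a named list of OTU IDs in one common order.

```r
# Hypothetical helper (not part of MultiAssayExperiment): pair OTUs across
# experiments by their taxonomy instead of by their row names.
match_rows_by_taxonomy <- function(taxonomies, ranks) {
  # One key per OTU: the selected ranks pasted together, e.g. "Firmicutes;Faecalibacterium"
  keys <- lapply(taxonomies, function(tax) {
    tax <- as.data.frame(tax[, ranks, drop = FALSE])
    do.call(paste, c(unname(as.list(tax)), sep = ";"))
  })
  # Keys found in every experiment, in a fixed (sorted) order
  shared <- sort(Reduce(intersect, lapply(keys, unique)))
  # For each experiment, the OTU IDs of the shared keys in that same order;
  # if a key occurs twice in one experiment, only its first OTU is used
  mapply(function(tax, key) rownames(tax)[match(shared, key)],
         taxonomies, keys, SIMPLIFY = FALSE)
}

# Toy taxonomy tables (made up for illustration)
tax_i <- data.frame(Phylum = c("Firmicutes", "Bacteroidetes", "Proteobacteria"),
                    Genus  = c("Faecalibacterium", "Bacteroides", "Escherichia"),
                    row.names = c("OTU_1073", "OTU_5", "OTU_9"))
tax_s <- data.frame(Phylum = c("Bacteroidetes", "Firmicutes"),
                    Genus  = c("Bacteroides", "Faecalibacterium"),
                    row.names = c("OTU_2", "OTU_1037"))
m <- match_rows_by_taxonomy(list(intestinal = tax_i, stools = tax_s),
                            ranks = c("Phylum", "Genus"))
# m$intestinal is c("OTU_5", "OTU_1073"); m$stools is c("OTU_2", "OTU_1037")
```

Missing ranks (`NA`) are matched as the literal string "NA" here. Assuming the list names match the experiment names, a named list like `m` is the kind of input that list-based row subsetting takes (e.g. `subsetByRow(MAE, m)`), provided each experiment class accepts character row indices.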
The package looks great, thanks for the effort!
Hi Lluís, @llrs
Thank you for the report.
The assumption here is that all the objects in the ExperimentList support a rowData method.
It would be good to make use of this data; perhaps we could add a byRowData argument.
Regards,
Marcel
I tried building another object (SummarizedExperiment) with the same data:
(mae <- MultiAssayExperiment(list("intestinal" = SE_i, "stools" = SE_s)))
## A MultiAssayExperiment object of 2 listed
## experiments with user-defined names and respective classes.
## Containing an ExperimentList class object of length 2:
## [1] intestinal: SummarizedExperiment with 532 rows and 178 columns
## [2] stools: SummarizedExperiment with 568 rows and 152 columns
## Features:
## experiments() - obtain the ExperimentList instance
## colData() - the primary/phenotype DataFrame
## sampleMap() - the sample availability DataFrame
## `$`, `[`, `[[` - extract colData columns, subset, or experiment
## *Format() - convert into a long or wide DataFrame
## assays() - convert ExperimentList to a SimpleList of matrices
colData(mae)
## DataFrame with 330 rows and 0 columns
But then my problem is how to encode the colData; see this question on the support site.
This might be a separate enhancement, but using each SummarizedExperiment's colData to create a common colData would simplify the creation of MAE objects. It would have many caveats, but looking for common columns and creating a column for the row names of each sample in the SummarizedExperiment might work.
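One way to try this idea, sketched in base R: keep the columns shared by every experiment's sample table, collapse them to one row per patient, and record which experiment column belongs to which patient. It assumes each sample table has a patient identifier column (here `Patient_ID`, listed among the varLabels above) and sample names as row names; `build_common_coldata()` is a hypothetical helper and the toy tables are made up. The second table mirrors the `assay`/`primary`/`colname` columns of a MultiAssayExperiment sampleMap, so the pair is meant as a candidate for the constructor's `colData` and `sampleMap` arguments (converted to DataFrame if needed).

```r
# Hypothetical helper: build a shared colData and a sampleMap-style table
# from per-experiment sample tables (row names = sample/column names).
build_common_coldata <- function(sample_tables, primary_col = "Patient_ID") {
  # Columns present in every experiment's sample table
  common <- Reduce(intersect, lapply(sample_tables, names))
  stopifnot(primary_col %in% common)
  # Stack the shared columns, then keep the first row seen for each patient
  # (conflicting values across experiments are NOT reconciled)
  stacked <- do.call(rbind, lapply(sample_tables, function(df) df[, common, drop = FALSE]))
  col_data <- stacked[!duplicated(stacked[[primary_col]]), , drop = FALSE]
  rownames(col_data) <- as.character(col_data[[primary_col]])
  # One row per experiment column: which patient it belongs to
  sample_map <- do.call(rbind, lapply(names(sample_tables), function(nm) {
    df <- sample_tables[[nm]]
    data.frame(assay = nm, primary = as.character(df[[primary_col]]),
               colname = rownames(df), stringsAsFactors = FALSE)
  }))
  rownames(sample_map) <- NULL
  list(colData = col_data, sampleMap = sample_map)
}

# Toy sample tables (made up for illustration)
intestinal_samples <- data.frame(Patient_ID = c("P1", "P2"), Sex = c("F", "M"),
                                 Site = c("ileum", "colon"), row.names = c("B1", "B2"))
stool_samples <- data.frame(Patient_ID = c("P2", "P3"), Sex = c("M", "F"),
                            Day = c(1, 2), row.names = c("S1", "S2"))
shared <- build_common_coldata(list(intestinal = intestinal_samples,
                                    stools = stool_samples))
# shared$colData keeps Patient_ID and Sex for P1, P2, P3;
# shared$sampleMap has 4 rows (B1, B2, S1, S2)
```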
@LiNk-NY I wonder if the enhancement should be more general than byRowData; how about function signatures for subsetByRow and subsetByColumn, where the function is something that will be applied to each list element? Something like:
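library(MultiAssayExperiment)

# Proposed method: when the row index is a function, apply it to every
# experiment and use the per-experiment results (a named list) to subset rows
setMethod("subsetByRow", c("ExperimentList", "function"), function(x, y) {
    sublist <- lapply(x, y)          # one row selection per experiment
    x <- subsetByRow(x, sublist)     # reuse list-based row subsetting
    x
})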
This could be used for subsetting by rowData (although with more complicated user syntax than a more specific subsetByRowData), but also for filtering by row means, variance, etc.
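The function-as-index idea can be sketched without MultiAssayExperiment, assuming a plain named list of matrices stands in for an ExperimentList; `subset_rows_by_function()` and `high_variance()` are hypothetical names and the matrices are toy data. The selector runs once per experiment and its result subsets that experiment's rows, so the same mechanism covers variance filters, row means, or rowData lookups.

```r
# Hypothetical helper: apply a row selector to each experiment, then subset
# each experiment's rows with its own selection. A named list of matrices
# stands in for an ExperimentList here.
subset_rows_by_function <- function(experiments, f) {
  selection <- lapply(experiments, f)  # one row selection per experiment
  mapply(function(x, rows) x[rows, , drop = FALSE],
         experiments, selection, SIMPLIFY = FALSE)
}

# Example selector: row names of rows whose variance exceeds a threshold
high_variance <- function(x, threshold = 1) {
  rownames(x)[apply(x, 1, var) > threshold]
}

# Toy count matrices (made up for illustration)
counts <- list(
  A = matrix(c(1, 1, 1,
               1, 5, 9), nrow = 2, byrow = TRUE, dimnames = list(c("r1", "r2"), NULL)),
  B = matrix(c(0, 10,
               2, 2), nrow = 2, byrow = TRUE, dimnames = list(c("x", "y"), NULL))
)
filtered <- subset_rows_by_function(counts, high_variance)
# filtered$A keeps only "r2" (variance 16); filtered$B keeps only "x" (variance 50)
```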
I think Martin @mtmorgan would say that you want to define a method for a class rather than for a function, and that the desired functionality should either conform to the MultiAssayExperiment API or extend the class.
(Martin, feel free to chime in.)
This issue has been automatically marked as stale because it has not had any recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
It's been a while; are there any updates?
I'm commenting to prevent the bot from closing the issue.
Hi Lluís, @llrs
What you describe seems to require a row map structure where subsets can be done based on a third variable.
We don't have anything like that planned for the immediate future, although it is an important problem to tackle. FWIW, we do have helper functions to homogenize rows across experiments in TCGAutils (see symbolsToRanges and mirToRanges).
Perhaps you can write a function that does this for you by matching and re-ordering OTU rows across experiments using a map. You could then use a list, a List, or row names to subset.
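Along the lines suggested here, a row map can be kept as a plain table with one column per experiment and one row per shared entity (e.g. a species). This base-R sketch uses a hypothetical `row_map_to_list()` and toy IDs; it drops map entries that are missing from any experiment and returns a named list of row names in the map's order, so the i-th selected row of every experiment refers to the same entity.

```r
# Hypothetical helper: convert a row map (one column per experiment, one row
# per shared entity) into a named list of row names for subsetting.
row_map_to_list <- function(row_map, row_names) {
  # Keep only map rows whose IDs exist in every experiment
  keep <- Reduce(`&`, lapply(names(row_map), function(nm)
    row_map[[nm]] %in% row_names[[nm]]))
  # One character vector per experiment, all in the map's row order
  lapply(row_map[keep, , drop = FALSE], as.character)
}

# Toy map and row names (made up for illustration)
otu_map <- data.frame(intestinal = c("OTU_1073", "OTU_5", "OTU_7"),
                      stools     = c("OTU_1037", "OTU_2", "OTU_99"))
present <- list(intestinal = c("OTU_5", "OTU_7", "OTU_1073"),
                stools     = c("OTU_1037", "OTU_2"))
rows <- row_map_to_list(otu_map, present)
# OTU_7/OTU_99 is dropped because OTU_99 is not in the stools experiment
```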
If you are working with a consistent number of samples ('colnames') and rows, it may also be worthwhile to look into data structures that use a row graph representation, such as LoomExperiment.
Best regards, Marcel
Just discussed this with @LiNk-NY. This should provide a workable solution with minimal change:
- the subsetByRow() function should provide an i argument that allows you to specify which experiments will be subset, with the default being all.
Other helper functions, subsetByRowData() and intersectByRowData(), would also be useful. These would provide an additional argument giving the rowData column to match on instead of the row names. They would silently do nothing for any experiments that either 1) don't have rowData, or 2) don't have the specified column in their rowData.
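The proposed behaviour can be prototyped in base R before any API exists. In this sketch `subset_by_row_data()` is hypothetical, a named list of data frames stands in for each experiment's rowData (an experiment without annotations is a zero-column data frame, like the empty rowData of a SummarizedExperiment), and `i` picks the experiments to subset, defaulting to all. Experiments that are not selected, or that lack the column, silently keep all their rows.

```r
# Hypothetical sketch of the proposed subsetByRowData() semantics.
# row_data: named list of data frames (one per experiment), row names = feature IDs
# column:   rowData column to filter on; values: values to keep
# i:        experiments to subset (default: all)
subset_by_row_data <- function(row_data, column, values, i = names(row_data)) {
  sapply(names(row_data), function(nm) {
    rd <- row_data[[nm]]
    # Silently keep every row if the experiment is not selected or lacks the column
    if (!(nm %in% i) || !(column %in% names(rd))) {
      return(rownames(rd))
    }
    rownames(rd)[rd[[column]] %in% values]
  }, simplify = FALSE)
}

# Toy rowData (made up for illustration); the RNA-seq experiment has no annotation columns
row_data <- list(
  intestinal = data.frame(Phylum = c("Firmicutes", "Bacteroidetes"), row.names = c("OTU_1", "OTU_2")),
  stools     = data.frame(Phylum = c("Bacteroidetes", "Firmicutes"), row.names = c("OTU_3", "OTU_4")),
  rnaseq     = data.frame(row.names = c("GENE_A", "GENE_B"))
)
only_firmicutes <- subset_by_row_data(row_data, "Phylum", "Firmicutes")
stools_only <- subset_by_row_data(row_data, "Phylum", "Firmicutes", i = "stools")
```

The result is a named list of row names per experiment, i.e. the same shape of input that list-based row subsetting works with.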