
Create QFeatures object from any sets of features/peptides/proteins tables

Open annaquaglieri16 opened this issue 3 years ago • 0 comments

Hi,

I'm sharing a simple example that shows how to use the data structure and functionality of QFeatures starting from any pre-defined set of feature/peptide/protein tables. This is useful when you want to import into R the summarised and aggregated output of other software, e.g. MaxQuant, PD, etc. I wanted to test this as an alternative to starting from a PSM table and then generating the subsequent aggregations, as explained in the QFeatures documentation.

Below I create sample peptide and protein tables (I'll leave out the PSMs for simplicity). For both peptides and proteins I generate a matrix of "intensities" and the corresponding row data holding the respective peptide/protein IDs. The example has 10 peptides mapped to 3 proteins.

library(QFeatures)

## Peptide-level "intensities": 10 peptides x 3 samples
wide_peptides <- matrix(rnorm(30), nrow = 10)
rownames(wide_peptides) <- paste0("Peptide", 1:10)
colnames(wide_peptides) <- c("sample1", "sample2", "sample3")

## Peptide row data: each peptide is assigned to one of 3 proteins
rowdata_peptides <- DataFrame(PeptideID = rownames(wide_peptides),
                              Protein.id = c(rep("Protein1", 2), rep("Protein2", 4), rep("Protein3", 4)))

## Protein-level "intensities": 3 proteins x 3 samples
wide_proteins <- matrix(rnorm(9), nrow = 3)
rownames(wide_proteins) <- paste0("Protein", 1:3)
colnames(wide_proteins) <- c("sample1", "sample2", "sample3")

## Protein row data: only the protein ID
rowdata_proteins <- DataFrame(ProteinID = rownames(wide_proteins))
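
With real software output the starting point is usually a single wide table rather than a ready-made matrix. As a rough sketch of that step, the snippet below splits a made-up peptide table into an intensity matrix and a row annotation table using base R only. The table `peptide_table`, its values and the assumption that sample columns start with "sample" are all hypothetical; real exports from MaxQuant or other tools use different column names.

```r
# Hypothetical wide export: annotation columns plus one intensity column per sample
peptide_table <- data.frame(
  PeptideID  = c("Peptide1", "Peptide2", "Peptide3"),
  Protein.id = c("Protein1", "Protein1", "Protein2"),
  sample1    = c(10.1, 11.3, 9.8),
  sample2    = c(10.4, NA, 9.5),
  stringsAsFactors = FALSE
)

# Assumption: quantitative columns can be recognised by their "sample" prefix
sample_cols <- grep("^sample", colnames(peptide_table), value = TRUE)

# Numeric intensity matrix with the feature IDs as row names
wide_peptides_ext <- as.matrix(peptide_table[, sample_cols])
rownames(wide_peptides_ext) <- peptide_table$PeptideID

# All remaining columns form the row annotation
annotation_cols <- setdiff(colnames(peptide_table), sample_cols)
rowdata_peptides_ext <- peptide_table[, annotation_cols]
```

The annotation data.frame can then be wrapped in `DataFrame()` and passed to `SummarizedExperiment()` together with the matrix, in the same way as the simulated tables above.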

We can now create separate SummarizedExperiment objects for the peptides and proteins. This is the minimal information to create a QFeatures object that contains 2 unlinked assays. They are unlinked, meaning that I cannot, for example, directly subset both assays by requesting all features coming from a particular protein id.

se1 <- SummarizedExperiment(wide_peptides, rowdata_peptides)
se2 <- SummarizedExperiment(wide_proteins, rowdata_proteins)

## Sample annotation (colData)
cd <- DataFrame(row.names = colnames(wide_proteins))

el <- list(peptides = se1, proteins = se2)
hl <- QFeatures(el, colData = cd)
hl
An instance of class QFeatures containing 2 assays:
 [1] peptides: SummarizedExperiment with 10 rows and 3 columns 
 [2] proteins: SummarizedExperiment with 3 rows and 3 columns 

However, we can easily link the assays through the protein ids using QFeatures::addAssayLink, which QFeatures::aggregateFeatures applies automatically under the hood when creating aggregations.

hl_linked <- addAssayLink(hl,
             from = "peptides",
             to  = "proteins",
             varFrom = "Protein.id",
             varTo = "ProteinID")
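
Since the link relies on matching values in the two row data columns, it may be worth checking that every protein id used in the peptide table also exists in the protein table, especially with tables exported from external software. The helper below is a hypothetical convenience function, not part of QFeatures; it only uses `[[` to pull out an assay and `rowData()` to read its annotation.

```r
library(QFeatures)

# Hypothetical helper (not a QFeatures function): returns the values of
# `varFrom` in assay `from` that have no match in `varTo` of assay `to`
unmatched_link_ids <- function(x, from, to, varFrom, varTo) {
  ids_from <- rowData(x[[from]])[[varFrom]]
  ids_to   <- rowData(x[[to]])[[varTo]]
  setdiff(unique(ids_from), ids_to)
}

# With the objects from this post, an empty result means that every peptide
# maps to a protein present in the protein table:
# unmatched_link_ids(hl, "peptides", "proteins", "Protein.id", "ProteinID")
```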

Now the two assays are linked by the protein id. I can, for example, subset both assays simply by querying a single protein id, and I can use all the other functionality of QFeatures.

protein_example <- hl_linked["Protein1", , ]
protein_example
An instance of class QFeatures containing 2 assays:
 [1] peptides: SummarizedExperiment with 2 rows and 3 columns 
 [2] proteins: SummarizedExperiment with 1 rows and 3 columns 
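
Once the subset works, the underlying values can be pulled out with the usual accessors. The sketch below wraps this in a small hypothetical helper (`linked_intensities` is not a QFeatures function) that subsets by a feature id and returns the intensity matrix of one assay, assuming the object has been linked as shown above and contains an assay with the given name.

```r
library(QFeatures)

# Hypothetical helper: subset a linked QFeatures object by a feature id and
# return the intensity matrix of the chosen assay
linked_intensities <- function(x, feature_id, assay_name = "peptides") {
  sub <- x[feature_id, , ]
  assay(sub[[assay_name]])
}

# With the linked object from this post, this should give the intensities of
# the two peptides assigned to Protein1, in line with the output above:
# linked_intensities(hl_linked, "Protein1")
```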

Thanks a lot to @lgatto for providing the support and the initial simple example to get me started, and to all the developers of the package!

Anna

annaquaglieri16, Jun 17 '22 06:06