SqueezeMeta
Subsetting a SQM Object by Sample Name
Hello! I am interested in subsetting my SQM object by sample name. Some SQMtools functions, such as plotFunctions() and plotTaxonomy(), allow you to select the samples you would like to plot. However, I want to create a separate SQM object that contains only a few specific samples.
Does anybody have any recommendations on a way to accomplish this?
I just committed 514e6fd, which adds a `subsetSamples` function that does exactly what you want. It will be released together with SqueezeMeta v1.6, hopefully by the end of this week.
If you are eager to apply this to your own project, you can copy the function from the commit and modify it. It will complain a bit, because it is designed to work with some changes in the next version of SQMtools, which you do not have yet. However, if you remove the lines where it fails, you should still arrive at a decent result.
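For readers who want to see the core idea without the package internals, here is a minimal base-R sketch of what subsetting by sample amounts to for a single abundance table. It is not the SQMtools implementation. `subset_abund_by_samples()` and its `drop_empty_features` argument are hypothetical, the toy matrix is made up, and dropping all-zero features is only our assumption about what an option such as the `remove_missing` argument used later in this thread does.

```r
# Hypothetical helper, NOT the SQMtools implementation: keep only the requested
# sample columns of a features x samples abundance matrix.
subset_abund_by_samples <- function(abund, samples, drop_empty_features = TRUE) {
  unknown <- setdiff(samples, colnames(abund))
  if (length(unknown) > 0) {
    stop("Unknown samples: ", paste(unknown, collapse = ", "))
  }
  # drop = FALSE keeps a matrix even when only one sample is selected
  sub <- abund[, samples, drop = FALSE]
  if (drop_empty_features) {
    # Remove features with zero total abundance in the selected samples
    sub <- sub[rowSums(sub) > 0, , drop = FALSE]
  }
  sub
}

# Toy data (made up): 3 ORFs x 3 samples
abund <- matrix(c(5, 0, 2,
                  0, 0, 4,
                  1, 3, 0),
                nrow = 3, byrow = TRUE,
                dimnames = list(c("orf1", "orf2", "orf3"),
                                c("SD01", "SD02", "AD01")))

sd_only <- subset_abund_by_samples(abund, c("SD01", "SD02"))
print(sd_only)  # orf2 is dropped: it has no abundance in SD01 or SD02
```

In a real SQM object the same sample selection has to be applied consistently across several tables (ORFs, contigs, bins and so on), which is the bookkeeping `subsetSamples` takes care of.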
Thank you so much, @fpusan! This is extremely helpful.
Hi! I am trying to use the new subsetSamples function and am running into an issue. Here is my code:
```r
AD <- c("AD01","AD02","AD03")
subset <- subsetSamples(AD_archaea, AD)
```
When I run this, I get the following error:
```
Error in rowSums(subSQM$bins$abund[, samples, drop = F]) :
  'x' must be an array of at least two dimensions
```
Any suggestions? Thank you!
Is `AD_archaea` a complete project or a subset?
Are there any bins in `AD_archaea$bins$abund`?
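The error in the original report is what R produces when `rowSums()` receives `NULL`. Indexing `NULL` with `[` simply returns `NULL`, so a missing `bins$abund` table would reach `rowSums()` unchanged. The sketch below reproduces this on a toy list standing in for an SQM object and adds a hypothetical check, `has_bin_abund()`, along the lines of the question above. Neither the toy list nor the helper is part of SQMtools, and whether a missing bin table is really the cause here is an assumption.

```r
# Toy stand-in for an SQM object whose bin abundance table is missing (hypothetical)
toy <- list(bins = list(abund = NULL))
samples <- c("AD01", "AD02", "AD03")

# Indexing NULL returns NULL, so rowSums() receives NULL and raises the error
err <- tryCatch(rowSums(toy$bins$abund[, samples, drop = FALSE]),
                error = function(e) conditionMessage(e))
print(err)

# Hypothetical helper: does the object carry a usable 2-D bin abundance table?
has_bin_abund <- function(SQM) {
  ab <- SQM$bins$abund
  !is.null(ab) && length(dim(ab)) == 2 && nrow(ab) > 0
}
has_bin_abund(toy)
```

With a real project, `has_bin_abund(AD_archaea)` or simply `dim(AD_archaea$bins$abund)` would answer the question directly.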
AD_archaea is a subset of a complete project. I am able to make heatplots and abundance plots without an issue using AD_archaea. I also made AD_archaea using an earlier version of SQMtools. Could this be the problem?
Potentially yes.
Can you run `subsetSamples` on the original project?
Also, I just want to check: is this using SQMtools v1.6.0? Or did you copy the function from 514e6fd while still using the previous version of SqueezeMeta/SQMtools?
I am using SQMtools v1.6.0.
I have successfully used the subsetSamples function on the original project. However, I now cannot use the subsetTax function on the newly subsetted data. Here is my code:
```r
load("AD_SqueezeMeta.RData")
SD <- c("SD01","SD02")
SD_samples <- subsetSamples(AD_SqueezeMeta, samples = SD, remove_missing = F)
SD_samples_archaea = subsetTax(SD_samples, "superkingdom", "Archaea")
```
I get the following error:
```
Error in SQM$contigs$bins[contigs, ] : incorrect number of dimensions
Calls: subsetTax -> subsetContigs -> subsetORFs -> unique -> unlist
Execution halted
```
Using the older version of SQMtools, I could use the subsetTax function successfully. Is it possible to subset an SQM object by sample and then by taxa, or vice versa?
What you are doing should be possible in both directions (subsetTax first, and then subsetSamples, or vice versa) so this is a bug, or at least some corner case that I had not considered...
I have the feeling that bin info is somehow disappearing after the first subset, resulting in an error when you do your second subset.
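One common way for per-contig bin information to "disappear" in R is a one-column matrix collapsing to a vector when rows are selected without `drop = FALSE`. This is offered only as an illustration, not as the confirmed cause of this bug. A later `[contigs, ]` on that vector then fails with exactly the `incorrect number of dimensions` error reported above. The toy table and its column name `method1` are hypothetical.

```r
# Toy contig-to-bin table with a single binning-method column (hypothetical)
contig_bins <- matrix(c("bin1", "bin1", "bin2"), ncol = 1,
                      dimnames = list(c("c1", "c2", "c3"), "method1"))

# Row subsetting without drop = FALSE silently collapses a one-column matrix
collapsed <- contig_bins[c("c1", "c3"), ]
is.matrix(collapsed)  # FALSE: it is now a named character vector

# A second subset that assumes two dimensions then fails
err <- tryCatch(collapsed[c("c1"), ], error = function(e) conditionMessage(e))
print(err)

# Keeping drop = FALSE preserves the matrix, so later subsets keep working
kept <- contig_bins[c("c1", "c3"), , drop = FALSE]
kept[c("c1"), , drop = FALSE]
```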
Any chance you could share your project with me via e.g. wetransfer?
I won't need the `data` or `temp` directories inside the project folder. If you remove them, the project should weigh much less.
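If you want to prepare a trimmed copy without touching the original project, the following base-R sketch is one way to do it. `make_shareable_copy()` is a hypothetical helper and the paths are placeholders. Only the `data` and `temp` directory names come from the message above.

```r
# Hypothetical helper: make a lighter copy of a SqueezeMeta project for sharing,
# leaving the original project folder untouched.
make_shareable_copy <- function(project_dir, dest_parent) {
  dir.create(dest_parent, showWarnings = FALSE, recursive = TRUE)
  # Copies project_dir into dest_parent/<basename(project_dir)>
  ok <- file.copy(project_dir, dest_parent, recursive = TRUE)
  if (!isTRUE(ok)) stop("Copying ", project_dir, " failed")
  copy_dir <- file.path(dest_parent, basename(project_dir))
  # Remove the bulky subdirectories that are not needed for debugging
  unlink(file.path(copy_dir, c("data", "temp")), recursive = TRUE)
  copy_dir
}

# Example call (placeholder paths):
# make_shareable_copy("/path/to/AD_SqueezeMeta", "/path/to/share")
```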
This should be fixed now, the fix will make it to next release (v1.6.1) hopefully soon.