Subsetting a SQM Object by Sample Name

Open adec37 opened this issue 2 years ago • 9 comments

Hello! I am interested in subsetting my SQM Object by sample name. While I have found that some SQMtools functions, such as plotFunctions() or plotTaxonomy(), allow you to select the samples you would like to plot, I am interested in creating a separate SQM Object that contains only a few specific samples.

Does anybody have any recommendations on a way to accomplish this?

adec37 avatar Sep 06 '22 17:09 adec37

I just committed 514e6fd, which adds a subsetSamples function to do exactly what you want. This will be released together with SqueezeMeta v1.6, hopefully by the end of this week. If you are eager to apply this to your own project, you can copy the function from the commit and modify it. It will complain a bit, since it is designed to work with some changes included in the next version of SQMtools (which you do not have yet). However, if you remove the lines on which it fails, you should still arrive at a decent result.

fpusan avatar Sep 06 '22 19:09 fpusan

Thank you so much, @fpusan ! This is extremely helpful.

adec37 avatar Sep 06 '22 19:09 adec37

Hi! I am trying to use the new subsetSamples function and am running into an issue. Here is my code:

```r
AD <- c("AD01","AD02","AD03")

subset <- subsetSamples(AD_archaea, AD)
```

When I run this, I get the following error: `Error in rowSums(subSQM$bins$abund[, samples, drop = F]) : 'x' must be an array of at least two dimensions`

Any suggestions? Thank you!

adec37 avatar Oct 04 '22 13:10 adec37

Is AD_archaea a complete project or a subset? Are there any bins in AD_archaea$bins$abund?

fpusan avatar Oct 05 '22 07:10 fpusan
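The question about `AD_archaea$bins$abund` matters because of how base R treats a missing table. As a minimal sketch, assuming the bin abundance table is absent (`NULL`) in the object (an assumption; the thread does not confirm this), indexing `NULL` returns `NULL` without complaint, and `rowSums()` then fails. The list `sqm_like` below is a hypothetical stand-in, not a real SQM object.

```r
# Hypothetical stand-in for an SQM object whose bin abundance table is missing.
sqm_like <- list(bins = list(abund = NULL))
samples <- c("AD01", "AD02", "AD03")

# Indexing NULL always yields NULL, so no error is raised at this step.
sub_abund <- sqm_like$bins$abund[, samples, drop = FALSE]

# rowSums() requires an array with at least two dimensions, so it fails here.
msg <- tryCatch(
  {
    rowSums(sub_abund)
    "no error"
  },
  error = function(e) conditionMessage(e)
)
print(msg)

# A quick check that could be run on a real object before subsetting:
# is the table present and two-dimensional?
has_bins <- !is.null(sqm_like$bins$abund) &&
  length(dim(sqm_like$bins$abund)) == 2
print(has_bins)
```

Running the same `is.null()` and `dim()` checks on the real object would show whether the bin table is present before `subsetSamples` is called.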

AD_archaea is a subset of a complete project. I am able to make heatplots and abundance plots without an issue using AD_archaea. I also made AD_archaea using an earlier version of SQMtools. Could this be the problem?

adec37 avatar Oct 05 '22 11:10 adec37

Potentially yes. Can you run subsetSamples in the original project?

fpusan avatar Oct 05 '22 11:10 fpusan

Also, just want to check... Is this using SQMtools v1.6.0? Or did you copy the function from 514e6fd but are still using the previous version of SqueezeMeta/SQMtools?

fpusan avatar Oct 06 '22 08:10 fpusan
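One way to answer the version question from within R is to compare the installed package version against the release that ships `subsetSamples`. The helper `has_min_version` below is hypothetical (it is not part of SQMtools); it relies only on `requireNamespace()` and `utils::packageVersion()`.

```r
# Hypothetical helper (not part of SQMtools): TRUE only if `pkg` is installed
# and its version is at least `min_version`.
has_min_version <- function(pkg, min_version) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    return(FALSE)
  }
  utils::packageVersion(pkg) >= package_version(min_version)
}

# Check the installed SQMtools against the release mentioned in this thread.
print(has_min_version("SQMtools", "1.6.0"))
```

A `FALSE` result means either that the package is not installed or that it predates v1.6.0, in which case a copied `subsetSamples` would be running against older internals.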

I am using SQMtools v1.6.0.

I have successfully used the subsetSamples function on the original project. However, it now does not allow me to use the subsetTax function on the newly subsetted data. Here is my code:

```r
load("AD_SqueezeMeta.RData")
SD <- c("SD01","SD02")
SD_samples <- subsetSamples(AD_SqueezeMeta, samples = SD, remove_missing = F)
SD_samples_archaea = subsetTax(SD_samples, "superkingdom", "Archaea")
```

I get the following error:

```
Error in SQM$contigs$bins[contigs, ] : incorrect number of dimensions
Calls: subsetTax -> subsetContigs -> subsetORFs -> unique -> unlist
Execution halted
```

Using the older version of SQMtools, I could use the subsetTax function successfully. Is it possible to subset a SQMobject by sample and then subset by taxa? Or vice versa?

adec37 avatar Oct 06 '22 15:10 adec37

What you are doing should be possible in both directions (subsetTax first and then subsetSamples, or vice versa), so this is a bug, or at least some corner case that I had not considered... I have the feeling that bin info is somehow disappearing after the first subset, resulting in an error when you do your second subset. Any chance you could share your project with me via e.g. WeTransfer? I won't need the data or temp directories inside the project folder; if you remove them, the project should weigh much less...

fpusan avatar Oct 06 '22 17:10 fpusan
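The "incorrect number of dimensions" message is what base R reports when two-index syntax is applied to an object that is not two-dimensional. As a minimal sketch of one way this can happen (an assumption for illustration; the thread does not state the actual cause inside SQMtools), a one-column matrix subset without `drop = FALSE` collapses to a vector, and a later `[rows, ]` call on it fails. The `contig_bins` matrix and its contents are hypothetical.

```r
# Hypothetical contig-to-bin table with a single binning-method column.
contig_bins <- matrix(
  c("bin1", "bin1", "bin2"),
  ncol = 1,
  dimnames = list(c("contig_1", "contig_2", "contig_3"), "DAS")
)

keep <- c("contig_1", "contig_3")

# Without drop = FALSE the one-column matrix collapses to a named vector.
collapsed <- contig_bins[keep, ]
print(is.null(dim(collapsed)))

# A second two-index subset on the collapsed vector now fails.
msg <- tryCatch(
  {
    collapsed["contig_1", ]
    "no error"
  },
  error = function(e) conditionMessage(e)
)
print(msg)

# With drop = FALSE the matrix shape survives, so repeated subsetting works.
preserved <- contig_bins[keep, , drop = FALSE]
again <- preserved["contig_1", , drop = FALSE]
print(dim(again))
```

Checking `dim()` on the relevant table after the first subset would show whether it is still a matrix before the second subset is attempted.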

This should be fixed now. The fix will make it into the next release (v1.6.1), hopefully soon.

fpusan avatar Oct 31 '22 11:10 fpusan