gRodon2
large dataset produces error
Hello,
I have successfully run gRodon in metagenome modes 1 and 2 on some samples.
However, some of our other data products are co-assemblies of a large number of metagenomes. When I try to run gRodon in metagenome mode on single metagenomes from one of these large co-assemblies, I get a fatal error from the predictGrowth function: "XStringSet object is too big to be unlisted (would result in an XString object of length 2^31 or more)". I believe this comes from Biostrings (https://github.com/Bioconductor/Biostrings/blob/master/src/XStringSet_class.c).
Is there any way to avoid this error with large metagenomes?
Thanks
I guess my question would be: what is an average max growth rate of a co-assembly actually telling you? I'm not really sure how one would interpret this tbh (especially if you are mapping reads back and using gRodon's abundance correction).
How many genes are we looking at? How many are annotated as ribosomal proteins? My first recommendation would be to subsample your genes, as gRodon's results are pretty robust to subsampling (see S16-S19 Fig here: https://doi.org/10.1101/2022.04.12.488109). Alternatively, you could keep all the ribosomal proteins and subsample only the non-ribosomal proteins (I haven't played with this approach yet, but it might decrease noise).
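For anyone wanting to try the second option before loading genes into R, here is a minimal sketch in plain Python. It is not part of gRodon: the function names, the `is_ribosomal` predicate, and the idea of matching on the FASTA header are all assumptions for illustration, and it holds every record in memory, which may itself be too much for a very large co-assembly. The size constant is just the 2^31 limit quoted in the error message.

```python
import random

# Limit quoted in the Biostrings error: the concatenated length must stay below 2^31.
MAX_TOTAL_BASES = 2**31 - 1


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def subsample_genes(records, is_ribosomal, n_other, seed=0):
    """Keep every ribosomal gene plus a random sample of n_other other genes.

    is_ribosomal is a hypothetical predicate on the header; how ribosomal
    proteins are flagged depends on your annotation pipeline.
    """
    records = list(records)
    ribosomal = [r for r in records if is_ribosomal(r[0])]
    other = [r for r in records if not is_ribosomal(r[0])]
    rng = random.Random(seed)  # fixed seed so the subsample is reproducible
    sampled = rng.sample(other, min(n_other, len(other)))
    return ribosomal + sampled


def total_bases(records):
    """Total sequence length, to compare against MAX_TOTAL_BASES."""
    return sum(len(seq) for _, seq in records)
```

After subsampling, checking `total_bases(kept) <= MAX_TOTAL_BASES` gives a rough indication of whether the reduced gene set should fit under the limit in the error message, assuming that limit applies to the summed length of all genes.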
It may be possible for me to write a version of gRodon that can handle this by changing how I deal with the data internally (loading chunks of data at a time and computing the CUB of each gene), but it would be a while before I got around to it, and this is a pretty non-standard application, so it's not super high on the priority list.
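To illustrate the general idea of chunked processing (not how gRodon works internally, and not its CUB statistic), here is a rough Python sketch that walks through genes a chunk at a time and computes per-gene codon counts, the kind of raw input a codon usage bias measure is built from. All names are hypothetical. The point is only that nothing here ever concatenates genes into one long sequence.

```python
from collections import Counter
from itertools import islice


def iter_chunks(records, chunk_size):
    """Yield lists of at most chunk_size items from any iterable."""
    it = iter(records)
    while True:
        chunk = list(islice(it, chunk_size))
        if not chunk:
            return
        yield chunk


def codon_counts(seq):
    """Count in-frame codons of one gene, ignoring a trailing partial codon."""
    seq = seq.upper()
    usable = len(seq) - len(seq) % 3
    return Counter(seq[i:i + 3] for i in range(0, usable, 3))


def per_gene_codon_counts(records, chunk_size=10000):
    """Yield (gene_id, Counter) chunk by chunk; only one chunk is held at a time."""
    for chunk in iter_chunks(records, chunk_size):
        for gene_id, seq in chunk:
            yield gene_id, codon_counts(seq)
```

Here `records` is any iterator of `(gene_id, sequence)` pairs, for example one that reads a FASTA file lazily, so memory use is bounded by `chunk_size` rather than by the size of the co-assembly.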
I suspect this would be relevant when looking at the individual metagenomes from a co-assembly such as one created by SqueezeMeta. In that case you'd have their ORF_table, which has a column of per-ORF abundance for each metagenome in the overall co-assembly, and each of those metagenomes is what one would want the growth rate of.
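As a rough sketch of that per-metagenome selection, assuming a tab-separated table with one ID column and one abundance column per sample: the column names used here (`orf_id`, `sample1`) are made up, so substitute whatever the real table uses. The function keeps only the ORFs with nonzero abundance in one sample, along with their abundances.

```python
import csv


def orfs_for_sample(table_lines, id_column, abundance_column):
    """Return {orf_id: abundance} for ORFs with nonzero abundance in one sample.

    table_lines is any iterable of tab-separated lines with a header row,
    e.g. an open file handle. Column names are supplied by the caller.
    """
    reader = csv.DictReader(table_lines, delimiter="\t")
    selected = {}
    for row in reader:
        abundance = float(row[abundance_column])
        if abundance > 0:
            selected[row[id_column]] = abundance
    return selected
```

For example, `orfs_for_sample(open("table.tsv"), "orf_id", "sample1")` would give the gene set for one metagenome, which is much smaller than the full co-assembly, and the per-ORF abundances could then feed the abundance correction mentioned above.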