taxpasta
Should we support the OPAL format as output?
In order to use the OPAL tool for analysis and visualization, it might be useful to convert any supported profiler to that format.
This might be reproducing taxonkit profile2cami? https://bioinf.shenwei.me/taxonkit/usage/#profile2cami
Some of those commands look really nice, I didn't know of that tool, thanks @mattheatley! Agreed, maybe not necessary if it's supported elsewhere? We should consider whether our output is compatible as input to profile2cami, though.
So you're pretty much already there with the standard taxpasta output, but instead of the two columns (taxid & count) you'd need to provide taxid & abundance (i.e. percentage), which is the input required by taxonkit. Tbh it's probably more useful to also have the counts in general, so maybe just provide the abundances as an extra output.
b) Abundance (could be percentage, automatically detected or use -p/--percentage).
Raw counts are 'sequence' abundance anyway, so maybe we are already there then?
But it could be good to test whether the two tools are compatible; then we could update the docs to point people to taxonkit :)
I think maybe they are talking about proportion vs percentage and not counts, but I'm not totally sure.
I have been considering an option to report fractions instead of counts from taxpasta for quite some time. So it seems that small change would already make the output compatible with taxonkit's profile2cami.
Maybe don’t do away with counts altogether though? I actually find it more useful to have them instead because you can’t convert backwards to counts from abundances. An additional output would be great though. At the moment I convert taxpasta outputs to abundances and then to cami so this would cut out a stage. But there can be rounding issues so maybe calculate them using decimals?
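For illustration, the counts-to-abundance step described above could be done with Python's `decimal` module instead of binary floats. This is only a minimal sketch, not taxpasta's implementation: the function name `counts_to_percentages` and the example taxonomy IDs are made up, and it assumes the profile is already available as a mapping from taxonomy ID to count.

```python
from decimal import Decimal


def counts_to_percentages(counts: dict[str, int]) -> dict[str, Decimal]:
    """Convert raw counts per taxonomy ID into percentages (hypothetical helper).

    Decimal arithmetic avoids binary floating-point artefacts, although
    values such as 1/3 are still rounded to the context precision.
    """
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("The total count must be positive.")
    return {
        taxid: Decimal(count) * 100 / Decimal(total)
        for taxid, count in counts.items()
    }


# Example with made-up counts; the original counts stay untouched,
# so both tables can be written out side by side.
example_counts = {"562": 1, "1280": 3}
example_percentages = counts_to_percentages(example_counts)

# Two-column lines (taxid, abundance), tab-separated.
tsv_lines = [f"{taxid}\t{value}" for taxid, value in example_percentages.items()]
```

Keeping the counts and writing the percentages as an additional table means nothing is lost, since the percentages alone cannot be converted back to counts.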
By the way, @mattheatley, I don't know if this is clear enough from the documentation: some of the original profiler output is actually given as fractions, which we multiply by a large number in order to obtain integers. So in those cases, it would be more faithful to the original result to only report fractions.
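A rough sketch of why that matters, under loud assumptions: the scaling factor `SCALE` and both helper names below are hypothetical and not taken from the taxpasta code base, and rounding to the nearest integer is only one plausible way of obtaining integers.

```python
# Hypothetical scaling factor; the value actually used may differ.
SCALE = 1_000_000


def fraction_to_pseudo_count(fraction: float, scale: int = SCALE) -> int:
    """Turn a profiler-reported fraction into an integer pseudo-count."""
    return round(fraction * scale)


def pseudo_count_to_fraction(count: int, scale: int = SCALE) -> float:
    """Convert an integer pseudo-count back into a fraction."""
    return count / scale


original = 0.12345678  # made-up fraction as a profiler might report it
pseudo_count = fraction_to_pseudo_count(original)
recovered = pseudo_count_to_fraction(pseudo_count)

# Digits beyond the resolution of the scaling factor are lost on the way back.
lost_precision = recovered != original
```

With a setup like this, the recovered fraction only matches the original up to the resolution of the scaling factor, which is the argument for reporting the fractions directly in those cases.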
One major issue atm is that only leaf counts are supported by taxonkit: https://github.com/shenwei356/taxonkit/issues/99#issuecomment-2203369757 If this is fixed, I think it would work seamlessly with taxpasta.