Should we support the OPAL format as output?

Open Midnighter opened this issue 1 year ago • 10 comments

In order to use the OPAL tool for analysis and visualization, it might be useful to convert any supported profiler to that format.

Midnighter avatar Jul 02 '23 12:07 Midnighter

Might this be reproducing taxonkit profile2cami? https://bioinf.shenwei.me/taxonkit/usage/#profile2cami

mattheatley avatar Nov 17 '23 09:11 mattheatley

Some of those commands look really nice, didn't know of that tool, thanks @mattheatley! Agreed, maybe not necessary if it's supported elsewhere? We should check whether our output is compatible as input to profile2cami though.

jfy133 avatar Nov 17 '23 09:11 jfy133

So you're pretty much already there with the standard taxpasta output. But instead of the two columns (taxid & count) you'd need to provide taxid & abundance (i.e. percentage), and then that's the input required by taxonkit. Tbh it's probably more useful to also have the counts in general, so maybe just provide the abundances as an extra output.

mattheatley avatar Nov 17 '23 09:11 mattheatley

     b) Abundance (could be percentage, automatically detected or use -p/--percentage).

Raw counts are 'sequence' abundance anyway, so maybe we are already there then?

jfy133 avatar Nov 17 '23 09:11 jfy133

But it could be good to test if the two tools are compatible, then we could update the docs to point people to taxonkit :)

jfy133 avatar Nov 17 '23 09:11 jfy133

I think maybe they are talking about proportion vs. percentage and not counts, but I'm not totally sure.

mattheatley avatar Nov 17 '23 09:11 mattheatley

I have been considering an option to report fractions instead of counts from taxpasta for quite some time. So it seems that small change would already make the output compatible with taxonkit's profile2cami.

Midnighter avatar Nov 17 '23 19:11 Midnighter

Maybe don’t do away with counts altogether though? I actually find it more useful to have them instead because you can’t convert backwards to counts from abundances. An additional output would be great though. At the moment I convert taxpasta outputs to abundances and then to cami so this would cut out a stage. But there can be rounding issues so maybe calculate them using decimals?
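For illustration, the intermediate count-to-abundance step could be sketched roughly like this in Python. It uses `decimal.Decimal` instead of floats to limit rounding issues. The function name `counts_to_percentages` and the example taxids are hypothetical, and this is not how taxpasta or taxonkit do it internally; it only assumes rows of (taxid, count) as described above.

```python
from decimal import Decimal


def counts_to_percentages(rows):
    """Convert (taxid, count) pairs into (taxid, percentage) pairs.

    Hypothetical helper: percentages are computed with Decimal so that
    no binary floating-point rounding is introduced by the conversion.
    """
    rows = [(taxid, int(count)) for taxid, count in rows]
    total = sum(count for _, count in rows)
    if total == 0:
        raise ValueError("cannot compute abundances: total count is zero")
    return [
        (taxid, Decimal(count) * 100 / Decimal(total))
        for taxid, count in rows
    ]


# Example with made-up counts for two taxids.
for taxid, pct in counts_to_percentages([("562", 1), ("1280", 3)]):
    print(f"{taxid}\t{pct}")
```

Note that values that are not exactly representable (e.g. one third) are still rounded to the Decimal context precision, so the percentages will not always sum to exactly 100.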

mattheatley avatar Nov 17 '23 20:11 mattheatley

By the way, @mattheatley, I don't know if this is clear enough from the documentation: some of the original profiler output is actually given as fractions, which we multiply by a large number in order to obtain integers. So in those cases, it would be more faithful to the original result to only report fractions.
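As a rough sketch of what that scaling means in practice (the factor `SCALE` and both function names below are assumptions for illustration, not taxpasta's actual values or code):

```python
from fractions import Fraction

# Hypothetical scaling factor; the value taxpasta really uses may differ.
SCALE = 1_000_000


def fractions_to_counts(fracs, scale=SCALE):
    """Turn fractional abundances into integer pseudo-counts."""
    return [round(f * scale) for f in fracs]


def counts_to_fractions(counts):
    """Recover exact fractions of the total from integer counts."""
    total = sum(counts)
    return [Fraction(c, total) for c in counts]


counts = fractions_to_counts([0.5, 0.25, 0.25])
recovered = counts_to_fractions(counts)
```

Fractions that do not divide evenly by the scaling factor get rounded when converted to integers, which is one reason reporting the original fractions directly would be more faithful.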

Midnighter avatar Nov 18 '23 16:11 Midnighter

One major issue atm is that only leaf counts are supported by taxonkit: https://github.com/shenwei356/taxonkit/issues/99#issuecomment-2203369757 If this is fixed, I think it would work seamlessly with taxpasta.

paulzierep avatar Jul 02 '24 14:07 paulzierep