
Feature request: obtain `report.tsv` for a subset of `classifications.tsv`

Open shiraz-shah opened this issue 8 months ago • 3 comments

There are often situations where one has identified a list of reads as contamination, after having run classify once.

Currently, the user then has to create new filtered FASTQ files with those reads removed and rerun `classify`, which is computationally demanding.

In such situations it would be extremely useful if metabuli could regenerate `report.tsv` from the existing `classifications.tsv`, given a list of reads to ignore (or a list of reads to include).
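To illustrate the idea, here is a minimal Python sketch that re-tallies classifications while skipping a list of read IDs. It is not how metabuli does it. It assumes `classifications.tsv` is tab-separated with the read ID in the second column and the taxonomy ID in the third; please check that against your own file. The function name `count_taxa` and the demo rows are made up for illustration. It only produces flat per-taxon read counts, not the clade-aggregated `report.tsv`, which would also need the taxonomy.

```python
from collections import Counter
from typing import AbstractSet, Iterable


def count_taxa(
    lines: Iterable[str],
    exclude: AbstractSet[str],
    id_col: int = 1,      # ASSUMPTION: read ID is the 2nd column (0-based index 1)
    taxid_col: int = 2,   # ASSUMPTION: taxonomy ID is the 3rd column (0-based index 2)
) -> Counter:
    """Count reads per taxonomy ID, ignoring reads whose ID is in `exclude`."""
    counts: Counter = Counter()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        # Skip blank or malformed lines that lack the expected columns.
        if len(fields) <= max(id_col, taxid_col):
            continue
        if fields[id_col] in exclude:
            continue
        counts[fields[taxid_col]] += 1
    return counts


# Made-up demo rows in the assumed layout: flag, read ID, taxonomy ID, length.
demo = [
    "1\tread1\t562\t150",
    "1\tread2\t562\t150",
    "0\tread3\t0\t150",
    "1\tread4\t9606\t150",
]
contaminants = {"read4"}  # reads identified as contamination

result = count_taxa(demo, contaminants)
for taxid, n in sorted(result.items()):
    print(f"{taxid}\t{n}")
```

For a real run one would pass an open file handle for `classifications.tsv` as `lines` and load the read IDs to ignore from a text file into a set. An include list works the same way with the membership test inverted.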

Such a feature would make classify extremely flexible.

Thanks in advance!

shiraz-shah avatar Apr 01 '25 06:04 shiraz-shah

Thank you for the good idea :) We are making a utility command that refines the classification file based on what users want. Hope you're looking forward to it!

borijoa avatar Apr 03 '25 03:04 borijoa

Great! I think that KrakenTools might actually work for the metabuli output too, @jaebeom-kim? https://github.com/jenniferlu717/KrakenTools

martin-steinegger avatar Apr 03 '25 04:04 martin-steinegger

Very nice! Will try that too then. Can't wait for the native implementation though!!

shiraz-shah avatar Apr 03 '25 09:04 shiraz-shah

Hi! Thank you for waiting! We made a new command, `classifiedRefiner`, which generates a subset of the classifications and a report from it. It is designed to be more general. I hope it can handle your case. If not, please let us know!

jaebeom-kim avatar May 29 '25 01:05 jaebeom-kim

Amazing! Can't wait to try this out!!

shiraz-shah avatar May 30 '25 06:05 shiraz-shah