Metabuli Feature request: obtain `report.tsv` for a subset of `classifications.tsv`

There are often situations where one has identified a list of reads as contamination, after having run classify once.

Currently the user has to create new, filtered FASTQ files with those reads removed and then rerun classify, which is computationally demanding.

In such situations it would be extremely useful if metabuli could recompile report.tsv from the existing classifications.tsv, given a list of reads to ignore (or a list of reads to include).
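
To illustrate the idea, here is a minimal Python sketch of the filtering step, not anything metabuli provides. It assumes classifications.tsv is tab-separated with the read ID in the second column and the taxonomy ID in the third; the function names and column indices are hypothetical and should be checked against the actual file. It only tallies reads assigned directly to each taxon; a full report.tsv with clade-level counts would also need the taxonomy.

```python
def filter_classifications(lines, exclude_ids, read_id_col=1):
    """Yield split rows whose read ID is not in exclude_ids.

    read_id_col=1 is an assumption about the classifications.tsv layout.
    """
    for line in lines:
        row = line.rstrip("\n").split("\t")
        # Skip malformed rows that are too short to hold a read ID.
        if len(row) <= read_id_col:
            continue
        if row[read_id_col] not in exclude_ids:
            yield row


def count_by_taxon(rows, taxid_col=2):
    """Count reads assigned directly to each taxonomy ID.

    taxid_col=2 is likewise an assumed column position.
    """
    counts = {}
    for row in rows:
        taxid = row[taxid_col]
        counts[taxid] = counts.get(taxid, 0) + 1
    return counts


if __name__ == "__main__":
    # Made-up rows for illustration only, not real metabuli output.
    sample = [
        "1\tread1\t562\t150\t0.90\tspecies\t562:10",
        "1\tread2\t9606\t150\t0.95\tspecies\t9606:12",
        "0\tread3\t0\t0\t0\tno rank\t",
    ]
    contaminants = {"read2"}  # reads flagged as contamination
    kept = list(filter_classifications(sample, contaminants))
    print(count_by_taxon(kept))
```

For a list of reads to include rather than exclude, the same loop would work with the membership test inverted.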

Such a feature would make classify extremely flexible.

Thanks in advance!

Apr 01 '25 06:04 shiraz-shah

Thank you for the good idea :) We are making a utility command that refines the classification file based on what users want. Hope you’re looking forward to it!

Apr 03 '25 03:04 borijoa

Great! I think that KrakenTools might actually work for the metabuli output too, @jaebeom-kim? https://github.com/jenniferlu717/KrakenTools

Apr 03 '25 04:04 martin-steinegger

Very nice! Will try that too then. Can't wait for the native implementation though!!

Apr 03 '25 09:04 shiraz-shah

Hi! Thank you for waiting! We made a new command, `classifiedRefiner`, that generates a subset of classifications and a report from it. It is designed to be more general. I hope it can handle your case. If not, please let us know!

May 29 '25 01:05 jaebeom-kim

Amazing! Can't wait to try this out!!

May 30 '25 06:05 shiraz-shah