sarek Human and mouse read disambiguation for PDX samples?

Description of feature

Hi,

Thank you for creating this awesome pipeline. I'm wondering if there are any modules within sarek that disambiguate mouse and human reads for PDX samples? For example, something like the disambiguate tool from AstraZeneca:

https://github.com/AstraZeneca-NGS/disambiguate

Thanks for your time and help.

Best, Asher

Jun 30 '24 19:06 apsteinberg

Hey!

Is this related to a similar preprocessing step as requested here: https://github.com/nf-core/sarek/issues/1144 ?

So far we have refrained from expanding the scope of sarek even further to keep the pipeline maintainable. If it is a single tool, I am slightly more inclined to have it added. What else would be necessary to make this work in the current workflow?

Jul 08 '24 13:07 FriederikeHanssen

Hi there,

This is related to the preprocessing step referenced in #1144.

Totally makes sense; I'm sure it takes a lot of time and effort to maintain. I was corresponding with @SPPearce about this on Slack, and he has written a subworkflow for this: https://nf-co.re/subworkflows/fastq_align_bamcmp_bwa. It relies on three tools: (i) bwa to align to both references, (ii) bamcmp to keep reads that align to the first genome, and (iii) samtools to sort.
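To make step (ii) concrete, here is a minimal Python sketch of the general idea behind this kind of read disambiguation: compare how well each read aligns to each genome and keep the better match. It is not how bamcmp is implemented. The function name `assign_reads`, the score dictionaries, and the labels are all hypothetical, and real tools use more criteria than a single alignment score.

```python
from typing import Dict, Optional


def assign_reads(
    human_scores: Dict[str, Optional[int]],
    mouse_scores: Dict[str, Optional[int]],
) -> Dict[str, str]:
    """Label each read by the genome it aligns to best.

    Both arguments map a read ID to its best alignment score against
    that genome, or None if the read did not align (hypothetical inputs).
    """
    labels = {}
    for read_id in human_scores.keys() | mouse_scores.keys():
        h = human_scores.get(read_id)
        m = mouse_scores.get(read_id)
        if h is None and m is None:
            labels[read_id] = "unaligned"
        elif m is None or (h is not None and h > m):
            # Aligned only to human, or better to human than to mouse.
            labels[read_id] = "human"
        elif h is None or m > h:
            # Aligned only to mouse, or better to mouse than to human.
            labels[read_id] = "mouse"
        else:
            # Equal scores: the read cannot be assigned to either genome.
            labels[read_id] = "ambiguous"
    return labels
```

For a PDX sample, only the reads labelled "human" would be passed on to the downstream steps.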

I haven't tested it out yet, but I think that to integrate this for PDX or other samples with contamination, this subworkflow would be run in lieu of the fastq_align_bwamem_mem2_dragmap_sentieon and bam_merge_index_samtools subworkflows. It could be enabled by an optional flag for these types of samples.

I would also be happy to try writing this in the next couple of months, but I am thus far a Nextflow novice :)

Thanks for your time and help!

Best, Asher

Jul 08 '24 14:07 apsteinberg

I do think we could do with this ability in some way, whether through bamcmp or something else. One suggestion was a completely separate pipeline for this kind of filtering, generating BAM (or FASTQ) files which can then go into many different pipelines.

Jul 08 '24 18:07 SPPearce

Hello,

Is there any update on this issue?

I work in a bioinformatics facility and we use sarek for standard genomic analyses on human samples. We have some projects with PDX samples, and so far we have been using BBSplit before running sarek to filter out mouse reads. It would be really useful if the tool were incorporated into sarek. Even though I'm only an nf-core user and have never developed a pipeline, I don't think it would be very complicated to incorporate it as a single tool with one or two parameters, similar to how it is already implemented in the nf-core/rnaseq pipeline, which we also use routinely.

I'm happy to try to incorporate it myself, following the contribution guidelines, if this is something that could be of interest to other users.

Thanks and best,

Alba

Mar 10 '25 15:03 albamasmalavila

I support this; it is a commonly requested addition. I think BBSplit works on FASTQ files, right? So it would be relatively straightforward to implement in the same way as in rnaseq.

Mar 10 '25 16:03 SPPearce

Hey! If it is a single tool, I am inclined to agree to adding it. We meet on Mondays to discuss ongoing dev work and talk about development in #sarek_dev. You are welcome to join if you want to give it a try :)

Mar 10 '25 16:03 FriederikeHanssen

Great, I'll give it a try. I just joined the #sarek_dev channel!

Mar 11 '25 14:03 albamasmalavila

Wanted to add that I will be excited to use this feature if it is included! BBSplit would be great. I have also used xengsort (https://gitlab.com/genomeinformatics/xengsort), which was recommended to me on the #sarek_dev channel a while back, and it worked similarly well.

Cheers, Asher

PS: Also happy to help out with any dev if needed, or at the very least I would be happy to beta test the feature :)

Mar 11 '25 14:03 apsteinberg

Are you all joining the hackathon? This would be a nice project, I think, and we would be around to help review.

Mar 11 '25 14:03 FriederikeHanssen

I would be happy to join if I'm available, and others from my team may be keen to join as well. When is the hackathon? Sorry if I've missed this on the Slack channel.

Mar 14 '25 15:03 apsteinberg

https://nf-co.re/events/2025/hackathon-march-2025

Mar 17 '25 15:03 maxulysse

It's merged! 🚀

Oct 02 '25 13:10 FriederikeHanssen

This is wonderful, thanks Friederike!

Oct 02 '25 15:10 apsteinberg