
Perform a compatibility check between the BED file specified with --targets and the reference genome

Open · GeertvanGeest opened this issue 2 years ago · 3 comments

Description of feature

If the BED file is incompatible with the reference genome (i.e. it specifies regions outside the reference), the pipeline fails only after alignment, at the base recalibration step. It would be nice to have a process that compares the BED file with, e.g., the FASTA index (or the sequence dictionary) and throws an error if the BED file is incompatible.
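A minimal sketch of such a check in Python, assuming the FASTA index (.fai) written by samtools faidx, whose first two tab-separated columns are contig name and contig length. BED coordinates are 0-based and half-open, so a record fits the reference when 0 <= start < end <= contig length (this sketch also treats empty intervals as invalid, which is a choice). The function names read_fai and check_bed_against_fai are hypothetical; this is not sarek's implementation.

```python
import sys
from typing import Dict, Iterable, List


def read_fai(lines: Iterable[str]) -> Dict[str, int]:
    """Map contig name -> length from a samtools faidx (.fai) index (hypothetical helper)."""
    lengths: Dict[str, int] = {}
    for line in lines:
        if not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        lengths[fields[0]] = int(fields[1])  # column 1: name, column 2: length
    return lengths


def check_bed_against_fai(bed_lines: Iterable[str], lengths: Dict[str, int]) -> List[str]:
    """Return one message per BED record that does not fit the reference (hypothetical helper)."""
    problems: List[str] = []
    for lineno, line in enumerate(bed_lines, start=1):
        # Skip blank lines and header/comment lines.
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            problems.append(f"line {lineno}: fewer than 3 columns")
            continue
        chrom = fields[0]
        try:
            start, end = int(fields[1]), int(fields[2])
        except ValueError:
            problems.append(f"line {lineno}: non-integer coordinates")
            continue
        if chrom not in lengths:
            problems.append(f"line {lineno}: contig '{chrom}' not in reference")
        elif not 0 <= start < end <= lengths[chrom]:
            # 0-based, half-open: a non-empty interval lying inside the contig.
            problems.append(
                f"line {lineno}: {chrom}:{start}-{end} outside 0-{lengths[chrom]}"
            )
    return problems


def main(fai_path: str, bed_path: str) -> None:
    with open(fai_path) as fai, open(bed_path) as bed:
        problems = check_bed_against_fai(bed, read_fai(fai))
    if problems:
        # sys.exit with a string prints it to stderr and exits with status 1.
        sys.exit("BED file incompatible with reference:\n" + "\n".join(problems))


if __name__ == "__main__":
    main(sys.argv[1], sys.argv[2])
```

Run as, e.g., python check_bed.py genome.fa.fai targets.bed (file names hypothetical). The non-zero exit status is what would let a pipeline process fail before alignment rather than at base recalibration.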

GeertvanGeest, Feb 03 '23 14:02

I agree with you; failing early would be better in that case.

maxulysse, Feb 03 '23 14:02

See also https://github.com/nf-core/sarek/issues/97

FriederikeHanssen, Feb 08 '23 13:02

I think this would be a nice hackathon issue; it is fairly self-contained and not too big. Do you know of any tool that can easily do this, plus possibly other checks, such as whether the file is sorted? I briefly checked bedtools, but as far as I could see it is made for manipulating BED files, not for validating them.
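A sortedness check is equally small to sketch by hand. The Python below assumes that "sorted" means ordered by the reference's contig order (the order of entries in the .fai), then by start and end; that is an assumption, and a lexicographic sort -k1,1 -k2,2n order would need a different key. The function name check_bed_sorted is hypothetical, and it assumes records have already passed a compatibility check, so it skips unknown contigs and malformed lines rather than reporting them.

```python
from typing import Dict, Iterable, List, Tuple


def check_bed_sorted(bed_lines: Iterable[str], contig_order: List[str]) -> List[str]:
    """Report records out of (contig order, start, end) order (hypothetical helper)."""
    rank: Dict[str, int] = {name: i for i, name in enumerate(contig_order)}
    problems: List[str] = []
    previous: Tuple[int, int, int] = (-1, -1, -1)
    for lineno, line in enumerate(bed_lines, start=1):
        # Skip blank lines and header/comment lines.
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        fields = line.rstrip("\n").split("\t")
        # Unknown contigs and short lines are the compatibility check's job.
        if len(fields) < 3 or fields[0] not in rank:
            continue
        key = (rank[fields[0]], int(fields[1]), int(fields[2]))
        # Compare each record with the one immediately before it.
        if key < previous:
            problems.append(
                f"line {lineno}: {fields[0]}:{fields[1]}-{fields[2]} is out of order"
            )
        previous = key
    return problems
```

The contig order can come from the same .fai index, e.g. the keys of the dictionary returned when reading it, since the index lists contigs in FASTA order.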

FriederikeHanssen, Mar 15 '23 09:03