
Exact sorting of the output bed file?

Open Dario-Galanti opened this issue 2 years ago • 2 comments

When running mosdepth with a BED file of regions as input, the regions are sorted in the output file. What exact sorting is applied? I would like to reproduce the same sorting (or suppress sorting altogether) so that I can paste on the extra columns (after the 4th one) that are in the input BED file but not reported in the output. Is it perhaps a classic sort, alphabetic on the first column and numeric on the second? sort -k1,1 -k2,2n

Thank you very much for any help

Dario-Galanti, Mar 30 '22 17:03

Yes, sorted by start numerically within each chromosome. https://github.com/brentp/mosdepth/blob/master/mosdepth.nim#L342

The chromosomes are then ordered as they appear in the SAM header (the SN field of the @SQ lines), so you'd have to use that information to do the sorting. If you have the fasta and fai used to create the bam file, then you can use gsort like:

gsort $bed $fasta.fai > $sorted_bed
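If gsort is not at hand, the same ordering can be sketched in a few lines of Python. This is only an illustration of the rule described above (chromosomes in .fai order, then start numerically), not mosdepth's or gsort's actual code. The function names and the example data are made up, and it assumes a tab-separated BED with no header or track lines and a .fai whose order matches the BAM header. How mosdepth orders regions with identical starts is not covered here; Python's sort simply keeps them in input order.

```python
def read_fai_order(fai_lines):
    """Map each sequence name to its rank (0-based) in the .fai index."""
    order = {}
    for line in fai_lines:
        if line.strip():
            name = line.split("\t", 1)[0].strip()
            order.setdefault(name, len(order))
    return order


def sort_bed(bed_lines, order):
    """Sort BED records by chromosome rank, then by numeric start."""
    records = [l.rstrip("\n").split("\t") for l in bed_lines if l.strip()]
    # Stable sort: records with the same chromosome and start keep input order.
    records.sort(key=lambda r: (order[r[0]], int(r[1])))
    return records


# Hypothetical example data: the .fai lists chr1, chr2, chr10 in that order.
fai = [
    "chr1\t1000\t6\t60\t61\n",
    "chr2\t900\t1030\t60\t61\n",
    "chr10\t800\t1952\t60\t61\n",
]
bed = [
    "chr10\t5\t20\n",
    "chr2\t300\t400\n",
    "chr1\t50\t60\n",
    "chr1\t7\t9\n",
]

order = read_fai_order(fai)
sorted_bed = sort_bed(bed, order)
for rec in sorted_bed:
    print("\t".join(rec))
```

With real files, the two lists would be replaced by open file handles, e.g. open("ref.fa.fai") and open("regions.bed"), where both paths are placeholders.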

brentp, Mar 30 '22 17:03

Very useful, thanks very much! I hadn't thought of re-sorting the mosdepth output and checking the md5 hash. In my case I could reproduce the sorting with sort -k1,1V -k2,2n
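For anyone who finds this later: since the goal was only to carry the extra columns over, a join on the coordinates avoids depending on the sort order at all. Below is a minimal sketch, assuming tab-separated files where the first three columns (chrom, start, end) are identical in the input BED and in the mosdepth regions output, and that no two input regions share the same coordinates. The function name and the example rows are hypothetical.

```python
def attach_extra_columns(input_bed_lines, mosdepth_lines):
    """Append columns 5+ of the input BED to the matching mosdepth rows."""
    extras = {}
    for line in input_bed_lines:
        if not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        # Key on (chrom, start, end); keep everything after the 4th column.
        extras[tuple(fields[:3])] = fields[4:]

    merged = []
    for line in mosdepth_lines:
        if not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        # Rows without a match are kept unchanged.
        merged.append(fields + extras.get(tuple(fields[:3]), []))
    return merged


# Hypothetical input BED with two extra columns after the name column.
input_bed = [
    "chr2\t300\t400\tgeneB\t+\texon\n",
    "chr1\t50\t60\tgeneA\t-\tintron\n",
]
# Hypothetical mosdepth regions output: chrom, start, end, name, mean depth.
mosdepth_out = [
    "chr1\t50\t60\tgeneA\t12.50\n",
    "chr2\t300\t400\tgeneB\t30.10\n",
]

merged = attach_extra_columns(input_bed, mosdepth_out)
for row in merged:
    print("\t".join(row))
```

The regions output is gzipped, so with real files it would be read with gzip.open(path, "rt") rather than open.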

Dario-Galanti, Mar 31 '22 08:03