leviosam2

leviosam2 binary and Python script questions

Open · insilicool opened this issue 6 months ago · 1 comment

Hi,

Thank you for developing this useful program.

I am currently writing an SOP to, hopefully, incorporate leviosam2 into our pipeline suite. While working through the SOP and lifting over my T2T-aligned HG002 CRAM to GRCh38, a few questions have come up.

First, I ran GATK4 MarkDuplicates (MD) on both the T2T CRAM and the lifted-over BAM. The MD metrics files show the following (note that our in-house NovaSeq sequencing of HG002 comprises four libraries):

T2T CRAM:

METRICS CLASS picard.sam.DuplicationMetrics
LIBRARY          UNPAIRED_READS_EXAMINED  READ_PAIRS_EXAMINED  SECONDARY_OR_SUPPLEMENTARY_RDS  UNMAPPED_READS  UNPAIRED_READ_DUPLICATES  READ_PAIR_DUPLICATES  READ_PAIR_OPTICAL_DUPLICATES  PERCENT_DUPLICATION  ESTIMATED_LIBRARY_SIZE
A00266_0582_1    585705                   115629635            443592                          686841          168884                    13197245              2524985                       0.114574             561015435
A00266_0582_2    574547                   112611123            435282                          672881          173147                    16866895              3233453                       0.150166             401483145
A00266_0582_3    631460                   115005353            452444                          737914          177839                    12356716              2540103                       0.107922             606175323
A00266_0582_4    596197                   111627691            443584                          701687          168764                    14068430              3143703                       0.126448             501826973
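For reference, Picard's PERCENT_DUPLICATION counts each duplicate pair as two reads: (UNPAIRED_READ_DUPLICATES + 2 × READ_PAIR_DUPLICATES) / (UNPAIRED_READS_EXAMINED + 2 × READ_PAIRS_EXAMINED). A minimal sketch that reproduces the reported value for A00266_0582_1 from the other columns, which is a handy sanity check when comparing the two metrics files (the function name is just for illustration):

```python
def percent_duplication(unpaired_examined: int, pairs_examined: int,
                        unpaired_dups: int, pair_dups: int) -> float:
    """Recompute PERCENT_DUPLICATION; a duplicate pair counts as two reads."""
    reads_examined = unpaired_examined + 2 * pairs_examined
    if reads_examined == 0:
        return 0.0
    return (unpaired_dups + 2 * pair_dups) / reads_examined


# Library A00266_0582_1 from the T2T CRAM metrics above.
t2t_lib1 = percent_duplication(
    unpaired_examined=585705,
    pairs_examined=115629635,
    unpaired_dups=168884,
    pair_dups=13197245,
)
print(f"{t2t_lib1:.6f}")  # 0.114574
```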

leviosam2-realigned BAM on GRCh38:

METRICS CLASS picard.sam.DuplicationMetrics
LIBRARY          UNPAIRED_READS_EXAMINED  READ_PAIRS_EXAMINED  SECONDARY_OR_SUPPLEMENTARY_RDS  UNMAPPED_READS  UNPAIRED_READ_DUPLICATES  READ_PAIR_DUPLICATES  READ_PAIR_OPTICAL_DUPLICATES  PERCENT_DUPLICATION  ESTIMATED_LIBRARY_SIZE
A00266_0582_1    276002                   90264193             4605                            1909682         66237                     10555011              1960704                       0.117122             423705208
A00266_0582_2    263087                   88032381             4565                            1859872         68118                     13619508              2548337                       0.154866             300868501
A00266_0582_3    275094                   89707870             4823                            1903425         64234                     9881384               1981844                       0.11034              457404529
A00266_0582_4    263004                   87111049             4706                            1846461         63651                     11317226              2482239                       0.130086             376592529
Unknown Library  2407942                  95321444             0                               3107400         575302                    12933336              1570779                       0.136969             354835070
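A quick arithmetic check using only the READ_PAIRS_EXAMINED column from the two tables: how many pairs disappear from each named library after liftover, compared with the size of the "Unknown Library" group. The totals are not expected to match exactly, since liftover also changes which reads end up unmapped or secondary; this is just a way to see whether the missing pairs plausibly moved into "Unknown Library".

```python
# READ_PAIRS_EXAMINED per library, copied from the two metrics tables.
t2t_pairs = {
    "A00266_0582_1": 115629635,
    "A00266_0582_2": 112611123,
    "A00266_0582_3": 115005353,
    "A00266_0582_4": 111627691,
}
lifted_pairs = {
    "A00266_0582_1": 90264193,
    "A00266_0582_2": 88032381,
    "A00266_0582_3": 89707870,
    "A00266_0582_4": 87111049,
    "Unknown Library": 95321444,
}

# Pairs that are no longer counted under their original library after liftover.
missing = {lib: t2t_pairs[lib] - lifted_pairs.get(lib, 0) for lib in t2t_pairs}
total_missing = sum(missing.values())

for lib, n in missing.items():
    print(f"{lib}: {n} pairs ({n / t2t_pairs[lib]:.1%}) no longer under this library")
print(f"total missing from named libraries: {total_missing}")
print(f"pairs under 'Unknown Library':       {lifted_pairs['Unknown Library']}")
```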

As you can see, there is an additional "Unknown Library" entry, which I believe corresponds to the deferred reads that were realigned during liftover. My question is about the Python script's "--read_group" argument: if set, will it rename only the "Unknown Library" reads, or will it assign a single read group name to all reads, losing the original library names?
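As I understand Picard's MarkDuplicates, it assigns each record to the LB field of the @RG header line named by the record's RG tag, and reports records with no RG tag (or whose read group has no LB) under "Unknown Library". If that holds here, the extra row would suggest the realigned deferred reads were written without an RG tag. Below is a simplified, stdlib-only sketch of that grouping on SAM text; it counts records rather than pairs, the function names are hypothetical, and it says nothing about what leviosam2's "--read_group" option actually does.

```python
from collections import Counter

UNKNOWN_LIBRARY = "Unknown Library"  # label Picard reports when no LB can be resolved


def rg_to_library(header_lines):
    """Map @RG ID -> LB from SAM header lines."""
    mapping = {}
    for line in header_lines:
        if not line.startswith("@RG"):
            continue
        fields = dict(f.split(":", 1) for f in line.rstrip("\n").split("\t")[1:] if ":" in f)
        if "ID" in fields and "LB" in fields:
            mapping[fields["ID"]] = fields["LB"]
    return mapping


def library_of(read_line, libraries):
    """Return the library of one SAM record, falling back to UNKNOWN_LIBRARY."""
    # Optional tags start at column 12 (index 11) and look like TAG:TYPE:VALUE.
    for tag in read_line.rstrip("\n").split("\t")[11:]:
        if tag.startswith("RG:Z:"):
            return libraries.get(tag[5:], UNKNOWN_LIBRARY)
    return UNKNOWN_LIBRARY


def count_by_library(sam_lines):
    header = [line for line in sam_lines if line.startswith("@")]
    reads = [line for line in sam_lines if not line.startswith("@")]
    libraries = rg_to_library(header)
    return Counter(library_of(r, libraries) for r in reads)


sam = [
    "@HD\tVN:1.6\tSO:coordinate",
    "@RG\tID:rg1\tLB:A00266_0582_1\tSM:HG002",
    "r1\t0\tchr1\t100\t60\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII\tRG:Z:rg1",
    "r2\t0\tchr1\t300\t60\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII",  # no RG tag
]
print(count_by_library(sam))
```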

Second question: I noticed that the "-O bam" argument is hardcoded in the Python script, and the help for "leviosam2 lift" does not document "-O". Is it possible to output CRAM? If so, could an argument be added to the Python script to select SAM, BAM, or CRAM?
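Until it is clear whether "leviosam2 lift -O" accepts CRAM, one workaround is to keep the BAM the script produces and convert it afterwards. A minimal sketch of what a selectable output format could look like in a wrapper, assuming conversion happens after the lift step; "--out_format", "--reference", "--prefix" and conversion_cmd are hypothetical names, not existing leviosam2 options. The conversion uses standard samtools view flags: -C for CRAM output, -T for the reference FASTA (the GRCh38 target here), -h to keep the header in SAM output.

```python
import argparse


def parse_args(argv=None):
    # Hypothetical options; the real script currently hardcodes "-O bam".
    parser = argparse.ArgumentParser(description="Sketch of a selectable output format")
    parser.add_argument("--out_format", choices=["sam", "bam", "cram"], default="bam",
                        help="format of the final lifted alignment file")
    parser.add_argument("--reference", help="target-build FASTA; required for CRAM")
    parser.add_argument("--prefix", default="lifted", help="output file prefix")
    return parser.parse_args(argv)


def conversion_cmd(bam_path, out_format, reference=None, prefix="lifted"):
    """Return a samtools command converting the lifted BAM, or None if BAM is wanted."""
    if out_format == "bam":
        return None
    if out_format == "sam":
        return ["samtools", "view", "-h", "-o", f"{prefix}.sam", bam_path]
    if reference is None:
        raise ValueError("CRAM output needs the reference FASTA (samtools view -T)")
    return ["samtools", "view", "-C", "-T", reference, "-o", f"{prefix}.cram", bam_path]
```

The returned list could be passed to subprocess.run(cmd, check=True) once the lifted BAM exists. Converting after the fact keeps the lift step unchanged, at the cost of writing the BAM to disk first.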

Thanks!

Best, Rob E.

insilicool, Jan 16 '24 17:01