Using demultiplexed BAM files output from zUMIs with Picard Tools

Open • seifudd opened this issue 9 months ago • 1 comment

Hi,

I am trying to use the demultiplexed BAM output from zUMIs with Picard Tools, but it does not seem to be working.

Below are a few lines from a demultiplexed BAM file (one sample) output from zUMIs:

A00267:423:HFMHMDRX3:1:2101:1000:34663  99      19      48452905        255     88M     =       48453094        277     GCTGTTCGTGCACCAGGGCGAGACCGAGCTGAAGGAGCTGCACTGGCACCCGCAGTGCCCAGGGCTCCTGGTCAGCACGGCGCTGTCA    FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF        NH:i:1  HI:i:1  AS:i:174    nM:i:0  BX:Z:AGTGACCTCTCCTAGA   BC:Z:AGTGACCTCTCCTAGA   UB:Z:CCCGCANCAGCTCTGGATCAGAGC   XS:Z:Assigned2  XN:i:1  XT:Z:ENSG00000105447
A00267:423:HFMHMDRX3:1:2101:1000:34663  147     19      48453094        255     88M     =       48452905        -277    GGTTCATTCAGGTCTGTTGACTGAGACTGGCCGGCCTGTGGGCTGCCGTGATGGATTCTGTTTGACGTATTGTTCTCTAGAAGGCCTG    FFFFFFFF:FFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF        NH:i:1  HI:i:1  AS:i:174    nM:i:0  BX:Z:AGTGACCTCTCCTAGA   BC:Z:AGTGACCTCTCCTAGA   UB:Z:CCCGCANCAGCTCTGGATCAGAGC   XS:Z:Assigned2  XN:i:1  XT:Z:ENSG00000105447
A00267:423:HFMHMDRX3:1:2101:1000:35978  83      2       105342985       255     88M     =       105339692       -3381   CCAGTAATGCCTTTAGAAAATTATCAAATTCCTCTTCGAGTGTTTCACCCCTAATTTTGTCTTCCAATTTGCCTGTGAACAATAAAAC    FFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF        NH:i:1  HI:i:1  AS:i:175    nM:i:0  BX:Z:AGTGACCTCTCCTAGA   BC:Z:AGTGACCTCTCCTAGA   UB:Z:TTATTGTGTTCCCGAAGAATAGAT   XS:Z:Assigned3  XN:i:1  XT:Z:ENSG00000135974
A00267:423:HFMHMDRX3:1:2101:1000:35978  163     2       105339692       255     58M3098N30M     =       105342985       3381    GGGGGAAAATGATGGAAAAGAAAAGAGAACAACATGAGATTAAAAATGAGACTAAAAGGAGTAGCACTGTAGATGGGTTAAGGAAAAG    FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF,F:FFF        NH:i:1  HI:i:1      AS:i:175        nM:i:0  BX:Z:AGTGACCTCTCCTAGA   BC:Z:AGTGACCTCTCCTAGA   UB:Z:TTATTGTGTTCCCGAAGAATAGAT   XS:Z:Assigned3  XN:i:1  XT:Z:ENSG00000135974
A00267:423:HFMHMDRX3:1:2101:1000:36166  83      16      69718137        255     88M     =       69713114        -5111   GGTCTGCGGCTTCCAGCTTCTTTTGTTCAGCCACAATATCTGGGCTCAGATGGCCTTCTTTATAAGCCAGAACAGACTCGGCAGGATA    :FFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF        NH:i:1  HI:i:1  AS:i:175    nM:i:0  BX:Z:AGTGACCTCTCCTAGA   BC:Z:AGTGACCTCTCCTAGA   UB:Z:GCGAACTTTCAGTGGTGATGGAAA   XS:Z:Assigned3  XN:i:1  XT:Z:ENSG00000181019
A00267:423:HFMHMDRX3:1:2101:1000:36166  163     16      69713114        255     16M1834N72M     =       69718137        5111    GCACTGCCTTCTTACTCCGGAAGGGTCCTTTGTCATACATGGCAGCGTAAGTGTAAGCAAACTCTCCTATGAACACTCGCTCAAACCA    FFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFF:F:FFFFFFFFFFFFFFFFFFFFFFFFFFFF,,        NH:i:1  HI:i:1      AS:i:175        nM:i:0  BX:Z:AGTGACCTCTCCTAGA   BC:Z:AGTGACCTCTCCTAGA   UB:Z:GCGAACTTTCAGTGGTGATGGAAA   XS:Z:Assigned3  XN:i:1  XT:Z:ENSG00000181019
A00267:423:HFMHMDRX3:1:2101:1009:15107  99      10      26501106        255     6M2085N78M8472N4M       =       26511819        10801   TCTCAGGAAGAGGAAGAAGCCCAAGCCAAGGCTGATAAAATTAAGCTGGCGCTGGAAAAACTGAAGGAGGCCAAGGTTAAGAAGCTCG    FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFF        NH:i:1      HI:i:1  AS:i:177        nM:i:0  BX:Z:AGTGACCTCTCCTAGA   BC:Z:AGTGACCTCTCCTAGA   UB:Z:CCTGAACCTCTCCAAAAAACCTCT   XS:Z:Assigned3  XN:i:1  XT:Z:ENSG00000077420
A00267:423:HFMHMDRX3:1:2101:1009:15107  147     10      26511819        255     88M     =       26501106        -10801  GATGTTCTGGACAACCTTTTCGAGAAAACTCATTGTGACTGCAATGTAGACTGGTGTCTTTATGAAATCTACCCGGAACTACAAATTG    :FFFFF:FFFF:FFFF,FFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF        NH:i:1  HI:i:1  AS:i:177    nM:i:0  BX:Z:AGTGACCTCTCCTAGA   BC:Z:AGTGACCTCTCCTAGA   UB:Z:CCTGAACCTCTCCAAAAAACCTCT   XS:Z:Assigned3  XN:i:1  XT:Z:ENSG00000077420
A00267:423:HFMHMDRX3:1:2101:1009:15515  83      11      66003740        255     88M     =       66003676        -152    TGCCTTCGAGAGTGGTGCGACGCCTTCTTGTGATGCTCTCTGGGAAGCTCTCAATCCCCAGCCCTCATCCAGAGTTTGCAGCCGAGTA    FFFFFFFFFFF:FFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF        NH:i:1  HI:i:1  AS:i:173    nM:i:0  BX:Z:AGTGACCTCTCCTAGA   BC:Z:AGTGACCTCTCCTAGA   UB:Z:GGGAGGAGTCCCAGATGAAGACCT   XS:Z:Assigned3  XN:i:1  XT:Z:ENSG00000175334
A00267:423:HFMHMDRX3:1:2101:1009:15515  163     11      66003676        255     87M1S   =       66003740        152     CTTCCGGGAATGGCTGAAAGACACTTGTGGCGCCAACGCCAAGCAGTCCCGGGACTGCTTCGGATGCCTTCGAGAGTGGTGCGACGCG    FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFF:FFF:FFFFFFFFFF,        NH:i:1  HI:i:1  AS:i:173    nM:i:0  BX:Z:AGTGACCTCTCCTAGA   BC:Z:AGTGACCTCTCCTAGA   UB:Z:GGGAGGAGTCCCAGATGAAGACCT   XS:Z:Assigned3  XN:i:1  XT:Z:ENSG00000175334
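
For anyone reading these records by eye, here is a minimal Python sketch that decodes the FLAG value and the optional tags of a single record. It is not part of zUMIs or Picard; `decode_flag` and `parse_record` are hypothetical helper names, the flag bit meanings follow the SAM specification, and SEQ and QUAL are replaced by `*` in the sample string for brevity. It assumes no tag value contains whitespace, which holds for the records shown.

```python
# Flag bit meanings as defined in the SAM specification.
FLAG_BITS = {
    0x1: "paired",
    0x2: "proper_pair",
    0x4: "unmapped",
    0x8: "mate_unmapped",
    0x10: "reverse",
    0x20: "mate_reverse",
    0x40: "first_in_pair",
    0x80: "second_in_pair",
    0x100: "secondary",
    0x200: "qc_fail",
    0x400: "duplicate",
    0x800: "supplementary",
}


def decode_flag(flag):
    """Return the names of all bits set in a SAM FLAG value."""
    return [name for bit, name in FLAG_BITS.items() if flag & bit]


def parse_record(line):
    """Split one SAM record into its core fields and a dict of optional tags."""
    fields = line.split()  # assumption: no whitespace inside any field
    tags = {}
    for item in fields[11:]:
        tag, typ, value = item.split(":", 2)
        tags[tag] = int(value) if typ == "i" else value
    return {
        "qname": fields[0],
        "flag": int(fields[1]),
        "rname": fields[2],
        "pos": int(fields[3]),
        "mapq": int(fields[4]),
        "cigar": fields[5],
        "tags": tags,
    }


# First record from the excerpt above; SEQ and QUAL shortened to "*".
record = (
    "A00267:423:HFMHMDRX3:1:2101:1000:34663 99 19 48452905 255 88M = 48453094 277 * * "
    "NH:i:1 HI:i:1 AS:i:174 nM:i:0 BX:Z:AGTGACCTCTCCTAGA BC:Z:AGTGACCTCTCCTAGA "
    "UB:Z:CCCGCANCAGCTCTGGATCAGAGC XS:Z:Assigned2 XN:i:1 XT:Z:ENSG00000105447"
)

parsed = parse_record(record)
print(decode_flag(parsed["flag"]))
print(parsed["tags"])
```

By this decoding, the reads in the excerpt are flagged as mapped, properly paired reads (flags 99/147 and 83/163) with MAPQ 255.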

Below is the output of a Picard Tools CollectAlignmentSummaryMetrics run with ASSUME_SORTED=true (i.e. assuming the BAM file is coordinate sorted):

## htsjdk.samtools.metrics.StringHeader
# CollectAlignmentSummaryMetrics EXPECTED_PAIR_ORIENTATIONS=[] INPUT=Tunic.AGTGACCTCTCCTAGA.demx.bam OUTPUT=Tunic.AGTGACCTCTCCTAGA.demx.summary.metrics.txt    MAX_INSERT_SIZE=100000 ADAPTER_SEQUENCE=[AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT, AGATCGGAAGAGCTCGTATGCCGTCTTCTGCTTG, AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT, AGATCGGAAGAGCGGTTCAGCAGGAATGCCGAGACCGATCTCGTATGCCGTCTTCTGCTTG, AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT, AGATCGGAAGAGCACACGTCTGAACTCCAGTCACNNNNNNNNATCTCGTATGCCGTCTTCTGCTTG] METRIC_ACCUMULATION_LEVEL=[ALL_READS] IS_BISULFITE_SEQUENCED=false ASSUME_SORTED=true STOP_AFTER=0 VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false GA4GH_CLIENT_SECRETS=client_secrets.json USE_JDK_DEFLATER=false USE_JDK_INFLATER=false
## htsjdk.samtools.metrics.StringHeader
# Started on: Thu Nov 09 00:35:11 EST 2023

## METRICS CLASS        picard.analysis.AlignmentSummaryMetrics
CATEGORY        TOTAL_READS     PF_READS        PCT_PF_READS    PF_NOISE_READS  PF_READS_ALIGNED        PCT_PF_READS_ALIGNED    PF_ALIGNED_BASES        PF_HQ_ALIGNED_READS PF_HQ_ALIGNED_BASES     PF_HQ_ALIGNED_Q20_BASES PF_HQ_MEDIAN_MISMATCHES PF_MISMATCH_RATE        PF_HQ_ERROR_RATE        PF_INDEL_RATE   MEAN_READ_LENGTH   READS_ALIGNED_IN_PAIRS   PCT_READS_ALIGNED_IN_PAIRS      PF_READS_IMPROPER_PAIRS PCT_PF_READS_IMPROPER_PAIRS     BAD_CYCLES      STRAND_BALANCE  PCT_CHIMERAS    PCT_ADAPTER SAMPLE  LIBRARY READ_GROUP
FIRST_OF_PAIR   71490794        71490794        1       62694189        0       0       0       0       0       0       0       0       0       0       88      0       0       0       0       0       0       0       0.003427
SECOND_OF_PAIR  71475327        71475327        1       62037196        0       0       0       0       0       0       0       0       0       0       88      0       0       0       0       0       0       0       0.000061
PAIR    142966121       142966121       1       124731385       0       0       0       0       0       0       0       0       0       0       88      0       0       0       0       0       0       0       0.001744

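PF_READS_ALIGNED is 0 for every category, along with all the other alignment metrics. Instead, most reads seem to be going to PF_NOISE_READS (e.g. 62694189 of 71490794 first-of-pair reads).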
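
To put a number on this, here is a minimal Python sketch that computes the fraction of PF reads counted as noise in each category. The values are copied from the first six columns of the metrics table above; `noise_fraction` is a hypothetical helper name, not a Picard function.

```python
# First six columns of the AlignmentSummaryMetrics table, copied from the output above.
HEADER = [
    "CATEGORY", "TOTAL_READS", "PF_READS",
    "PCT_PF_READS", "PF_NOISE_READS", "PF_READS_ALIGNED",
]
ROWS = [
    "FIRST_OF_PAIR 71490794 71490794 1 62694189 0",
    "SECOND_OF_PAIR 71475327 71475327 1 62037196 0",
    "PAIR 142966121 142966121 1 124731385 0",
]


def noise_fraction(header, row):
    """Return (category, PF_NOISE_READS / PF_READS) for one metrics row."""
    rec = dict(zip(header, row.split()))
    return rec["CATEGORY"], int(rec["PF_NOISE_READS"]) / int(rec["PF_READS"])


fractions = dict(noise_fraction(HEADER, row) for row in ROWS)
for category, frac in fractions.items():
    print(f"{category}: {frac:.4f} of PF reads counted as noise")
```

Roughly 87% of PF reads fall into PF_NOISE_READS in each category, and none are counted in PF_READS_ALIGNED.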

The same behavior occurs when I use the <>.filtered.tagged.Aligned.out.bam file.

Am I missing something? I thought that the BAM files output from zUMIs were compatible with Picard Tools and similar tools.

Attached is the yaml file:

Tunic.zUMIs_config_formated.yaml.txt

Attached is the command line log file output from zUMIs:

Tunic.command_line_output_zummis.txt

Thank you for your help. Appreciate it.

Thanks, Fayaz

seifudd • Nov 09 '23 17:11