Invalid record in bam.unproper.uniq.interval.bed

Open rimjhimroy opened this issue 7 years ago • 3 comments

Hi,

I get this error when running "TEMP_Absence.sh":

bedtools intersect -a /cluster/project/gdc/people/crimjhim/TEPID_final.bed.sort -b merged.Ma99.bam.unproper.uniq.interval.bed -f 1.0 -wo
Error: Invalid record in file merged.Ma99.bam.unproper.uniq.interval.bed. Record is
chr1    2287009 2287008 HWI-700523F:21:C6KJ9ANXX:4:2301:14610:90784/2

Any idea why the coordinates are inverted (start greater than end) in the BED file, and how I should fix this? I am working with paired-end Illumina sequencing data, and the average insert size is 250 bp.

Thanks, Rimjhim

rimjhimroy, Jul 08 '17 12:07
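For anyone hitting the same error, a quick way to see how many records are affected is to scan the BED file for lines whose start exceeds the end. The following is a minimal Python sketch, not part of TEMP or bedtools; the function name `find_inverted_bed_records` is hypothetical, and it assumes whitespace-separated columns with the start and end in columns 2 and 3.

```python
def find_inverted_bed_records(lines):
    """Return (line_number, line) pairs whose BED start is greater than the end.

    Hypothetical helper for illustration only; it reports problem records
    and does not modify or repair them.
    """
    bad = []
    for number, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        # Skip blank lines and the comment/header lines that BED files may carry.
        if not line or line.startswith(("#", "track ", "browser ")):
            continue
        fields = line.split()
        if len(fields) < 3:
            continue  # not a complete BED record
        start, end = int(fields[1]), int(fields[2])
        if start > end:
            bad.append((number, line))
    return bad


# The record reported in the error message above, preceded by a valid one.
example = [
    "chr1\t2286800\t2286900\tsome_read/1\n",
    "chr1\t2287009\t2287008\tHWI-700523F:21:C6KJ9ANXX:4:2301:14610:90784/2\n",
]
for number, record in find_inverted_bed_records(example):
    print(f"line {number}: {record}")
```

To check a real file, pass an open file handle, for example `find_inverted_bed_records(open("merged.Ma99.bam.unproper.uniq.interval.bed"))`. This only locates the offending records; whether dropping them is acceptable for the downstream analysis is a separate question.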

Hi Rimjhim,

It's hard to know exactly what happened without knowing anything about "merged.Ma99.bam". Would you mind posting a few entries (including read "HWI-700523F:21:C6KJ9ANXX:4:2301:14610:90784/2") from the BAM file?

Jiali

JialiUMassWengLab, Jul 24 '17 05:07

Hi Jiali,

Thank you very much for your reply, and I am sorry; I should have added more details.

Here is a snippet of merged.Ma99.bam covering chr1:2286800-2287050:

  1 HWI-700523F:21:C6KJ9ANXX:4:1203:13109:25753     163     chr1    2286830 22      77S49M  =       2286868 203     TTTGAAGCAAACAGATATGTCACCGAAAGGGCTATTAAAAGGCTCAAAAGCAGAGATAACAAACACAATGTGTCCTTAAACTTGAATC    AATTTATTAACCAAGAAAGAGATCTGAATCGTAACATG  BBBBBGGGFD>FEBDGDGGGGFGG/<>CDGGGGGGGGEBFCBGGGGGGGGEDGGGGGGGGGGGGGGGGGEB@GGGGGGGGGGGGEFFGGFGGGGGGGGG@FGGGBFGGGGFGGGGGGCBFEGGGGF  AS:i:56 XN:i:0  XM:i:6  XO:i:0      XG:i:0  NM:i:6  MD:Z:16C2C1T14A0G6C4    YS:i:104        YT:Z:CP
  2 HWI-700523F:21:C6KJ9ANXX:4:2205:5836:73918      145     chr1    2286830 36      45S53M28S       =       2286577 -351    TATTAAAAGGCTCAAAAGCAGAGATAACAAACACAATGTGTCCTTAAACTTGAATCAATTTATTAACCAAGAAAGAGATC    TGAATCGTAACATGAATGCACAAAGTACTAAAAAAATCAAGCTTTT  5DFGGGGGGG=40GGGGC@GEGGGDGGBGGGGGGGGGGBFGGF@FGGFF@GGGGD@GGFGGGGGFGCEGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGEGGGGGGGGGGGGGA<ABB  AS:i:64 XN:i:0  XM:i:6      XO:i:0  XG:i:0  NM:i:6  MD:Z:16C2C1T14A0G6C8    YS:i:252        YT:Z:DP
  3 HWI-D00418:56:C6KLUANXX:8:2102:8126:85711       83      chr1    2286830 22      37S64M1I23M1S   =       2286699 -255    GGCTCAAAAGCAGAGATAACAAACACAATGTGTCCTTAAACTTGAATCAATTTATTAACCAAGAAAGAGATCTGAATCGT    AACATGAATGCACAAAGTACTAAAAAAATCAAGCTTTTAGATTCAA  FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFBFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFBBBBB  AS:i:103        XN:i:0      XM:i:9  XO:i:1  XG:i:1  NM:i:10 MD:Z:16C2C1T14A0G6C8A1G23C7     YS:i:68 YT:Z:CP
  4 HWI-D00418:56:C6KLUANXX:8:2107:16542:25469      145     chr1    2286832 41      62M1I42M21S     =       2286623 -313    ACTTGAATCAATTTATTAACCAAGAAAGAGATCTGAATCGTAACATGAATGCACAAAGTACTAAAAAAATCAAGCTTTTA    GATTCAACAAAAGGAATCAAGTCAAACCCTAGATTGATTTACCCTA  FBFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFBBBBB  AS:i:123        XN:i:0      XM:i:11 XO:i:1  XG:i:1  NM:i:12 MD:Z:14C2C1T14A0G6C8A1G23C7G14A3        YS:i:220        YT:Z:DP
  5 HWI-700523F:21:C6KJ9ANXX:4:2205:11059:54702     69      chr1    2286852 0       *       =       2286852 0       TTTGTAAGATGATCAAAAACAGGAATATCTGAGAAGCTTGTAAACATATGAACAGTGAACTTTGAAGCAAACAGATATGTCACCAAAA    GGGCTATTAAAAGGCTCAAAAGCAGAGATAACAAACAC  CCCCCGGFGGGGGGEGGGFGGGGGGGGGGGG1><DGCEGGGGFGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGBFG@GGGGCGGGGEGGGGGGGGGGGGGGGGGGGFGG0  YT:Z:UP
  6 HWI-D00418:56:C6KLUANXX:8:1101:13471:25839      161     chr1    2286852 36      7S42M1I50M1D8M18S       =       2287239 509     TATTAACCAAGAAAGAGATCTGAATCGTAACATGAATGCACAAAGTACTAAAAAAATCAAGCTTTTAGATTC    AACAAAAGGAATCAAGTCAAACCCTAGATTGATTTACCCTAGATATGCTAAGGT  BBBBBFFFFFFFFFFFFFFFFFFFFBFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF<FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFBFFFFFF<  AS:i:121            XN:i:0  XM:i:9  XO:i:2  XG:i:2  NM:i:11 MD:Z:14A0G6C8A1G23C7G14A3G7^T8  YS:i:205        YT:Z:DP
  7 HWI-700523F:21:C6KJ9ANXX:4:2205:11059:54702     153     chr1    2286852 24      8S42M1I50M1D8M17S       =       2286852 0       TTATTAACCAAGAAAGAGATCTGAATCGTAACATGAATGCACAAAGTACTAAAAAAATCAAGCTTTTAGATT    CAACAAAAGGAATCAAGTCAAACCCTAGATTGATTTACCCTAGATATGCTAAGG  @GGGCFEGGGCGGGGGGGGGGEC@FDGGGGGGGGGGGGGGGCGGGGGGGGGGGGGGGGGGEGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGFGGGGGBGGGGGGGGGGFGBBBBB  AS:i:121            XN:i:0  XM:i:9  XO:i:2  XG:i:2  NM:i:11 MD:Z:14A0G6C8A1G23C7G14A3G7^T8  YT:Z:UP
  8 HWI-700523F:21:C6KJ9ANXX:4:1203:13109:25753     83      chr1    2286868 22      1S26M1I50M1D8M40S       =       2286830 -203    AATCGTAACATGAAAGCACAAAGTACTAAAAAAATCAAGCTTTTAGATTCAACAAAAGGAATCAAGTCAAAC    CCTAGATTGATTTACCCTAGATATGCTAAGGTTCTAATTCAAATCAGATCTAAC  =GD.F@@C0B;>F<000F>FGCGGDB0C>D@FCGFFDB:0DCGGGFCGEGGGGGGGFF>GGGE<=11<DF>F>GF1@BC1CGGGGGGGGGGGCGGGF@CGEF>E@>GGGGGGGGGGGCGGGBBBBA  AS:i:104            XN:i:0  XM:i:8  XO:i:2  XG:i:2  NM:i:10 MD:Z:6C6T1A1G23C7G14A3G7^T8     YS:i:56 YT:Z:CP
  9 HWI-700523F:21:C6KJ9ANXX:4:2301:14610:90784     161     chr1    2286876 36      18M1I50M1D7M2D3M1D1M3D1M5D17M2D28M      =       2287003 252     ATGAATGCACAAAGTACTAAAAAAATCAAGCTTTTAGATTCAACAAAAGGAATCAA    GTCAAACCCTAGATTGATTTACCCTAGATATGCTAAGGTTCTAATTCAAATCAGATCTAACCTAATAGAA  BBA=?FGG>GD@F=BDFFEGG1CGGGGFGCGCEC1FGGG>1EGGGGGGGGGGGDFGGGGGGGCBFGGDD0FFC00FGGCFGGGGDGD000=FFG@0:0FB@007CF@>@FC@F?CFG>:F4BA@C=      AS:i:99 XN:i:0  XM:i:11 XO:i:7  XG:i:15 NM:i:26 MD:Z:7A1G23C7G14A3G7^T7^AG3^A1^AAA1^CAAAC2T1T12^CA2A0C0A23      YS:i:201        YT:Z:DP
 10 HWI-700523F:21:C6KJ9ANXX:4:2211:13774:52164     97      chr1    2286886 28      2S8M1I50M1D7M2D3M1D1M3D1M5D17M2D36M     =       2287159 393     ACAAAGTACTAAAAAAATCAAGCTTTTAGATTCAACAAAAGGAATCAAGTCAAACC    CTAGATTGATTTACCCTAGATATGCTAAGGTTCTAATTCAAATCAGATCTAACCTAATAGAATATCCTCA  @BBCC@D=EB>BCDF;ED:11EGGDG>DD1:<FEFB1EDGFGBGGCB1FFGG@FFGGDFGFEGDBDGGG@EFB1FEGG1BC>D0FG00FGGCDE0;F0E@G>CFGG0<F>@FGCD@C0CFFFGGG8      AS:i:99 XN:i:0  XM:i:10 XO:i:7  XG:i:15 NM:i:25 MD:Z:23C7G14A3G7^T7^AG3^A1^AAA1^CAAAC2T1T12^CA2A0C0A27A3        YS:i:194        YT:Z:DP
 11 HWI-D00418:56:C6KLUANXX:8:2109:9866:47127       69      chr1    2286992 0       *       =       2286992 0       ATGATCAAAAACAGGAATATCTGAGAAGCTTGTAAACATATGAACAGTGAACTTTGAAGCAAACAGATATGTCACCAAAAGGGCTATT    AAAAGGCTCAAAAGCAGAGATAACAAACACAATGTGTC  BBBBBFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF  YT:Z:UP
 12 HWI-D00418:56:C6KLUANXX:8:2109:9866:47127       153     chr1    2286992 42      87M1I4M1I10M1D22M       =       2286992 0       AAATCAGATCTAACCTAATAGAATATCCTCAAAGAAGAGATCTAAACGAAACCCTAGTCCGTGAAAACAGAG    AAACAGATCGATACGAAAAGAGAGGATGAAAAGAAACTCACATCTGCCAAGCG   FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFBBBBB   AS:i:194            XN:i:0  XM:i:4  XO:i:3  XG:i:3  NM:i:7  MD:Z:27A60G7G4^C1A20    YT:Z:UP
 13 HWI-700523F:21:C6KJ9ANXX:4:2301:14610:90784     81      chr1    2287003 36      76M1I4M1I10M1D34M       =       2286876 -252    AACCTAATAGAATATCCTCAAAGAAGAGATCTAAACGAAACCCTAGTCCGTGAAAACAGAGAAACAGATCGA    TACGAAAAGAGAGGATGAAAAGAAACTCACATCTGCCAAGCGGAGAGGATGAAT  6:000090@0700000;0000=0800808<00C=0/=E:000=0/>E/EDC0F1@DGGGC@CF=1DE00CF@F:<GGF1:BF>G@GF1F>F11F1EGGG>CGGGGCGBGGGGGGGGGGF@CBBCB@  AS:i:201            XN:i:0  XM:i:4  XO:i:3  XG:i:3  NM:i:7  MD:Z:16A60G7G4^C1A32    YS:i:99 YT:Z:DP
 14 HWI-D00418:56:C6KLUANXX:8:2302:14361:100328     69      chr1    2287016 0       *       =       2287016 0       ACACAATGTGTCCTTAAACTTGAATCAATTTATTAACCAAGAAAGAGATCTGAATCGTAACATGAATGCACAAAGTACTAAAAAAATC    AAGCTTTTAGATTCAACAAAAGGAATCAAGTCAAACCC  BBBBBFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF  YT:Z:UP
 15 HWI-D00418:56:C6KLUANXX:8:2302:14361:100328     153     chr1    2287016 37      63M1I4M1I10M1D46M       =       2287016 0       ATCCTCAAAGAAGAGATCTAAACGAAACCCTAGTCCGTGAAAACAGAGAAACAGATCGATACGAAAAGAGAG    GATGAAAAGAAACTCACATCTGCCAAGCGGAGAGGATGAATAGAGAAGCGAAG   FFFFBFFFFFFFFFFFBFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFBBBBB   AS:i:187            XS:i:61 XN:i:0  XM:i:5  XO:i:3  XG:i:3  NM:i:8  MD:Z:3A60G7G4^C1A40A3   YT:Z:UP
 16 HWI-700523F:21:C6KJ9ANXX:4:2206:19732:85187     133     chr1    2287024 0       *       =       2287024 0       AAACAGATATGTCACCAAAAGGGCTATTAAAAGGCTCAAAAGCAGAGATAACAAACACAATGTGTCCTTAAACTTGAATCAATTTATT    AACCAAGAAAGAGATCTGAATCGTAACATGAATGCACA  ?AA@BBGGGGGG>GGGGGG>1FDGGGGG1FGGGGGGG1FGGGFGGGGGGGGGGGBGGGGBC@FGGGGGGGGGGGGGGGCGGGGGGGGGGGGEDGBGGGGGEGF0FFGCF@FFFGGGCG0CGGGEGF  YT:Z:UP
 17 HWI-700523F:21:C6KJ9ANXX:4:2206:19732:85187     89      chr1    2287024 25      55M1I4M1I10M1D55M       =       2287024 0       AGAAGAGATCTAAACGAAACCCTAGTCCGTGAAAACAGAGAAACAGATCGATACGAAAAGAGAGGATGAAAA    GAAACTCACATCTGCCAAGCGGAGAGGATGAATAGAGAAGCGAAGAGAACTCTT  GFD@9C<<008CC0.8C>FC;0DE9/F=00CGGGGDF0D@GGGFE@GEGGFGGGGGGGFEEGDFFGGGGGGGGGFEF@GGEF:>>GBGGFBGGGGGGEGGGCGGFFEF;GGDGFGFC1CGDA=CCB  AS:i:196            XS:i:80 XN:i:0  XM:i:4  XO:i:3  XG:i:3  NM:i:7  MD:Z:56G7G4^C1A40A12    YT:Z:UP
 18 HWI-D00418:56:C6KLUANXX:8:1308:13410:70628      97      chr1    2287031 21      48M1I4M1I10M1D62M       =       2287182 0       ATCTAAACGAAACCCTAGTCCGTGAAAACAGAGAAACAGATCGATACGAAAAGAGAGGATGAAAAGAAACTC    ACATCTGCCAAGCGGAGAGGATGAATAGAGAAGCGAAGAGAACTCTTCCAAGAA  BBBBBFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF  AS:i:189            XS:i:93 XN:i:0  XM:i:5  XO:i:3  XG:i:3  NM:i:8  MD:Z:49G7G4^C1A40A14G4  YT:Z:UP

The two records for read HWI-700523F:21:C6KJ9ANXX:4:2301:14610:90784 are:

HWI-700523F:21:C6KJ9ANXX:4:2301:14610:90784	161	chr1	2286876	36	18M1I50M1D7M2D3M1D1M3D1M5D17M2D28M	=	2287003	252	ATGAATGCACAAAGTACTAAAAAAATCAAGCTTTTAGATTCAACAAAAGGAATCAAGTCAAACCCTAGATTGATTTACCCTAGATATGCTAAGGTTCTAATTCAAATCAGATCTAACCTAATAGAA	BBA=?FGG>GD@F=BDFFEGG1CGGGGFGCGCEC1FGGG>1EGGGGGGGGGGGDFGGGGGGGCBFGGDD0FFC00FGGCFGGGGDGD000=FFG@0:0FB@007CF@>@FC@F?CFG>:F4BA@C=	AS:i:99	XN:i:0	XM:i:11	XO:i:7	XG:i:15	NM:i:26	MD:Z:7A1G23C7G14A3G7^T7^AG3^A1^AAA1^CAAAC2T1T12^CA2A0C0A23	YS:i:201	YT:Z:DP
HWI-700523F:21:C6KJ9ANXX:4:2301:14610:90784	81	chr1	2287003	36	76M1I4M1I10M1D34M	=	2286876	-252	AACCTAATAGAATATCCTCAAAGAAGAGATCTAAACGAAACCCTAGTCCGTGAAAACAGAGAAACAGATCGATACGAAAAGAGAGGATGAAAAGAAACTCACATCTGCCAAGCGGAGAGGATGAAT	6:000090@0700000;0000=0800808<00C=0/=E:000=0/>E/EDC0F1@DGGGC@CF=1DE00CF@F:<GGF1:BF>G@GF1F>F11F1EGGG>CGGGGCGBGGGGGGGGGGF@CBBCB@	AS:i:201	XN:i:0	XM:i:4	XO:i:3	XG:i:3	NM:i:7	MD:Z:16A60G7G4^C1A32	YS:i:99	YT:Z:DP

Please let me know if you need more lines.

Thanks,

Rimjhim

rimjhimroy, Jul 24 '17 12:07

Rimjhim,

This happens because the reads are long relative to the insert size, so the two mates of the pair actually overlap.

I've modified the code and it should be taken care of. Let me know if it still doesn't work.

Jiali

JialiUMassWengLab, Jul 31 '17 06:07
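The overlap described in the last reply can be checked directly from the two SAM records posted earlier. The following is a minimal Python sketch, not TEMP's actual code; the helper names `reference_span` and `mates_overlap` are hypothetical. It sums the CIGAR operations that consume reference bases (M, D, N, =, X) to get each mate's span on the reference and then tests whether the spans intersect.

```python
import re

# One CIGAR operation: a length followed by an operation code.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
# Operations that advance along the reference, per the SAM specification.
REF_CONSUMING = set("MDN=X")


def reference_span(pos, cigar):
    """Return the 1-based inclusive (start, end) an alignment covers on the reference."""
    if cigar == "*":
        raise ValueError("record has no alignment (CIGAR is '*')")
    length = sum(int(n) for n, op in CIGAR_OP.findall(cigar) if op in REF_CONSUMING)
    return pos, pos + length - 1


def mates_overlap(pos1, cigar1, pos2, cigar2):
    """Return True if the two mates share at least one reference position."""
    start1, end1 = reference_span(pos1, cigar1)
    start2, end2 = reference_span(pos2, cigar2)
    return start1 <= end2 and start2 <= end1


# POS and CIGAR of the two mates of read ...:2301:14610:90784 from the thread.
mate1 = (2286876, "18M1I50M1D7M2D3M1D1M3D1M5D17M2D28M")
mate2 = (2287003, "76M1I4M1I10M1D34M")

print(reference_span(*mate1))          # (2286876, 2287014)
print(reference_span(*mate2))          # (2287003, 2287127)
print(mates_overlap(*mate1, *mate2))   # True
```

The first mate ends at 2287014, past the second mate's start at 2287003, so there is no gap between the mates. Any interval computed as "end of one mate to start of the other" would therefore come out with start greater than end, which is consistent with the inverted record in the error message, although the exact arithmetic TEMP used is not shown in this thread.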