jvarkit icon indicating copy to clipboard operation
jvarkit copied to clipboard

error overlapping PCR intervals - pcrclipreads

Open avilella opened this issue 8 years ago • 4 comments

Verify

  • jdk1.8.0_72/bin/java

Subject of the issue

Describe your issue here.

Your environment

  • version of jvarkit: master branch
  • jdk1.8.0_72/bin/java
  • the value of ${JAVA_HOME} = $HOME/java/jdk1.8.0_72
  • linux centos 6.2

Even though I don't see the overlap between the two regions printed, it seems it's adding some padding (slop) between the two. See below. Any ideas?

java.lang.IllegalStateException: Overlapping PCR intervals : chr1:740716-740765 +       . chr1:740812-740861    +       .
        at com.github.lindenb.jvarkit.tools.pcr.PcrClipReads.findInterval(PcrClipReads.java:69)
        at com.github.lindenb.jvarkit.tools.pcr.PcrClipReads.findInterval(PcrClipReads.java:59)
        at com.github.lindenb.jvarkit.tools.pcr.PcrClipReads.run(PcrClipReads.java:104)
        at com.github.lindenb.jvarkit.tools.pcr.PcrClipReads.call(PcrClipReads.java:191)
        at com.github.lindenb.jvarkit.tools.pcr.AbstractPcrClipReads.call(AbstractPcrClipReads.java:450)
        at com.github.lindenb.jvarkit.tools.pcr.AbstractPcrClipReads.call(AbstractPcrClipReads.java:32)
        at com.github.lindenb.jvarkit.util.command.Command.instanceMainWithExceptions(Command.java:533)
        at com.github.lindenb.jvarkit.util.command.Command.instanceMain(Command.java:570)
        at com.github.lindenb.jvarkit.tools.pcr.PcrClipReads.main(PcrClipReads.java:206)
[main] ERROR jvarkit - Overlapping PCR intervals : chr1:740716-740765   +       . chr1:740812-740861    +       .
[main] ERROR jvarkit - Command failed 

avilella avatar Feb 25 '16 11:02 avilella

In fact, it means that a SAMrecord (chrom:start-end) overlaps two regions in your bed > the program doesn't know which bed record, it should use. I don't use this program much, so if you feel that the reads overlapping two regions should be discarded or should use a random interval, just tell me.

P.

lindenb avatar Feb 25 '16 12:02 lindenb

I think adding two extra options would cover most of the possibilities:

--5primemost and/or --3primemost

  • Clip from the bed region that is 5'prime most (left-most) side of the read if more than one is overlapping.
  • Clip from the bed region that is 5'prime most (left-most) side of the read if more than one is overlapping.

If both flags are given, clip from both sides.

Then another option like so:

--strandmost same or --strandmost opposite

  • Clip from the bed region that is most upstream of the orientation of the read with respect to the orientation of the bed. E.g.:
Examples:

 region 1                 region 2
|----------------->|     |-------------->|

              <AAAAA=====BBBBBB| read

--strandmost opposite

AAAAA would be trimmed.

########################################

 region 1                 region 2
|----------------->|     |<--------------|

              <AAAAA=====BBBBBB| read

--strandmost opposite

AAAAA would be trimmed.

########################################

 region 1                 region 2
|<-----------------|     |<--------------|

              <AAAAA=====BBBBBB| read

--strandmost same

AAAAA would be trimmed.

########################################


 region 1                 region 2
|<-----------------|     |-------------->|

              <AAAAA=====BBBBBB| read

--strandmost same

AAAAA would be trimmed.

########################################

avilella avatar Feb 25 '16 13:02 avilella

5primemost 3primemost is 'difficult' to implement if a bed region overlap a whole read. As a temporary(?) solution I created the option '-largest' which choose the BED record having the largest overlap with the current read. I don't want to look at the BED strand for now. https://github.com/lindenb/jvarkit/commit/cf1bf46196fda4e95ded4d3bbfcf71df00164ebc

lindenb avatar Feb 25 '16 13:02 lindenb

One could limit it to input bed file of non-overlapping regions, which would make the 5primemost and 3primemost possibly easier to implement. Maybe not...

Thanks a lot for all the work, Pierre!

On Thu, Feb 25, 2016 at 1:53 PM, Pierre Lindenbaum <[email protected]

wrote:

5primemost 3primemost is 'difficult' to implement if a bed region overlap a whole read. As a temporary(?) solution I created the option '-largest' which choose the BED record having the largest overlap with the current read. I don't want to look at the BED strand for now. cf1bf46 https://github.com/lindenb/jvarkit/commit/cf1bf46196fda4e95ded4d3bbfcf71df00164ebc

— Reply to this email directly or view it on GitHub https://github.com/lindenb/jvarkit/issues/44#issuecomment-188791862.

avilella avatar Feb 25 '16 13:02 avilella