htsjdk icon indicating copy to clipboard operation
htsjdk copied to clipboard

Added validation warning if the mapped length of the read is greater than the predicted insert size for FR pairs on the same chromosome.

Open tfenne opened this issue 7 years ago • 14 comments

Description

I'm running into situations where I'm being handed BAM files that have reads with mapped lengths longer than their insert sizes. E.g. a 100bp read is part of an FR pair, has the insert size on it recorded as 95 but has a cigar of 100M. This seems wrong to me - and this kind of situation is remedied by tools like MergeBamAlignment in Picard. I'd like to be able to point users to ValidateSamFile and have it catalog these errors too.

Since this can be somewhat pervasive in pipelines that don't trim back alignments that extend past the end of the insert, I've made it a warning, not an error.

I've also adjusted the way validation errors are handled so that even in STRICT mode, exceptions are not thrown for warnings, but only for errors.

Checklist

  • [x] Code compiles correctly
  • [x] New tests covering changes and new functionality
  • [x] All tests passing
  • [ ] Extended the README / documentation, if necessary
  • [ ] Is not backward compatible (breaks binary or source compatibility)

tfenne avatar Mar 03 '17 12:03 tfenne

@yfarjoun & @ktibbett Could I request that you guys take a look at this PR please? Particularly it would be good, prior to merging, to ensure that this change won't cause you problems internally. My expectation is that any BAM produced by MergeBamAlignment with CLIP_OVERLAPPING_READS=true (the default) should be free from these kinds of errors.

tfenne avatar Mar 03 '17 12:03 tfenne

failing tests...and needs a rebase now...

yfarjoun avatar Apr 16 '17 02:04 yfarjoun

In terms of affecting the pipeline this seems fine. would still like to have someone review the code itself. @jacarey could you please do the honors?

yfarjoun avatar May 16 '17 21:05 yfarjoun

@tfenne going over old PRs...can you rebase this? we'd like to put it in...

yfarjoun avatar May 01 '18 18:05 yfarjoun

@yfarjoun I've rebased and fixed conflicts, but am still getting test failures which I don't have time to debug right now.

tfenne avatar May 01 '18 18:05 tfenne

CramFileReader constructor with file, stream and refsource doesn't set a validation stringency and so it was null which caused processValidationErrors in SamUtils to ignore the error since previously it was it was only checking if it was if stringency == STRICT. I set them to SILENT since they have always been there just previously ignored and I'm not sure if they are relevant to the tests in any case. I could take a deeper look if necessary.

jacarey avatar May 01 '18 20:05 jacarey

Also as a note I'm a fair amount of these warning spit out on test run. We have a lot of test data that fails this check (usually adapter clipped reads as far as I can tell)

jacarey avatar May 01 '18 21:05 jacarey

@jacarey I'm not sure why adapter-clipped reads should cause this to fail...could you point me to a bam that has this problem?

yfarjoun avatar Jul 02 '18 18:07 yfarjoun

So, there seems to be a problem with index_test.bam which is used throughout the test-suite. it causes lots of validation warnings to be emitted and fills up the log file, causing the Test to fail.

Not sure how to resolve this...should we fix index_test and replace it everywhere?

@tfenne, @jacarey, @lbergelson ??

yfarjoun avatar Sep 10 '18 18:09 yfarjoun

@tfenne regarding the change around warnings, I think that it would be best to add a VALIDATION_STRINGENCY called ALLOW_WARNINGS which would be between STRICT and LENIENT.

yfarjoun avatar Sep 17 '18 18:09 yfarjoun

I retract my statement about deletions...as you are looking at the length on the reference not the CIGAR.

yfarjoun avatar Sep 17 '18 19:09 yfarjoun

is this still interesting to the OP? @tfenne

yfarjoun avatar Jan 14 '19 19:01 yfarjoun

@jacarey could you point me to the failing data?

yfarjoun avatar Feb 04 '19 19:02 yfarjoun

bump @tfenne @jacarey ....

yfarjoun avatar Sep 30 '19 18:09 yfarjoun