
Assertion error for reads without sequence

Open christopher-schroeder opened this issue 3 years ago • 5 comments

I get the following assertion error:

strling version: 0.5.0
[strling] using existing file resources/genome.dna.homo_sapiens.GRCh38.100.fasta.str for genome repeats
[strling] got STR repeats from genome into an interval tree
[strling] collecting str-like reads
[strling] extracting chromosome:1
[strling] extracting chromosome:10
[strling] extracting chromosome:11
[strling] extracting chromosome:12
[strling] extracting chromosome:13
[strling] extracting chromosome:14
[strling] extracting chromosome:15
[strling] extracting chromosome:16
[strling] extracting chromosome:17
[strling] extracting chromosome:18
[strling] extracting chromosome:19
[strling] extracting chromosome:2
[strling] extracting chromosome:20
[strling] extracting chromosome:21
[strling] extracting chromosome:22
[strling] extracting chromosome:3
[strling] extracting chromosome:4
[strling] extracting chromosome:5
[strling] extracting chromosome:6
[strling] extracting chromosome:7
[strling] extracting chromosome:8
[strling] extracting chromosome:9
[strling] extracting chromosome:X
[strling] extracting chromosome:Y
/opt/conda/conda-bld/strling_1622157642620/work/src/strling.nim(44) strling
/opt/conda/conda-bld/strling_1622157642620/work/src/strling.nim(41) main
/opt/conda/conda-bld/strling_1622157642620/work/src/strpkg/extract.nim(319) extract_main
/opt/conda/conda-bld/strling_1622157642620/work/src/strpkg/extract.nim(200) add
/opt/conda/conda-bld/strling_1622157642620/work/src/strpkg/extract.nim(67) to_tread
/opt/conda/conda-bld/strling_1622157642620/_build_env/nim/lib/system/assertions.nim(30) failedAssertImpl
/opt/conda/conda-bld/strling_1622157642620/_build_env/nim/lib/system/assertions.nim(23) raiseAssert
/opt/conda/conda-bld/strling_1622157642620/_build_env/nim/lib/system/fatal.nim(49) sysFatal
Error: unhandled exception: /opt/conda/conda-bld/strling_1622157642620/work/src/strpkg/extract.nim(67, 12) `align_length > 0` K00276:107:HHYWGBBXX:8:1125:32309:38451   141     *       0       0       *
       *       0       0       *       *       AS:i:0  XS:i:0  RG:Z:LUEB0077G [AssertionDefect]

This is probably due to the * in the sequence and quality fields. The SAM specification allows these, for example when a read is fully trimmed during the adapter trimming step. Even though the read itself is useless (it is also unmapped), it is still useful to keep it in the alignment file so that the file remains a complete paired-end file.
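To make the failing case concrete, here is a minimal Python sketch that parses the SAM record from the error message above and checks whether it carries a sequence. It is not STRling's code (STRling is written in Nim); the helper names `parse_sam_line` and `has_sequence` are hypothetical, and only the column positions come from the SAM specification.

```python
# Illustrative only: this is NOT STRling's implementation.
# Column positions follow the SAM specification (11 mandatory fields).

def parse_sam_line(line):
    """Split one SAM record and return the fields relevant to this issue."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 11:
        raise ValueError("a SAM record needs at least 11 mandatory fields")
    return {
        "qname": fields[0],
        "flag": int(fields[1]),
        # "*" means "sequence not stored" / "quality not stored".
        "seq": None if fields[9] == "*" else fields[9],
        "qual": None if fields[10] == "*" else fields[10],
    }


def has_sequence(record):
    """True if the record has at least one base to work with."""
    return record["seq"] is not None and len(record["seq"]) > 0


# The record from the assertion message, rebuilt with tab separators.
record_line = "\t".join([
    "K00276:107:HHYWGBBXX:8:1125:32309:38451", "141", "*", "0", "0", "*",
    "*", "0", "0", "*", "*", "AS:i:0", "XS:i:0", "RG:Z:LUEB0077G",
])

record = parse_sam_line(record_line)
is_unmapped = bool(record["flag"] & 0x4)  # 0x4 is the "segment unmapped" bit

if not has_sequence(record):
    print(f"skipping {record['qname']}: no sequence stored")
```

A tool that asserts a positive length on every read would fail on such a record, whereas a guard like `has_sequence` lets it skip the read and continue.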

christopher-schroeder avatar Jan 07 '22 21:01 christopher-schroeder

I have the same problem :-/

christopher-schroeder avatar May 06 '22 12:05 christopher-schroeder

Does this work on your data? https://github.com/quinlan-lab/STRling/tree/zero-len

hdashnow avatar May 06 '22 22:05 hdashnow

Yes

christopher-schroeder avatar May 07 '22 07:05 christopher-schroeder

@brentp mentioned memory concerns. Any chance you could check the memory usage with the two different versions (your PR, vs. the fix above)?
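One rough way to run such a comparison is to record the peak resident set size of each build on the same input. The sketch below is an assumption about how this could be done on a Unix-like system, not a procedure anyone in this thread used; `peak_child_rss` is a hypothetical helper, and the command passed to it is whatever invocation you normally run.

```python
import resource
import subprocess
import sys


def peak_child_rss(cmd):
    """Run cmd to completion and return the peak RSS of waited-for children.

    ru_maxrss is reported in kilobytes on Linux and in bytes on macOS.
    The value is the maximum over ALL children this process has waited
    for, so measure each build in a separate Python invocation.
    """
    subprocess.run(cmd, check=True)
    return resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss


# Placeholder command; replace it with the real invocation of each build.
demo_peak = peak_child_rss([sys.executable, "-c", "pass"])
print(f"peak RSS of child: {demo_peak}")
```

Because only a handful of reads are affected, any difference may well be within run-to-run noise.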

hdashnow avatar May 07 '22 21:05 hdashnow

I am a bit busy at the moment and, to be honest, I don't see the point of this test. Only a couple of read pairs out of a few million are affected, so I don't think I could detect any memory leak. And even if I ran it, there are two possible outcomes: either I detect something, and then you have to look at the code, or I can't detect anything, in which case you should still check the code for something fishy!

christopher-schroeder avatar May 12 '22 11:05 christopher-schroeder