
STRling warning message and empty binaries

Open hdashnow opened this issue 2 years ago • 0 comments

Abbreviated from emailed bug report:

" I'm using STRling with CRAM files from the Human Genome Diversity Project (HGDP-CEPH), downloaded from https://www.internationalgenome.org/data-portal/population/FrenchHGDP. I indexed the files using samtools index and proceeded to run STRling extract with the GRCh38_full_analysis_set_plus_decoy_hla.fa reference genome (the one the reads were aligned to). The program runs smoothly until after the Y chromosome, where it starts giving me warnings like:

warning. bad read (this happens with bwa-kit alignments):ERR1395768.33889710 already in table as:(tid: 204, position: 52844, repeat: ['\x00', '\x00', '\x00', '\x00', '\x00', '\x00'], flag: PAIRED,MREVERSE,READ1, split: none, mapping_quality: 0, repeat_count: 0, align_length: 151, qname: "ERR1395768.33889710")

The program keeps going until completion without crashing, but the resulting .bin files are significantly smaller than those I get when using our own files (1.2 MB vs 30 MB). The call step using those binaries yields empty outputs, with no STRs found and only a few lines in the Unplaced.txt file.

strling extract -f /path/to/decoy_hla/GRCh38_full_analysis_set_plus_decoy_hla.fa -v /path/to/HGDP00511.alt_bwamem_GRCh38DH.20181023.French.cram str-control/511.bin

strling call --output-prefix indiv/511 -f /path/to/decoy_hla/GRCh38_full_analysis_set_plus_decoy_hla.fa /path/to/controls/HGDP00511.alt_bwamem_GRCh38DH.20181023.French.cram 511.bin "
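
To narrow down where the warnings come from, it may help to tally them by reference index. The following is a minimal sketch, not part of STRling: it assumes the warning lines were captured to a log and that they follow exactly the format quoted above. The names parse_warning and tally_by_tid are hypothetical helpers, and the regular expression is inferred from that single example line.

```python
import re
from collections import Counter

# Pattern inferred from the one warning line quoted in this report (an assumption,
# not a documented STRling log format).
WARNING_RE = re.compile(
    r"already in table as:\(tid: (?P<tid>\d+), position: (?P<pos>\d+),.*?"
    r"mapping_quality: (?P<mapq>\d+),.*?qname: \"(?P<qname>[^\"]+)\"\)"
)

# The example warning from the report, copied verbatim.
SAMPLE = r'''warning. bad read (this happens with bwa-kit alignments):ERR1395768.33889710 already in table as:(tid: 204, position: 52844, repeat: ['\x00', '\x00', '\x00', '\x00', '\x00', '\x00'], flag: PAIRED,MREVERSE,READ1, split: none, mapping_quality: 0, repeat_count: 0, align_length: 151, qname: "ERR1395768.33889710")'''


def parse_warning(line):
    """Return tid, position, mapping quality and qname from a warning line, or None."""
    m = WARNING_RE.search(line)
    if m is None:
        return None
    return {
        "tid": int(m["tid"]),
        "position": int(m["pos"]),
        "mapq": int(m["mapq"]),
        "qname": m["qname"],
    }


def tally_by_tid(lines):
    """Count warnings per reference index (tid); non-matching lines are skipped."""
    counts = Counter()
    for line in lines:
        rec = parse_warning(line)
        if rec is not None:
            counts[rec["tid"]] += 1
    return counts


if __name__ == "__main__":
    print(parse_warning(SAMPLE))
    print(tally_by_tid([SAMPLE, "some unrelated log line"]))
```

The tid is a numeric index into the sequence list in the CRAM header, so translating it to a contig name requires the header (for example from samtools view -H).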

hdashnow · Jun 29 '22 21:06