IndexError: list index out of range
I am running STRaglr 1.5.0 (the current release) and get the following error on multiple CRAMs:
python straglr.py NA19676.hg38.cram ../1KG_ONT_VIENNA_hg38.fa NA19676.hg38_straglr_patho --loci ../20240328_straglr_catalog.bed
Traceback (most recent call last):
File "/vast/scratch/users/fearnley.l/1KG_ONT_VIENNA/straglr/straglr.py", line 101, in <module>
main()
File "/vast/scratch/users/fearnley.l/1KG_ONT_VIENNA/straglr/straglr.py", line 98, in main
tre_finder.output_vcf(variants, '{}.vcf'.format(args.out_prefix))
File "/vast/scratch/users/fearnley.l/1KG_ONT_VIENNA/straglr/src/tre.py", line 1513, in output_vcf
fails = Variant.find_fails(variants)
File "/vast/scratch/users/fearnley.l/1KG_ONT_VIENNA/straglr/src/variant.py", line 244, in find_fails
failed_reason = Counter(failed_reasons).most_common(1)[0][0]
IndexError: list index out of range
Any suggestions as to what might cause this?
The error is caused by insufficient coverage at a locus. For example, if a locus has only 2 supporting reads, each with a different repeat size, and min_support is set to 2, no allele can be called with the minimum support.
The new version, which produces VCF output, tries to associate a FILTER with each failed locus. I did not anticipate this scenario, so no failure reason was generated for it, and the script therefore crashed.
I have made a fix that produces a CLUSTERING_FAILED filter for this scenario and will release it shortly.
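For anyone curious, here is a minimal sketch of why the line in the traceback fails and the kind of guard that avoids it. This is not the actual straglr code: the function name most_common_reason and its default argument are hypothetical, and only the Counter(...).most_common(1)[0][0] expression and the CLUSTERING_FAILED label come from this thread.

```python
from collections import Counter


def most_common_reason(failed_reasons, default="CLUSTERING_FAILED"):
    """Return the most frequent failure reason.

    Hypothetical helper: falls back to `default` when no reason was
    recorded, instead of indexing into an empty list.
    """
    # most_common(1) returns [] for an empty Counter, so [0][0] on it
    # raises "IndexError: list index out of range".
    ranked = Counter(failed_reasons).most_common(1)
    if not ranked:
        return default
    return ranked[0][0]
```

With an empty list of reasons the helper returns the default label rather than crashing; with a non-empty list it behaves like the original expression.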
In the meantime, if you want to get past this, you could set --min_cluster_size 1 and the program should be able to finish.
Thanks very much for reporting this bug.
Hi, I have been having the same issue. I tried setting --min_cluster_size 1, but it did not fix the error for me. Do you know of another problem that could be the cause? I ran:
straglr.py map-sminimap2-HG002_hg38_chr21.bam .../chr21_test_data/chr21.fa output_straglr --loci HG002_repeats_straglr.bed --min_cluster_size 1
Traceback (most recent call last):
File "/usr/local/bin/straglr.py", line 101, in <module>
main()
File "/usr/local/bin/straglr.py", line 93, in main
variants = tre_finder.genotype(args.loci)
File "/usr/local/lib/python3.10/site-packages/src/tre.py", line 1426, in genotype
return self.collect_alleles(loci)
File "/usr/local/lib/python3.10/site-packages/src/tre.py", line 1402, in collect_alleles
tre_variants = self.get_alleles(loci)
File "/usr/local/lib/python3.10/site-packages/src/tre.py", line 1252, in get_alleles
self.update_refs(variants, genome_fasta)
File "/usr/local/lib/python3.10/site-packages/src/tre.py", line 1271, in update_refs
refs = self.extract_refs_trf(trf_input)
File "/usr/local/lib/python3.10/site-packages/src/tre.py", line 607, in extract_refs_trf
data_motif = cols[3]
IndexError: list index out of range
This is a different problem. It looks like something went wrong when the script parsed the results of the TRF run.
Can you try running with --tmpdir <path> --debug, where <path> can be set to your output directory? This way the temporary files will be kept. I want to see if there is anything wrong with the latest ***.dat (TRF output) file created.
You can first check that the TRF output is there. If you only have a few loci, maybe you can post the content of the .dat file? Or you can attach the file for me to examine.
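If you want to scan the file yourself first, here is a rough sketch that lists lines with fewer than 4 fields, since the traceback fails on cols[3]. It assumes the fields are whitespace-separated, which may not match how the script actually splits each line, and the function name find_short_lines is hypothetical. Header or separator lines in the file may be flagged too, so treat the result only as a hint.

```python
def find_short_lines(lines, min_cols=4):
    """Return (line_number, text) pairs for non-empty lines with too few fields.

    Assumption: fields are separated by whitespace. min_cols=4 is chosen
    because the traceback fails when accessing index 3.
    """
    short = []
    for lineno, line in enumerate(lines, start=1):
        stripped = line.strip()
        if not stripped:
            continue  # ignore blank lines
        if len(stripped.split()) < min_cols:
            short.append((lineno, stripped))
    return short
```

To use it, open the kept .dat file and pass the file object, for example `with open(dat_path) as fh: print(find_short_lines(fh))`, where dat_path is the path to the file under your --tmpdir.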
It would be best if you could start a new issue for this.