helmsman
KeyError from pyfaidx can mean that the chromosome labels in the ref fasta and vcf don't match
Hi, I got this error and I figured it out, but I thought I would post this here because it requires a close look at the scripts to figure out how to interpret the error. As a low-priority suggestion, it might be worth adding a more user-friendly error for this situation.
Here is the error I initially got:
Traceback (most recent call last):
  File "/Users/teresapegan/opt/miniconda3/envs/helmsman/lib/python3.6/site-packages/pyfaidx/__init__.py", line 997, in __getitem__
    return self.records[rname]
KeyError: '1'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "helmsman/helmsman.py", line 413, in <module>
    main()
  File "helmsman/helmsman.py", line 329, in main
    data_in = util.processInput(args.mode, args, subtypes_dict)
  File "/Users/teresapegan/helmsman/util.py", line 283, in __init__
    self.data = self.process_vcf(args.input)
  File "/Users/teresapegan/helmsman/util.py", line 400, in process_vcf
    sequence = fasta_reader[row_chr]
  File "/Users/teresapegan/opt/miniconda3/envs/helmsman/lib/python3.6/site-packages/pyfaidx/__init__.py", line 999, in __getitem__
    raise KeyError("{0} not in {1}.".format(rname, self.filename))
KeyError: '1 not in MutSpect/Spalm_arbitrary_reference.fasta.'
It turns out this was simply because my VCF had a chromosome label of "1", while my reference FASTA had a longer chromosome label that included the species name. The KeyError means that pyfaidx could not find a sequence named "1" in the FASTA. Once I changed the reference FASTA chromosome label to match the VCF, helmsman worked.
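For anyone who wants to check for this mismatch before running helmsman, here is a minimal sketch that compares the labels in the two files. It is not part of helmsman; the function names are hypothetical, and it assumes an uncompressed VCF and that the FASTA sequence name is the first whitespace-delimited token of each ">" header line.

```python
def fasta_labels(fasta_path):
    """Collect sequence names from a FASTA file.

    Assumption: the name is the first whitespace-delimited token
    after '>' on each header line.
    """
    labels = set()
    with open(fasta_path) as handle:
        for line in handle:
            if line.startswith(">"):
                tokens = line[1:].split()
                if tokens:
                    labels.add(tokens[0])
    return labels


def vcf_labels(vcf_path):
    """Collect the CHROM values (first tab-separated column) from a VCF.

    Assumption: the VCF is plain text, not gzip/bgzip compressed.
    """
    labels = set()
    with open(vcf_path) as handle:
        for line in handle:
            # Skip meta-information and header lines, and blank lines.
            if line.startswith("#") or not line.strip():
                continue
            labels.add(line.split("\t", 1)[0].strip())
    return labels


def missing_from_fasta(vcf_path, fasta_path):
    """Return the VCF chromosome labels that have no match in the FASTA."""
    return sorted(vcf_labels(vcf_path) - fasta_labels(fasta_path))
```

Calling `missing_from_fasta` with the paths to the VCF and the reference FASTA returns a sorted list of the VCF labels that are absent from the FASTA. In my case it would presumably have returned `['1']`; an empty list means every VCF label has a matching sequence name.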
Hope this is helpful, -Teresa