medaka
Medaka variant error: "IndexError: index 0 is out of bounds for axis 0 with size 0"
Describe the bug
medaka variant crashes with exit status 20 when the following command is run:
medaka variant MPXV.reference.fasta MPVX_untreated_LSK114_280722-fast_barcode74.1.hdf MPVX_untreated_LSK114_280722-fast_barcode74.1.vcf
Logging
Traceback (most recent call last):
  File "/home/ubuntu/miniconda3/envs/artic-test/bin/medaka", line 11, in <module>
    sys.exit(main())
  File "/home/ubuntu/miniconda3/envs/artic-test/lib/python3.8/site-packages/medaka/medaka.py", line 720, in main
    args.func(args)
  File "/home/ubuntu/miniconda3/envs/artic-test/lib/python3.8/site-packages/medaka/variant.py", line 230, in variants_from_hdf
    for sample in joined_samples:
  File "/home/ubuntu/miniconda3/envs/artic-test/lib/python3.8/site-packages/medaka/variant.py", line 104, in join_samples
    yield medaka.common.Sample.from_samples(queue + [to_yield])
  File "/home/ubuntu/miniconda3/envs/artic-test/lib/python3.8/site-packages/medaka/common.py", line 143, in from_samples
    rel = Sample.relative_position(s1, s2)
  File "/home/ubuntu/miniconda3/envs/artic-test/lib/python3.8/site-packages/medaka/common.py", line 224, in relative_position
    s1_ord, s2_ord = sorted((s1, s2), key=lambda x: (x.first_pos, -x.size))
  File "/home/ubuntu/miniconda3/envs/artic-test/lib/python3.8/site-packages/medaka/common.py", line 224, in <lambda>
    s1_ord, s2_ord = sorted((s1, s2), key=lambda x: (x.first_pos, -x.size))
  File "/home/ubuntu/miniconda3/envs/artic-test/lib/python3.8/site-packages/medaka/common.py", line 89, in first_pos
    return self._get_pos(0)
  File "/home/ubuntu/miniconda3/envs/artic-test/lib/python3.8/site-packages/medaka/common.py", line 69, in _get_pos
    return p['major'][index], p['minor'][index]
IndexError: index 0 is out of bounds for axis 0 with size 0
Environment (if you do not have a GPU, write No GPU):
- Installation method: conda (mamba solver)
- OS: Ubuntu 20.04 LTS
- medaka version: 1.6.1
- GPU model: Nvidia A100
- Nvidia driver version: 510.73.05
- CUDA version: 11.6
- cuDNN version: 8.4.1.50
Additional context
This problem only seems to have appeared after changing the MPX reference from AY753185.1 to MT903344.1, which included translating the primer bed file to have co-ordinates relative to the new reference (potentially affecting primer trimming). This issue only occurs for particular barcodes on particular runs. I am happy to share data privately to help locate the issue but cannot share the data publicly.
Hi @BioWilko, fancy seeing you here! I believe I have narrowed down the scope of this issue to a slicing operation in join_samples. I have crafted a dummy HDF5 file that can trigger the same IndexError you reported. It would be good to understand how this could happen on a real dataset, so I have sent a request to the e-mail address on your GitHub profile, which should contain instructions to provide your HDF to us.
Small world eh @SamStudio8? Predictably the error refused to manifest when I tried to replicate it but I managed it in the end and have sent over the file, thanks!
Your data triggered an interesting edge case wherein the Sample chunk generator was emitting a chunk containing only a long insertion. Compared against the reference, the chunk had only one "major" non-variant position, at index 0. When the Sample was bisected on index 0, an empty Sample slice was created (with no indexable positions) and you know the rest.
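For anyone following along, the failure mode described above can be reproduced in miniature. The sketch below is not Medaka's code: ToySample, slice and split_at are hypothetical stand-ins, positions are plain (major, minor) tuples rather than a NumPy structured array, and the guard shown is only one plausible way to avoid the empty slice, not necessarily what the actual patch does.

```python
from dataclasses import dataclass
from typing import List, Tuple

Position = Tuple[int, int]  # (major, minor); minor > 0 marks an inserted base


@dataclass
class ToySample:
    """Hypothetical stand-in for a Sample: an ordered list of positions."""
    positions: List[Position]

    @property
    def first_pos(self) -> Position:
        # Same kind of access as the failing frame: index 0 of an empty
        # container raises IndexError.
        return self.positions[0]

    def slice(self, start: int, end: int) -> "ToySample":
        return ToySample(self.positions[start:end])


def split_at(sample: ToySample, index: int) -> List[ToySample]:
    """Bisect a sample, dropping any empty half (illustrative guard only)."""
    halves = (
        sample.slice(0, index),
        sample.slice(index, len(sample.positions)),
    )
    return [half for half in halves if half.positions]


# A chunk with a single major position followed by a long insertion.
chunk = ToySample([(100, 0)] + [(100, minor) for minor in range(1, 6)])

# Bisecting on index 0 leaves nothing on the left-hand side...
left = chunk.slice(0, 0)
try:
    left.first_pos
    raised = False
except IndexError:
    # ...so asking for its first position fails, as in the traceback.
    raised = True

# With the guard, only the non-empty half survives.
kept = split_at(chunk, 0)
```

Here raised ends up True for the unguarded slice, while split_at returns just the one non-empty half, so nothing downstream ever asks an empty sample for its first position.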
I have patched Medaka to explicitly avoid this case and released v1.7.1 (which you can get from the releases page, or via bioconda). Please advise if this has fixed your issue.
Sorry I took so long to get back to you but I have finally tested this and confirm it has fixed my issue, thanks!