NanoSim
Index out of range errors
I'm getting some index out of range errors, possibly because of setting the same value (or too close?) for -min and -max.
With -min 10000 -max 10000:
2022-04-21 13:17:35: Start simulation of aligned reads
Process Process-1:
Traceback (most recent call last):
  File "/home/philae/.local/share/miniconda3/lib/python3.9/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/home/philae/.local/share/miniconda3/lib/python3.9/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/home/philae/.local/share/miniconda3/bin/simulator.py", line 1293, in simulation_aligned_genome
    remainder = int(remainder_lengths[each_read])
IndexError: list index out of range
And with -min 900000 -max 1100000:
2022-04-21 13:19:34: Start simulation of aligned reads
Process Process-1:
Traceback (most recent call last):
  File "/home/philae/.local/share/miniconda3/lib/python3.9/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/home/philae/.local/share/miniconda3/lib/python3.9/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/home/philae/.local/share/miniconda3/bin/simulator.py", line 1294, in simulation_aligned_genome
    head_vs_ht_ratio = head_vs_ht_ratio_list[each_read]
IndexError: list index out of range
With the first case, obviously it is not logical to set min and max length equal to each other. With your second case scenario, I suspect that the reference genome you are using is smaller than the read lengths you specified. May I ask whether you are using the pre-trained models or if you trained your own model?
With the first case, obviously it is not logical to set min and max length equal to each other.
Hmm OK, that wasn't obvious to me. I would like to generate some reads to test a pairwise aligner I'm working on, and for benchmarking it is nice to have reads of a specific length. I changed it to some interval around that length and it works now. Anyway, displaying a warning instead of just crashing would be nice ;)
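As a rough illustration of the kind of warning meant here, a check like the one below could run before the simulation starts. This is only a sketch and not NanoSim code: the function name check_length_bounds is made up, and the strict min < max requirement is an assumption based on the reply above.

```python
def check_length_bounds(min_len: int, max_len: int) -> None:
    """Raise a readable error instead of failing later with an IndexError.

    Hypothetical helper, not part of NanoSim. The strict inequality is an
    assumption taken from the maintainer's comment in this thread.
    """
    if min_len < 1:
        raise ValueError(f"min length must be positive, got {min_len}")
    if min_len >= max_len:
        raise ValueError(
            f"min length ({min_len}) must be strictly smaller than "
            f"max length ({max_len})"
        )
```

Calling check_length_bounds(10000, 10000) raises a ValueError that names both values, while check_length_bounds(9000, 11000) passes silently.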
With your second case scenario, I suspect that the reference genome you are using is smaller than the read lengths you specified.
Oh right, that may well be the case. I am using some human genome reference but I noticed my fasta file also has some shorter sequences in addition to the long chromosomes. Again, a warning message would be nice.
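To see whether the reference really contains sequences shorter than the requested maximum read length, something like the following could be used. It is a minimal sketch using only the standard library. The function names are made up for illustration, and whether short sequences are the actual cause of the crash is still only a suspicion.

```python
from typing import Dict, Iterable, List, Tuple


def fasta_lengths(lines: Iterable[str]) -> Dict[str, int]:
    """Map each FASTA record name (first word of the header) to its length."""
    lengths: Dict[str, int] = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            parts = line[1:].split()
            name = parts[0] if parts else ""
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)  # sequence may span several lines
    return lengths


def shorter_than(lengths: Dict[str, int], max_len: int) -> List[Tuple[str, int]]:
    """Return (name, length) for every sequence shorter than max_len."""
    return [(name, n) for name, n in lengths.items() if n < max_len]
```

For a file on disk, the lines can come straight from an open file handle, for example fasta_lengths(open("input/reference/human.fa")), and shorter_than(lengths, 1100000) then lists the sequences that cannot hold a read of the maximum length.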
May I ask whether you are using the pre-trained models or if you trained your own model?
I'm using pre-trained models, since I don't have direct access to reads.
My full NanoSim invocation is this, where {..} will be substituted by snakemake:
simulator.py genome \
--ref_g input/reference/human.fa \
--output input/simulated/human-x{wildcards.x}-n{wildcards.n} \
-dna_type linear \
--model_prefix ../../nanosim/pre-trained_models/human_NA12878_DNA_FAB49712_guppy/training \
--min_len {params.min} \
--median_len {wildcards.n} \
--max_len {params.max} \
--sd_len 1.05 \
--number {params.generate_x} \
--strandness 1 \
--seed 314151 \
--num_threads 6
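For completeness, the "interval around the target length" used for {params.min} and {params.max} could be computed along these lines. This is a sketch only: the function name length_window and the default 10% half-width are assumptions, the latter chosen because it reproduces the 900000 / 1100000 pair from the second case above.

```python
from typing import Tuple


def length_window(target_len: int, rel_width: float = 0.1) -> Tuple[int, int]:
    """Return (min_len, max_len) symmetric around target_len.

    Hypothetical helper; rel_width is the half-width as a fraction of the
    target length. The window is always at least one base wide on each side,
    so min_len is strictly smaller than max_len.
    """
    if target_len < 2:
        raise ValueError(f"target length must be at least 2, got {target_len}")
    if not 0 < rel_width < 1:
        raise ValueError(f"rel_width must be between 0 and 1, got {rel_width}")
    half = max(1, round(target_len * rel_width))
    return target_len - half, target_len + half
```

In a Snakemake rule this could be called from a params function, keeping in mind that wildcards arrive as strings, so the call would look like length_window(int(wildcards.n)).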