InSilicoSeq
InSilicoSeq copied to clipboard
Inability to Generate Perfect Reads
I am trying to generate perfect reads from a single genome .FASTA file, and whenever I do, the package attempts to generate the reads but kicks me back to the help menu. I am able to generate reads with '--mode basic' and with error models like '--model novaseq', but not with '--mode perfect'. I am using InSilicoSeq version 2.0.1. Is anybody else able to reproduce this?
Minimally reproducible example: download the attached .fasta file chromosome.zip, and run the following 2 commands:
conda create -n test insilicoseq -c bioconda
iss generate --genomes chromosome.fasta --mode perfect --n_reads 100 --compress --output chromosome_perfect_reads
Expected output: 2 files, one called chromosome_perfect_reads_R1.fastq, another called chromosome_perfect_reads_R2.fastq
Actual output:
(test) workflow$ iss generate --genomes chromosome.fasta --mode perfect --compress --n_reads 100 --output chromosome_perfect_reads
INFO:iss.app:Starting iss generate
INFO:iss.generator:Using perfect ErrorModel
INFO:iss.util:Stitching input files together
INFO:iss.generator:Using lognormal abundance distribution
INFO:iss.app:Using 2 cpus for read generation
INFO:iss.app:Generating 100 reads
usage: iss [subcommand] [options]
InSilicoSeq: A sequencing simulator
options:
-h, --help show this help message and exit
-v, --version print software version and exit
available subcommands:
model generate an error model from a bam file
generate simulate reads from an error model
Accidentally closed because I thought I resolved the issue. It turns out my placement of the '--compress' option was partially responsible; moving that flag to the end of the command, while testing, I have been able to generate reads several times, but I am still unable to reliably generate reads, and in most of the cases the program produces the same erroneous output.
The core issue appears to be that there is no 'store_mutations' attribute in the PerfectErrorModel class (InSilicoSeq/iss/error_models/perfect.py), even though that attribute is expected in the main initialization function for Error Models (InSilicoSeq/iss/error_models/basic.py). As a result, there is a nonzero chance that the generator will attempt to mutate a read, in which case it will check 'store_mutations' and throw an error - if you only generate 5 or 10 reads, this may never happen, but at 100 or more, this will almost always cause an error, though there is a slight chance it will not.
Workaround: in InSilicoSeq/iss/error_models/perfect.py, add this line on line 20:
self.store_mutations = False
Alternatively, change line 14 to:
def __init__(self, fragment_length=None, fragment_sd=None, store_mutations=False):
and add this line on line 20:
self.store_mutations = store_mutations
Just adding in that I also hit this issue when trying to simulate --mode perfect reads, in versions 2.0.0 and 2.0.1. Thank you, @Shruteek, for noting the workaround here to help others like myself!
For those looking for the error_models/perfect.py file following a conda install in order to add the fix @Shruteek shared above, while in the active conda environment it can be found here: ${CONDA_PREFIX}/lib/python3.12/site-packages/iss/error_models/perfect.py (or potentially in a different python version subdirectory in that location if 3.12 isn't pinned)