NanoSim
Can NanoSim generate Nanopore reads by simulation?
Hi @cheny19, I noticed that NanoSim is described as a Nanopore sequence read simulator. I am looking for software that can generate Nanopore reads from a given genome or FASTA file, but when I looked through the NanoSim scripts I could not find one that does this. I would really appreciate your help!
Thanks.
Yes, you just need to run simulator.py to simulate ONT reads. You can find the usage information in the README.md file.
If I run simulator.py to simulate ONT reads without running step 1 first, I get this error:
simulator.py genome -dna_type linear -rg 1M_12501.fa -c ssc_1M -max 90000 -min 20000 -n 1000 -t 6
Traceback (most recent call last):
File "/home/huangtao/LJQ/conda/envs/metawrap-env/bin/simulator.py", line 1513, in
I also noticed that only by running read_analysis.py could I obtain the strandness_rate file, which confused me.
Right, if you want to use your own model, you have to run step 1 first. However, if you don't want to train your own model, you can point -c at our pre-trained model (provided in the package) and run simulator.py. You just need to untar the pre-trained model and specify the directory and prefix with the -c option.
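As a rough sketch of that workflow in Python, assuming the tarball unpacks into a model folder whose profile files share the prefix `training` (as in the `human_NA12878_DNA_FAB49712_albacore/training` path used later in this thread). The helper names `extract_model` and `simulator_command` are hypothetical and not part of NanoSim; the `simulator.py` flags are the ones used in this thread.

```python
import os
import tarfile


def extract_model(tar_path, dest_dir="."):
    """Unpack a pre-trained model tarball and return its top-level folder names."""
    with tarfile.open(tar_path, "r:gz") as tar:
        # Use the safer 'data' extraction filter where this Python version supports it.
        if hasattr(tarfile, "data_filter"):
            tar.extractall(dest_dir, filter="data")
        else:
            tar.extractall(dest_dir)
        # First path component of every member, e.g. the model folder name.
        return sorted({os.path.normpath(m.name).split(os.sep)[0] for m in tar.getmembers()})


def simulator_command(reference, model_dir, prefix="training", n_reads=1000):
    """Build a simulator.py genome-mode command; -c gets the folder and prefix joined."""
    model_prefix = os.path.join(model_dir, prefix)
    return [
        "simulator.py", "genome",
        "-dna_type", "linear",
        "-rg", reference,
        "-c", model_prefix,
        "-n", str(n_reads),
    ]
```

The key point is that `-c` receives `folder/prefix`, not just the folder name, so the command can be passed to `subprocess.run` once the model is extracted.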
I downloaded the human_NA12878_DNA_FAB49712_albacore.tar.gz you provided, but when I ran tar -xvzf human_NA12878_DNA_FAB49712_albacore.tar.gz, this error occurred:
gzip: stdin: not in gzip format
tar: Child returned status 1
tar: Error is not recoverable: exiting now
It happened to me yesterday as well. You'll need to clone the whole repo, or open the pre-trained model folder on GitHub and click the model you want in order to download it. It seems that the file ends up broken if you right-click the link on GitHub and download it directly.
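The `not in gzip format` message means the downloaded file is not gzip data at all. A quick way to check before running `tar` is to inspect the first bytes: every gzip stream starts with the magic bytes `0x1f 0x8b` (RFC 1952). The sketch below also flags files that begin like an HTML page, which would be one plausible result of saving a web link instead of the raw archive; that cause is an assumption, and the function name is hypothetical.

```python
GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of any gzip stream (RFC 1952)


def describe_download(path):
    """Classify a downloaded file as 'gzip', 'html', or 'unknown' from its first bytes."""
    with open(path, "rb") as fh:
        head = fh.read(64)
    if head[:2] == GZIP_MAGIC:
        return "gzip"
    # A saved web page usually starts with a doctype or <html> tag.
    if head.lstrip().lower().startswith((b"<!doctype html", b"<html")):
        return "html"
    return "unknown"
```

If this returns anything other than `gzip` for the `.tar.gz` file, re-download it as suggested above before trying `tar -xvzf` again.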
Thanks for your patient guidance!
I have downloaded your trained model, but I encountered an error when I ran:
./NanoSim2.6.0/simulator.py genome -dna_type linear -rg 1M_12501.fa -c human_NA12878_DNA_FAB49712_albacore/training -max 90000 -min 20000 -n 1000
Traceback (most recent call last):
File "./NanoSim2.6.0/simulator.py", line 1702, in
This issue has been reported by other users in #81. Could you try from sklearn.externals import joblib instead of import joblib and see if it still occurs?
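One way to try that suggestion without hard-coding either import is a small fallback, sketched below. It tries `sklearn.externals.joblib` first (the variant suggested above) and then the standalone `joblib` package. Note that `sklearn.externals.joblib` was deprecated in scikit-learn 0.21 and removed in 0.23, so on recent scikit-learn only the standalone package will import. The helper `import_joblib` is hypothetical, not NanoSim code.

```python
import importlib


def import_joblib(candidates=("sklearn.externals.joblib", "joblib")):
    """Return the first joblib module in `candidates` that imports successfully."""
    errors = []
    for name in candidates:
        try:
            return importlib.import_module(name)
        except ImportError as exc:
            # Record why each candidate failed so the final error is informative.
            errors.append(f"{name}: {exc}")
    raise ImportError("could not import joblib; tried " + "; ".join(errors))
```

Usage would be `joblib = import_joblib()` followed by `joblib.load(...)` on a model file.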
Hi @Leejquan, we have an update on this issue. We have finally finished all the coding and testing needed to change the way model files are imported, and we have re-trained all the models, so we hope this problem is resolved in the NanoSim v3.0.0 pre-release. Please give it a shot and let me know how it works for you. Thanks for waiting so long.
"You just need to untar the pre-trained model, and specify the directory and prefix to -c option."
Could you update the README example to include this information? Just passing human as written currently doesn't seem to work. (Ideally, it would be a path somewhere inside the conda installation, assuming the models are already shipped with it.)
Please note that the -c option in the simulation stage specifies the location and prefix of the error profiles generated in the characterization step (default = training). The human value you mentioned from the README file is a symbolic name referring to the models trained on human data.
-c MODEL_PREFIX, --model_prefix MODEL_PREFIX
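To make the directory-plus-prefix convention concrete: the last component of the `-c` value is a file-name prefix, not a folder. The sketch below splits it accordingly and lists the files sharing that prefix, assuming (hypothetically) that profile files are named `<prefix>_<something>`, such as a `training_strandness_rate` file. The helpers are illustrative only; check the model folder for the actual file names.

```python
import os


def split_model_prefix(model_prefix):
    """Split a -c value like 'dir/training' into ('dir', 'training')."""
    directory, prefix = os.path.split(model_prefix)
    return directory or ".", prefix


def profile_files(model_prefix):
    """List files in the model directory whose names start with '<prefix>_' (assumed naming)."""
    directory, prefix = split_model_prefix(model_prefix)
    return sorted(name for name in os.listdir(directory) if name.startswith(prefix + "_"))
```

For example, `split_model_prefix("human_NA12878_DNA_FAB49712_albacore/training")` gives the folder `human_NA12878_DNA_FAB49712_albacore` and the prefix `training`. An empty result from `profile_files` suggests that `-c` does not point at real profiles.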
For more information on the parameters for each mode in the training and simulation stages, you may run read_analysis.py -h or simulator.py -h. There are five modes in read_analysis.py and three modes in simulator.py.
I'll make a note to update the README file to make this clear.
Hello, I now want to simulate some ONT reads from bacterial and viral genomes. However, I notice that your latest pre-trained models were trained on human datasets, which may have different sequence patterns from bacterial ones. Which pre-trained model should I use to get acceptable simulation results on my dataset?
Hey @zhanghaoyu9931, I would highly recommend training your own model and using the trained profiles to simulate reads.
The README file is very informative and will guide you through running the training pipeline. It's fast and does not require much computing power. Please refer to the following code for more information:
https://github.com/bcgsc/NanoSim/blob/master/src/read_analysis.py
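The recommended two-stage workflow (characterize your own reads, then simulate from the resulting profiles) can be sketched as two commands that share one prefix: the output prefix of `read_analysis.py` becomes the `-c` value of `simulator.py`. The `read_analysis.py genome` flags below (`-i`, `-rg`, `-o`, `-t`) are our reading of the README and are assumptions to verify with `read_analysis.py genome -h`; the `simulator.py` flags are those used earlier in this thread. The function name and file names are placeholders.

```python
import shlex


def nanosim_pipeline(reads, reference, prefix="training", n_reads=1000,
                     threads=4, dna_type="linear"):
    """Return (training, simulation) commands as argument lists sharing one prefix."""
    train = [
        "read_analysis.py", "genome",
        "-i", reads,        # experimental ONT reads to learn from (assumed flag)
        "-rg", reference,   # reference genome the reads were sequenced from
        "-o", prefix,       # where the error profiles are written (assumed flag)
        "-t", str(threads),
    ]
    simulate = [
        "simulator.py", "genome",
        "-dna_type", dna_type,  # check simulator.py genome -h for the accepted values
        "-rg", reference,
        "-c", prefix,           # same location and prefix produced by training
        "-n", str(n_reads),
        "-t", str(threads),
    ]
    return train, simulate


if __name__ == "__main__":
    # Print the commands instead of running them, so they can be checked first.
    for cmd in nanosim_pipeline("my_reads.fastq", "my_genome.fa"):
        print(shlex.join(cmd))
```

Keeping the prefix in one variable avoids the mismatch between training output and `-c` that caused the missing strandness_rate confusion earlier in this thread.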