Hangs at start of simulation

Status: Open · omarkr8 opened this issue 2 years ago · 5 comments

Hi,

I'm finding myself stuck at the start of simulations. I'm running the metagenome simulation, and for example it will print:

    2022-12-28 23:26:25: Read in seq1
    2022-12-28 23:26:25: Read in seq2
    2022-12-28 23:26:25: Read in abundance profile
    2022-12-28 23:26:25: Read error profile
    2022-12-28 23:26:25: Read KDF of unaligned reads
    2022-12-28 23:26:25: Read KDF of aligned reads
    2022-12-28 23:26:25: Read chimeric simulation information
    2022-12-28 23:26:25: Simulating sample sample0
    2022-12-28 23:26:25: Start simulation of aligned reads

And it just stays there. The output files are created but remain empty. The strange thing is that it has sort of worked once in my many attempts with the same command. That one time, data was produced and the pipeline seemed to progress: it went through sample0 and sample1, but when it started simulating sample2, it got stuck.

Now I'm finding that it gets stuck even at sample0. I will keep tinkering to see whether it's a resource issue on my end.

omarkr8, Dec 29 '22

Okay, so I had a run complete successfully. Perhaps I was asking for too many reads? My metagenome simulation was initially set up for 4 samples with 2k, 4k, 8k, and 16k reads, and it would get stuck throughout. After changing that to 500, 1k, 2k, and 4k reads, I got it to run to completion. I will do a bit more testing to see whether the read numbers really caused the hang.

omarkr8, Dec 29 '22

So now I can consistently run NanoSim. The problem might be related to the min and max length of the reads I'm trying to simulate.

My target region is short (500 bp), and NanoSim seems to hang if I try to make reads with little variation in length, e.g. 490-500 bp. I need at least a 20 bp difference between min and max for it to run.
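The thread does not establish why a narrow min/max window stalls the run. One plausible mechanism, and it is only an assumption here, not something confirmed from NanoSim's source: if read lengths are drawn from the trained length distribution and draws outside [min, max] are discarded until enough reads are accepted, then each accepted read costs on average 1/p draws, where p is the model's probability mass inside the window. The Python sketch below uses a hypothetical lognormal length model (median 5 kb) in place of NanoSim's trained model; all function names are illustrative.

```python
import math
import random


def lognormal_cdf(x, mu, sigma):
    """CDF of a lognormal distribution with log-mean mu and log-sd sigma."""
    if x <= 0:
        return 0.0
    return 0.5 * (1.0 + math.erf((math.log(x) - mu) / (sigma * math.sqrt(2.0))))


def acceptance_probability(min_len, max_len, mu, sigma):
    """Probability that a single draw lands inside [min_len, max_len]."""
    return lognormal_cdf(max_len, mu, sigma) - lognormal_cdf(min_len, mu, sigma)


def expected_draws_per_read(min_len, max_len, mu, sigma):
    """Mean draws per accepted read (geometric distribution: 1/p)."""
    p = acceptance_probability(min_len, max_len, mu, sigma)
    return math.inf if p <= 0.0 else 1.0 / p


def sample_lengths(n, min_len, max_len, mu, sigma, max_draws=1_000_000, seed=0):
    """Rejection-sample n read lengths inside [min_len, max_len].

    Unlike an unbounded loop, this gives up after max_draws attempts, so an
    (almost) unreachable window raises an error instead of appearing to hang.
    """
    rng = random.Random(seed)
    lengths = []
    draws = 0
    while len(lengths) < n:
        if draws >= max_draws:
            raise RuntimeError(
                f"only {len(lengths)}/{n} lengths accepted after {draws} draws"
            )
        draws += 1
        length = rng.lognormvariate(mu, sigma)
        if min_len <= length <= max_len:
            lengths.append(int(length))
    return lengths, draws


if __name__ == "__main__":
    # Hypothetical length model: median ~5 kb, log-sd 1.0 (not NanoSim's model).
    mu, sigma = math.log(5000), 1.0
    for lo, hi in [(490, 500), (480, 500), (100, 500), (100, 50000)]:
        p = acceptance_probability(lo, hi, mu, sigma)
        draws = expected_draws_per_read(lo, hi, mu, sigma)
        print(f"[{lo}, {hi}] p={p:.5f} expected draws/read={draws:,.0f}")
```

When most of the model's mass sits far from a 490-500 bp window, p is tiny and the work per read grows as 1/p, multiplied by the requested read count; the `max_draws` cap is one way to turn a silent stall into an explicit error.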

So, a question: if I'm trying to simulate perfect reads, why not make them length-perfect too, i.e. basically exact copies of the references? Or is there a simpler way to get reads that are perfect in both sequence and length?
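If the goal is reads that are exact copies of the reference in both sequence and length, one option that sidesteps any trained length model is to cut reads directly from the reference: either the whole reference, or fixed-length windows at random positions. The minimal Python sketch below is a hypothetical workaround outside NanoSim; `exact_copy_reads` and `to_fasta` are illustrative names, not part of NanoSim.

```python
import random


def exact_copy_reads(references, n_reads, read_len=None, seed=0):
    """Yield (read_id, sequence) pairs that are error-free copies of references.

    references: dict of reference name -> sequence (hypothetical input; load it
    from a FASTA file however you prefer).
    read_len: None means every read is the whole reference (an exact copy);
    otherwise each read is a random window of exactly read_len bases, clipped
    to the reference length when the reference is shorter.
    """
    rng = random.Random(seed)
    names = sorted(references)
    for i in range(n_reads):
        name = rng.choice(names)  # uniform choice; abundance is ignored
        seq = references[name]
        if read_len is None or read_len >= len(seq):
            start, read = 0, seq
        else:
            start = rng.randrange(len(seq) - read_len + 1)
            read = seq[start:start + read_len]
        yield f"{name}_read{i}_start{start}", read


def to_fasta(reads):
    """Format (read_id, sequence) pairs as FASTA text."""
    return "".join(f">{read_id}\n{seq}\n" for read_id, seq in reads)


if __name__ == "__main__":
    refs = {"target_region": "ACGT" * 125}  # hypothetical 500 bp target
    print(to_fasta(exact_copy_reads(refs, n_reads=3)), end="")
```

This sketch always uses the forward strand and picks references uniformly; reverse-complementing some reads or weighting references by an abundance profile would be straightforward extensions if needed.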

omarkr8, Jan 04 '23

OK, so we are using a pre-trained model for metagenome simulations. With a standard read count, 2 samples are generated; however, when we change the number of species included in the sample, the process hangs in the middle and does not finish. The read counts are 100 and 1000, which is not very large for NanoSim, so I don't understand why this is happening.

aastha-batta, Feb 17 '23

@omarkr8, do you happen to have any idea or solution for this?

aastha-batta, Feb 17 '23

> So now I can consistently run NanoSim. Might be related to the min and max length of reads I'm trying to simulate.

Nice to hear that you were able to run it consistently now. A couple of questions before we can help you better:

Did you use the pre-trained models or train your own model? Would you please provide the exact command you used? And lastly, would you confirm that you used the master branch version?

> My target region is short (500 bp), and NanoSim will hang (seems to) if I try to make reads with little variation in length, e.g. 490-500. The least I need is a 20 bp difference.

Thanks for sharing this. I also suggest training your own model and then using it to generate reads. With this approach, the length distribution in your trained model would be around the range you mentioned, and therefore the length distribution of the synthetic reads would fall in the same range. This also addresses your "perfect" read question.

It would also be nice to hear @cheny19's thoughts on this min/max issue.

> So, question: if I'm trying to simulate perfect reads, why not make them length-perfect too? Basically exact copies of the references. Or is there a simpler way I can have perfect/perfect reads?

The --perfect option in simulator.py lets you generate reads without any errors introduced. The length distribution, however, is still derived from the training samples.

SaberHQ, Feb 25 '23