How many threads and how much memory are required at the training stage?

Open yaoxkkkkk opened this issue 1 year ago • 1 comments

Thank you for your development. I am using NanoSim to simulate ONT data. I ran the training stage with 32 threads and 256 GB of memory, but it reported an out-of-memory error. The command is:

	read_analysis.py genome \
		-i ZJYY_ont_filter.fq.gz \
		-rg nd.asm.fasta \
		-o ${home_dir}/01-data/ONT/${species}_training \
		--fastq \
		-t 32

The statistics for the ZJYY_ont_filter.fq.gz dataset are:

file                   format  type   num_seqs         sum_len  min_len   avg_len  max_len
ZJYY_ont_filter.fq.gz  FASTQ   DNA   1,544,988  43,308,647,713    2,000  28,031.7  246,468

When I run the command without the --fastq parameter, the training step finishes successfully.

Oct 20 '24 07:10 yaoxkkkkk

Hi @yaoxkkkkk,

The amount of memory required really depends on the dataset you are training on. On my end, training with --fastq on the HG002 ONT dataset used for the latest pre-trained models required around 263 GB of RAM, which is more than the 256 GB you have available, so that could be why you are seeing those errors. If you want to use --fastq, other options are to use our pre-trained model or to train on a subset of your reads.
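For the subset option, here is a minimal sketch of random read subsampling in plain Python. It is not part of NanoSim: the function name `subsample_fastq`, its parameters, and the output file name used below are hypothetical. It assumes a gzip-compressed FASTQ with standard 4-line records (no wrapped sequence lines). How much the subset needs to shrink for training to fit in memory is not something this thread establishes, so the fraction would have to be found by trial.

```python
import gzip
import random


def subsample_fastq(in_path, out_path, fraction, seed=0):
    """Keep each 4-line FASTQ record with probability `fraction`.

    Hypothetical helper, not a NanoSim function. Streams the input,
    so it does not hold the whole file in memory.
    Returns (records_kept, records_seen).
    """
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    rng = random.Random(seed)  # fixed seed makes the subset reproducible
    kept = total = 0
    with gzip.open(in_path, "rt") as fin, gzip.open(out_path, "wt") as fout:
        while True:
            # One record = header, sequence, '+' separator, qualities.
            record = [fin.readline() for _ in range(4)]
            if not record[0]:
                break  # clean end of file
            if not record[3]:
                raise ValueError("truncated FASTQ record")
            total += 1
            if rng.random() < fraction:
                fout.writelines(record)
                kept += 1
    return kept, total
```

For example, `subsample_fastq("ZJYY_ont_filter.fq.gz", "ZJYY_ont_subset.fq.gz", 0.25)` would keep roughly a quarter of the reads, and the resulting file could then be passed to `read_analysis.py genome -i ... --fastq` in place of the full dataset.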

Thank you for your interest in NanoSim! Lauren

Oct 21 '24 14:10 lcoombe