MIDAS
Question regarding speed of execution
I manage applications on a research cluster, and our researchers have been reporting issues with the execution speed of your software on our cluster.
I have just run through the first step of the tutorial (https://github.com/snayfach/MIDAS/blob/master/docs/tutorial.md) and I wonder whether you could let me know if the timings we are getting (see below) are exceedingly long.
I am running the code on HPC network storage, on one core of an Intel(R) Xeon(R) Gold 6240 node (I can also provide timings for the subsequent tutorial steps if that would help). Moving the DB, the sample file, and the output directory to local storage did not seem to affect the speed significantly.
Thanks, and please see the timings for the first step of the tutorial below:
/usr/bin/time -p -v run_midas.py species midas_output/SAMPLE_1 -1 /u/local/apps/midas/EXAMPLE/example/sample_1.fq.gz --remove_temp
MIDAS: Metagenomic Intra-species Diversity Analysis System
version 1.3.0; github.com/snayfach/MIDAS
Copyright (C) 2015-2016 Stephen Nayfach
Freely distributed under the GNU General Public License (GPLv3)
===========Parameters===========
Command: /u/local/apps/midas/1.3.2/MIDAS/scripts/run_midas.py species midas_output/SAMPLE_1 -1 /u/local/apps/midas/EXAMPLE/example/sample_1.fq.gz --remove_temp
Script: run_midas.py species
Database: /u/local/apps/midas/DB/midas_db_v1.2
Output directory: midas_output/SAMPLE_1
Input reads (unpaired): /u/local/apps/midas/EXAMPLE/example/sample_1.fq.gz
Remove temporary files: True
Word size for database search: 28
Minimum mapping alignment coverage: 0.75
Number of reads to use from input: use all
Number of threads for database search: 1
================================
Aligning reads to marker-genes database
0.66 minutes
0.75 Gb maximum memory
Classifying reads
total alignments: 2916
uniquely mapped reads: 1013
ambiguously mapped reads: 47
0.0 minutes
0.76 Gb maximum memory
Estimating species abundance
total marker-gene coverage: 10.637
0.0 minutes
0.76 Gb maximum memory
Command being timed: "run_midas.py species midas_output/SAMPLE_1 -1 /u/local/apps/midas/EXAMPLE/example/sample_1.fq.gz --remove_temp"
User time (seconds): 43.04
System time (seconds): 6.53
Percent of CPU this job got: 118%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:41.71
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 646160
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 0
Minor (reclaiming a frame) page faults: 83162
Voluntary context switches: 18072
Involuntary context switches: 1161
Swaps: 0
File system inputs: 391032
File system outputs: 904
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0
I should add that I installed MIDAS using Python version 3.9.6. I have just noticed that when running the second part of the tutorial (snps), there is an issue with the multiprocessing package that produces this error:
TypeError: cannot pickle '_io.TextIOWrapper' object
Any ideas? Which versions of Python do you support?
FYI, the error in context is:
/usr/bin/time -p -v run_midas.py snps midas_output/SAMPLE_1 -1 /u/local/apps/midas/EXAMPLE/example/sample_1.fq.gz -t 8
MIDAS: Metagenomic Intra-species Diversity Analysis System
version 1.3.0; github.com/snayfach/MIDAS
Copyright (C) 2015-2016 Stephen Nayfach
Freely distributed under the GNU General Public License (GPLv3)
===========Parameters===========
Command: /u/local/apps/midas/1.3.2/MIDAS/scripts/run_midas.py snps midas_output/SAMPLE_1 -1 /u/local/apps/midas/EXAMPLE/example/sample_1.fq.gz -t 8
Script: run_midas.py snps
Database: /u/local/apps/midas/DB/midas_db_v1.2
Output directory: midas_output/SAMPLE_1
Remove temporary files: False
Pipeline options:
build bowtie2 database of genomes
align reads to bowtie2 genome database
use samtools to generate pileups and count variants
Database options:
include all species with >=3.0X genome coverage
Read alignment options:
input reads (unpaired): /u/local/apps/midas/EXAMPLE/example/sample_1.fq.gz
alignment speed/sensitivity: very-sensitive
alignment mode: global
number of reads to use from input: use all
number of threads for database search: 8
SNP calling options:
minimum alignment percent identity: 94.0
minimum mapping quality score: 20
minimum base quality score: 30
minimum read quality score: 20
minimum alignment coverage of reads: 0.75
trim 0 base-pairs from 3'/right end of read
================================
Reading reference data
0.0 minutes
0.1 Gb maximum memory
Building database of representative genomes
total genomes: 1
total contigs: 1
total base-pairs: 5163189
0.04 minutes
0.26 Gb maximum memory
Mapping reads to representative genomes
finished aligning
checking bamfile integrity
0.09 minutes
0.44 Gb maximum memory
Indexing bamfile
0.0 minutes
0.44 Gb maximum memory
Counting alleles
Traceback (most recent call last):
File "/u/local/apps/midas/1.3.2/MIDAS/scripts/run_midas.py", line 757, in <module>
run_program(program, args)
File "/u/local/apps/midas/1.3.2/MIDAS/scripts/run_midas.py", line 82, in run_program
snps.run_pipeline(args)
File "/u/local/apps/midas/1.3.2/MIDAS/midas/run/snps.py", line 301, in run_pipeline
pysam_pileup(args, species, contigs)
File "/u/local/apps/midas/1.3.2/MIDAS/midas/run/snps.py", line 228, in pysam_pileup
aln_stats = utility.parallel(species_pileup, argument_list, args['threads'])
File "/u/local/apps/midas/1.3.2/MIDAS/midas/utility.py", line 101, in parallel
return [r.get() for r in results]
File "/u/local/apps/midas/1.3.2/MIDAS/midas/utility.py", line 101, in <listcomp>
return [r.get() for r in results]
File "/u/local/apps/python/3.9.6/gcc-4.8.5/lib/python3.9/multiprocessing/pool.py", line 771, in get
raise self._value
File "/u/local/apps/python/3.9.6/gcc-4.8.5/lib/python3.9/multiprocessing/pool.py", line 537, in _handle_tasks
put(task)
File "/u/local/apps/python/3.9.6/gcc-4.8.5/lib/python3.9/multiprocessing/connection.py", line 211, in send
self._send_bytes(_ForkingPickler.dumps(obj))
File "/u/local/apps/python/3.9.6/gcc-4.8.5/lib/python3.9/multiprocessing/reduction.py", line 51, in dumps
cls(buf, protocol).dump(obj)
TypeError: cannot pickle '_io.TextIOWrapper' object
Command exited with non-zero status 1
Command being timed: "run_midas.py snps midas_output/SAMPLE_1 -1 /u/local/apps/midas/EXAMPLE/example/sample_1.fq.gz -t 8"
User time (seconds): 55.55
System time (seconds): 5.84
Percent of CPU this job got: 669%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:09.16
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 358876
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 0
Minor (reclaiming a frame) page faults: 218678
Voluntary context switches: 61387
Involuntary context switches: 631
Swaps: 0
File system inputs: 79176
File system outputs: 169336
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 1
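For reference, this class of error can be reproduced outside MIDAS. The traceback shows the argument list handed to `multiprocessing.Pool` being pickled in `_handle_tasks`, so my assumption is that one of the worker arguments contains an open file handle (an `_io.TextIOWrapper`), which pickle refuses. A minimal standalone sketch, unrelated to MIDAS internals:

```python
import multiprocessing as mp

def count_lines(task):
    # Each task deliberately carries an open file handle, mimicking a
    # worker argument list that accidentally includes one.
    handle, label = task
    return label

def reproduce():
    fh = open(__file__)  # an _io.TextIOWrapper, which pickle cannot serialize
    try:
        with mp.Pool(2) as pool:
            # Pool arguments are pickled before being sent to workers,
            # so this raises the same TypeError seen in the MIDAS run.
            pool.map(count_lines, [(fh, "a"), (fh, "b")])
        return None  # would mean the arguments pickled fine
    except TypeError as exc:
        return str(exc)
    finally:
        fh.close()

if __name__ == "__main__":
    print(reproduce())  # cannot pickle '_io.TextIOWrapper' object
```

If this is what is happening inside `midas/run/snps.py`, the usual fix is to pass file names (strings) to the workers and open the files inside the worker function, rather than passing open handles across the process boundary.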