
core dumped

Open koujiaodahan opened this issue 3 years ago • 13 comments

Hi: when I run graphtyper, it crashes with the error below. Does that mean the compute node's CPU cannot satisfy graphtyper's requirements? If so, what computing resources are necessary to run the software on 350 samples? Hoping for your reply, thank you!

The error: vimmer_graphtyper.sh: line 6: 9481 Aborted (core dumped) /zfssz2/ST_MCHRI/BIGDATA/USER/lizhichao/pythonenv/python3_pakages/bin/graphtyper genotype_sv /hwfssz1/BIGDATA_COMPUTING/GaeaProject/reference/hg38_noalt_withrandom/hg38.fa $outsvimmer --sams=/zfssz2/ST_MCHRI/BIGDATA/USER/lizhichao/cnvnator/testdata/332_cram_bam.path --region_file=/zfssz2/ST_MCHRI/BIGDATA/USER/lizhichao/cnvnator/software/svimmer/region_file --output=$outdirgraphtyper

koujiaodahan avatar Aug 15 '20 06:08 koujiaodahan

Hello, I don't think it should need that much; less than 4 GB per thread should be enough for 350 samples (assuming 30x coverage).

In my experience I typically get a "Killed" message when it fails on memory, though. Could you rerun with --verbose (or even --vverbose) to get a better idea of where in the process it is failing?

Best, Hannes

hannespetur avatar Aug 17 '20 10:08 hannespetur

Thanks, I am running with 32 GB of memory per thread. Do you suggest I run it by chromosome to speed up the process? Also, I have gotten results as follows; there are many subfiles (see attached image). Should I merge all the VCFs as the final result?

koujiaodahan avatar Aug 27 '20 14:08 koujiaodahan

Also, the run has now been going for over a week. Is that speed normal? If not, how long would you expect it to take for 300 samples (30x)?

koujiaodahan avatar Aug 27 '20 14:08 koujiaodahan

Yes, you can combine the output files with e.g. bcftools concat -n. Try to split the process by chromosome, or into even smaller regions like 1 MB or 10 MB. Graphtyper should need about 50 CPU hours per 30x human sample, although there have been a few reports of graphtyper being very slow on some rare occasions/intervals.
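
As an illustration, a minimal merging sketch (the directory layout and file names below are hypothetical; adjust them to wherever your graphtyper output is written):

# Collect the per-region VCFs in genomic order, concatenate without recompression, and index.
ls ${OUTDIR}/*/*.vcf.gz | sort -V > vcf_list.txt
bcftools concat --naive --file-list vcf_list.txt -o merged.vcf.gz
tabix -p vcf merged.vcf.gz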

Best, Hannes

hannespetur avatar Sep 01 '20 10:09 hannespetur

I've also been having memory issues trying to joint genotype the Phase3 1000 Genomes WGS data (2504 samples at 30X). On nodes with 190 Gb memory and parallelizing by chromosome, about 2/3 of the chromosomes ran out of memory and got killed (using 64 CPUs, 3 Gb/CPU). A test on chrY with 756 Gb memory also crashed (64 CPUs, 12 Gb/CPU). Stderr running with -v shows the last step before getting killed is <info> Calculating contig offsets. It was not well correlated with chromosome size, but rather seemed random: some large chromosomes worked and some small chromosomes failed.

I moved to genotyping the 5 major populations (~500 samples each), and then to the 26 subpopulations (~100 samples each), but still the 190 Gb nodes ran out of memory. At just 100 samples I started at 64 CPUs (3 Gb/CPU) and moved progressively down to just 4 CPUs (48 Gb/CPU), and still the same chromosomes eventually failed on memory. I also reduced --max_files_open from the default 1000 down to 128, but this didn't help either.

The paper describes joint genotyping 50,000 Icelanders -- some specific guidance on how to scale graphtyper to larger cohorts and the computational resources required (CPUs and memory) would be appreciated. How did you do the 50K sample cohort?

seboyden avatar Sep 02 '20 00:09 seboyden

Hey @seboyden ,

I would recommend using 1 MB intervals for that many samples; running whole chromosomes at once is definitely too much work. There are a few 1 MB intervals that have ultra-high read coverage (10,000x+ coverage in a 30x sample), which require extremely high memory and time with graphtyper v2.5.1. I am trying to solve that problem by adding the option --avg_cov_by_readlen=FILE to genotype_sv. You can give it a try using this development graphtyper version: https://drive.google.com/file/d/1_Yw4i_dolxvf_lKZrADjJ2R01VHSSsg9/view?usp=sharing

The FILE in --avg_cov_by_readlen=FILE should contain the average coverage divided by the read length for each BAM/CRAM (one value per line). The list provided in --sams=FILE should be in the same order as the avg_cov_by_readlen file. So, for example, if you have a sample with 30x coverage and 100 bp reads, you should put the value "0.3". You can calculate the values for your BAMLIST using:

parallel -k "samtools idxstats {1} | head -n -1 | awk '{sum+=\$3+\$4; ref+=\$2} END{print sum/ref}'" :::: ${BAMLIST} > ${BAMLIST}.avg_cov_by_readlen
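
If GNU parallel is not available, the same values (in the same order as the BAM list) can be produced with a plain shell loop; this is just a sketch assuming ${BAMLIST} contains one BAM/CRAM path per line:

# For each alignment file: sum mapped+unmapped reads over all contigs except "*", then divide by total reference length.
while read -r BAM; do
  samtools idxstats "${BAM}" | head -n -1 | awk '{sum+=$3+$4; ref+=$2} END{print sum/ref}'
done < ${BAMLIST} > ${BAMLIST}.avg_cov_by_readlen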

For our 50K sample run I split the genome into 1 MB intervals and ran each job with 24 threads. I also split my bamlist into 10 pools (5K BAMs in each pool) and merged the results with graphtyper merge afterwards, but you shouldn't need to do that with 2.5K samples. The commands looked something like this:

${GRAPHTYPER} genotype_sv ${REFERENCE} ${SVIMMER_OUT}.vcf.gz --region=chr1:1-1000000 --sams=${BAMLIST} --avg_cov_by_readlen=${BAMLIST}.avg_cov_by_readlen --threads=24 --verbose
${GRAPHTYPER} genotype_sv ${REFERENCE} ${SVIMMER_OUT}.vcf.gz --region=chr1:1000001-2000000 --sams=${BAMLIST} --avg_cov_by_readlen=${BAMLIST}.avg_cov_by_readlen --threads=24 --verbose
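
To avoid typing out every window by hand, the 1 MB regions for a chromosome can be generated in a loop. The sketch below is hypothetical and reads the chromosome length from the reference .fai index; in practice each window would typically be submitted as its own cluster job rather than run serially:

# Run genotype_sv over consecutive 1 MB windows of one chromosome.
CHR=chr1
CHR_LEN=$(awk -v c=${CHR} '$1==c {print $2}' ${REFERENCE}.fai)
for START in $(seq 1 1000000 ${CHR_LEN}); do
  END=$((START + 999999))
  if [ ${END} -gt ${CHR_LEN} ]; then END=${CHR_LEN}; fi
  ${GRAPHTYPER} genotype_sv ${REFERENCE} ${SVIMMER_OUT}.vcf.gz --region=${CHR}:${START}-${END} --sams=${BAMLIST} --avg_cov_by_readlen=${BAMLIST}.avg_cov_by_readlen --threads=24 --verbose
done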

We did have the coverage filter in the graphtyper version we used in the paper, but it was later removed because we saw it resulted in poorer genotyping for many SVs. Now I am basically reintroducing a modified version of it which is much less strict than before, so it shouldn't affect genotyping quality nearly as much. I am planning to make a release with the new option soon, but I want to test it more first and I'd appreciate any feedback.

Best, Hannes

hannespetur avatar Sep 02 '20 15:09 hannespetur

Thanks Hannes-- I tried v2.6.0-dev with --avg_cov_by_readlen as you described, but I still have some intervals failing on memory (2504 samples with 30X WGS, 190 Gb nodes, 64 CPUs, 3 Gb/CPU). I'm not sure what the target Gb/CPU should be for this use case with the new modified downsampling, but before I do more testing reducing the CPUs etc., can you tell me in which version the original downsampling filter was removed? It's not clear from the release notes. I would like to test with the last version that still had the original filter to verify this is the problem.

Also note in your command line for generating the .avg_cov_by_readlen file, the head -n -1 command takes the coverage from chr1, which should be a good proxy for the autosomes, but not necessarily for chrX and Y in a male, and definitely not for chrM if you want to call SVs there. If parallelizing by chromosome anyway, you could consider replacing head -n -1 with grep -w ${CHR} to make a separate avg_cov file for each chromosome.
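
A hypothetical sketch of that per-chromosome variant (the output file name here is made up):

CHR=chrX
parallel -k "samtools idxstats {1} | grep -w ${CHR} | awk '{sum+=\$3+\$4; ref+=\$2} END{print sum/ref}'" :::: ${BAMLIST} > ${BAMLIST}.${CHR}.avg_cov_by_readlen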

seboyden avatar Sep 10 '20 18:09 seboyden

Thanks Hannes-- I tried v2.6.0-dev with --avg_cov_by_readlen as you described, but I still have some intervals failing on memory (2504 samples with 30X WGS, 190 Gb nodes, 64 CPUs, 3 Gb/CPU).

Okay, maybe also try reducing the number of open files like you did before, i.e. --max_files_open=64 (equal to the number of CPUs you are running with).

I'm not sure what the target Gb/CPU should be for this use case with the new modified downsampling, but before I do more testing reducing the CPUs etc., can you tell me in which version the original downsampling filter was removed? It's not clear from the release notes. I would like to test with the last version that still had the original filter to verify this is the problem.

v2.0, which is the version we ran for the paper. Back then there was no "genotype_sv" command (introduced in v2.1); instead you had to run the SV pipeline in https://github.com/DecodeGenetics/graphtyper-pipelines, which is now deprecated. I'd recommend trying --max_files_open=64 before going down that road. The downsampling was handled by bamShrink in that old pipeline, and I didn't port it from bamShrink into graphtyper in v2.1 because I thought it wasn't needed and I saw that graphtyper's sensitivity increased after removing it.

Also note in your command line for generating the .avg_cov_by_readlen file, the head -n -1 command takes the coverage from chr1

head -n -1 removes the last line, which holds the reads on the unmapped contig ("*") in the BAM. There is a sneaky dash there.

If parallelizing by chromosome anyway, you could consider replacing head -n -1 with grep -w ${CHR} to make a separate avg_cov file for each chromosome.

Sure, you could do that.

Best, Hannes

hannespetur avatar Sep 11 '20 09:09 hannespetur

I finally got this to work (2504 samples with 30X WGS, 190 Gb nodes using v2.6.0-dev with --avg_cov_by_readlen) by using --max_files_open=128 and --threads=64. Whether or not it failed on memory seemed to correlate with the "Thread work" reported in stderr. Generally, the average thread work equaled (# samples) / (max_files_open), regardless of number of threads, e.g. it was ~20 for 2504 samples and 128 open files (2504/128), regardless of whether I used 16, 32, or 64 threads. However, this formula was no longer true when CPUs = open files, i.e. when I went down to 64 open files (as suggested for 64 CPUs), thread work went to 1 instead of to 40. The same was true for 32 CPUs and 32 open files, and for 16 CPUs and 16 open files. In any case, the run would complete if reported thread work was >1 (tested from 5 to 80), but would fail on memory when thread work was =1, which happened when --max_files_open was either too large (> # samples) or too small (= # threads). Setting --max_files_open equal to 2-4X the number of threads seemed to be the safest combination.
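
For reference, a sketch of an invocation using the combination that worked here (placeholders reuse those from the earlier examples; one 1 MB region shown):

${GRAPHTYPER} genotype_sv ${REFERENCE} ${SVIMMER_OUT}.vcf.gz --region=chr1:1-1000000 --sams=${BAMLIST} --avg_cov_by_readlen=${BAMLIST}.avg_cov_by_readlen --threads=64 --max_files_open=128 --verbose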

seboyden avatar Oct 08 '20 23:10 seboyden

What is the trade-off in running graphtyper on thousands of CRAMs at once versus running it on each CRAM individually and then merging? Is the overall CPU resource utilization less?

jjfarrell avatar Oct 17 '20 17:10 jjfarrell

It should output the same results, but running each CRAM individually and merging would be (slightly) slower overall, as it needs to do a little more work (more VCF parsing, more I/O operations, more graph construction and indexing, etc.). I'm not sure whether CPU resource utilization is affected as well.

hannespetur avatar Nov 13 '20 13:11 hannespetur

@hannespetur I tried the development version on the GIAB trio and 4789 CRAMs with the avg_cov_by_readlen parameter, using a 32-core, 196 GB compute node. The script uses --threads 32 --max_files_open=32 with 1 MB intervals. It is running much faster and addressed issue #45. How close is the new development version to a release?

jjfarrell avatar Nov 22 '20 15:11 jjfarrell

The --avg_cov_by_readlen option to subsample reads has been added to graphtyper v2.6.1.

hannespetur avatar Jan 08 '21 09:01 hannespetur