Out of memory error in MAKE_COMPATIBLE:PLINK2_VCF step when processing Illumina WGS gVCF-derived VCF

Open Janeyre91 opened this issue 1 month ago • 3 comments

Description of the bug

I'm running pgsc_calc on an Illumina WGS sample (on a server with 1 TB of RAM). The WGS VCF was produced using GATK HaplotypeCaller (gVCF mode) followed by GenotypeGVCFs:

java -jar /home/tools/gatk-4.6.2.0/gatk-package-4.6.2.0-local.jar GenotypeGVCFs \
    -R Homo_sapiens_assembly38_noalt.fasta \
    -V X2121.snps.raw.g.vcf.gz \
    -O X2121.raw.vcf.gz \
    --dbsnp dbSNP155_fixed.vcf.gz \
    --include-non-variant-sites true

The command that I used for pgsc_calc is:

nextflow run pgscatalog/pgsc_calc \
  -profile docker \
  --input samplesheet.csv \
  --pgs_id PGS001931 \
  --efo_id EFO_0009695 \
  --target_build GRCh38 \
  --liftover \
  --hg19_chain https://hgdownload.cse.ucsc.edu/goldenpath/hg19/liftOver/hg19ToHg38.over.chain.gz \
  --hg38_chain https://hgdownload.soe.ucsc.edu/goldenPath/hg38/liftOver/hg38ToHg19.over.chain.gz \
  --run-ancestry https://ftp.ebi.ac.uk/pub/databases/spot/pgs/resources/pgsc_HGDP+1kGP_v1.tar.zst \
  -c custom.config

My custom.config is:

process {
    executor = 'local'

    withName: 'PGSCATALOG_PGSCCALC:PGSCCALC:MAKE_COMPATIBLE:PLINK2_VCF' {
        memory = '300.GB'
        cpus = 16
        time = '48.h'
    }
}

I get this error:

`ERROR ~ Error executing process > 'PGSCATALOG_PGSCCALC:PGSCCALC:MAKE_COMPATIBLE:PLINK2_VCF (X2121 chromosome ALL)'

Caused by: Process terminated with an error exit status (2)

Command executed:

plink2
--threads 16
--memory 307200
--set-all-var-ids '@:#:$r:$a'
--max-alleles 2
--freq
--missing vcols=fmissdosage,fmiss
--new-id-max-allele-len 100 missing
--vcf X2121.raw.vcf
--allow-extra-chr --chr 1-22, X, Y, XY
--make-pgen vzs pvar-cols="-xheader,-maybequal,-maybefilter,-maybeinfo,-maybecm"
--out GRCh38_X2121_ALL

Command exit status: 2

Output: Out of memory.`
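
As a quick check on the numbers above: the `--memory 307200` passed to plink2 matches the `300.GB` set in custom.config, assuming the value is expressed in MB with 1 GB counted as 1024 MB (which is how Nextflow counts memory units). So the configured limit did reach plink2, and the step ran out of memory despite it. A minimal sketch of the conversion, where `gb_to_mb` is a hypothetical helper name used only for illustration:

```shell
#!/bin/sh
# Convert a memory size in GB to MB, counting 1 GB as 1024 MB.
# gb_to_mb is a hypothetical helper, not part of pgsc_calc or plink2.
gb_to_mb() {
    echo $(( $1 * 1024 ))
}

# 300 GB from custom.config, expressed in MB
gb_to_mb 300
```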

Thanks in advance!!!

Command used and terminal output


Relevant files

No response

System information

No response

Janeyre91, Oct 31 '25 15:10

I'd suggest converting your VCF into plink2 files before using the workflow. Your VCF file seems complicated and big!

See here for more details. In particular, it would probably help to split your VCF so that each chromosome has its own file. You can do that by running:

plink2 --vcf <full_path_to_vcf.vcf.gz> \
    --allow-extra-chr \
    --chr <chromosome> \
    --make-pgen --out <short_name>_<chromosome>

for each chromosome.
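
The per-chromosome split can be scripted as a loop. This is a minimal sketch under stated assumptions, not part of pgsc_calc: the input path, the `X2121` output prefix, and the `DRY_RUN` switch are placeholders chosen for illustration, and the chromosome list (1-22, X, Y) is assumed from the samplesheet shown later in the thread. The plink2 flags are the ones from the command above. By default the script only prints the commands so they can be reviewed first.

```shell
#!/bin/sh
# Sketch: split one VCF into per-chromosome PLINK2 filesets.
set -eu

VCF="${VCF:-/path/to/X2121.raw.vcf.gz}"   # hypothetical input path, edit before use
PREFIX="${PREFIX:-X2121}"                 # hypothetical short name for the outputs
DRY_RUN="${DRY_RUN:-1}"                   # 1 = only print commands, 0 = also run them

# Print (and optionally run) the plink2 command for one chromosome.
run_chrom() {
    chrom="$1"
    # Build the argument list in the positional parameters so paths stay quoted.
    set -- plink2 --vcf "$VCF" --allow-extra-chr --chr "$chrom" \
        --make-pgen --out "${PREFIX}_${chrom}"
    echo "$*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

# Assumed chromosome list: autosomes plus X and Y.
for chrom in $(seq 1 22) X Y; do
    run_chrom "$chrom"
done
```

Running it with `DRY_RUN=0` executes each command in turn. Each run processes a single chromosome, so peak memory should depend on the largest chromosome rather than on the whole genome.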

nebfield, Nov 06 '25 09:11

Thanks for your suggestion, but I'm still having problems. The PLINK2_SCORE process (APPLY_SCORE) terminates with exit status 2 due to an out-of-memory error, even though sufficient system RAM is available and the job is allocated 8 GB. This occurs when running the pipeline on chromosome-split WGS PLINK2 files.

Dataset Characteristics

Input format: PLINK2 files (.pgen, .psam, .pvar) pre-split by chromosome

For example:

Sites on chr1: 248,956,435
File size (chr1):
    .pgen: 713 MB
    .pvar: 6.4 GB (uncompressed ASCII)
    .psam: 30 bytes (1 sample)

Target build: GRCh38
PGS scores: 3 (PGS001931, PGS002148, PGS003516)

Custom Configuration File Used

// nextflow.config
docker {
    enabled = true
    runOptions = '--user root'
}

process {
    // Global retry strategy
    errorStrategy = { task.exitStatus in [130,131,134,135,137,139,140,141,143,145] ? 'retry' : 'finish' }
    maxRetries = 3

    // Configuration for MATCH_VARIANTS (heavy process)
    withName: 'PGSCATALOG_PGSCCALC:PGSCCALC:MATCH:MATCH_VARIANTS' {
        // Starts at 64 GB and scales automatically with each retry
        memory = { 64.GB * task.attempt }
        time = { 6.h * task.attempt }
        cpus = 2
        // Set the Docker container memory limit
        containerOptions = '--user root --memory=256g'
    }

    // Configuration for PLINK2_RELABELPVAR (medium process)
    withName: 'PGSCATALOG_PGSCCALC:PGSCCALC:MAKE_COMPATIBLE:PLINK2_RELABELPVAR' {
        memory = { 64.GB * task.attempt }
        time = { 4.h * task.attempt }
        cpus = 2
        containerOptions = '--user root --memory=128g'
    }

    // General PLINK2 processes (catch-all for any PLINK2 process)
    withLabel: 'process_medium' {
        memory = { 64.GB * task.attempt }
        cpus = 2
        time = { 4.h * task.attempt }
        containerOptions = '--user root'
    }
}

params {
    // Recommended global limits to avoid exhausting total RAM
    max_memory = '750.GB'
    max_cpus   = 16
    max_time   = '240.h'
}

Samplesheet

sampleset,path_prefix,chrom,format
X2121-axy,/home/user/prova_pgs/X2121_axy_chr1,1,pfile
X2121-axy,/home/user/prova_pgs/X2121_axy_chr2,2,pfile
[...chr 3-22, X, Y...]
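
A samplesheet with this layout can be generated rather than typed by hand. The sketch below assumes that the elided rows follow the same naming pattern as the two rows shown (`<stem>_chr<N>`) and that the chromosomes are 1-22, X and Y. The variable names and the `make_samplesheet` function are hypothetical, introduced only for this example.

```shell
#!/bin/sh
# Sketch: print a pgsc_calc samplesheet with one row per chromosome.
set -eu

SAMPLESET="${SAMPLESET:-X2121-axy}"              # sampleset name from the rows above
PREFIX_DIR="${PREFIX_DIR:-/home/user/prova_pgs}" # directory holding the pfiles
STEM="${STEM:-X2121_axy}"                        # file stem before _chr<N>

make_samplesheet() {
    # Header row, same columns as the samplesheet above
    echo "sampleset,path_prefix,chrom,format"
    # Assumed pattern: <dir>/<stem>_chr<N> for chromosomes 1-22, X, Y
    for chrom in $(seq 1 22) X Y; do
        echo "${SAMPLESET},${PREFIX_DIR}/${STEM}_chr${chrom},${chrom},pfile"
    done
}

make_samplesheet
```

Redirecting the output, for example `sh make_samplesheet.sh > samplesheet.csv`, produces the file passed to `--input`.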

Command Used

nextflow run pgscatalog/pgsc_calc \
  -profile docker \
  -c custom.config \
  -resume \
  --input samplesheet.csv \
  --pgs_id PGS001931,PGS002148,PGS003516 \
  --efo_id EFO_0009695 \
  --target_build GRCh38 \
  --min_overlap 0.5 \
  --liftover \
  --hg19_chain https://hgdownload.cse.ucsc.edu/goldenpath/hg19/liftOver/hg19ToHg38.over.chain.gz \
  --hg38_chain https://hgdownload.soe.ucsc.edu/goldenPath/hg38/liftOver/hg38ToHg19.over.chain.gz \
  --run-ancestry https://ftp.ebi.ac.uk/pub/databases/spot/pgs/resources/pgsc_HGDP+1kGP_v1.tar.zst \
  --outdir results

The error reported in .command.err:

Error: Out of memory. The --memory flag may be helpful.

Let me know if the complete log error could be helpful!

Thanks in advance for your help!!!

Janeyre91, Nov 10 '25 10:11

8 GB of RAM isn't enough for a large target genome.

You can allocate more memory to the APPLY_SCORE stage by changing your configuration:

process {
   withName: 'APPLY_SCORE' {
     cpus = 1
     memory = 16.GB
     time = 6.hour
   }
}

nebfield, Nov 13 '25 10:11