pgsc_calc
Possible bug: pipeline only calculates one score of many in --scorefile parameter
Description of the bug
Hello!
It seems that running the pipeline with --scorefile path/to/scores/*.txt.gz results in only one score actually being calculated. In the score report, the recorded nextflow command correctly lists all 47 scoring files that match the wildcard path, but the dataset_pgs.txt.gz output file contains only the very first score.
Ultimately, I need to calculate >1000 scores and I'm hoping to do so in parallel with pgsc_calc. I need to use the --scorefile parameter, as the pipeline fails to download scores on the cluster I'm using. Please let me know if there's a way around this, and thanks for your help!
_Note: I set min_overlap extremely low to make sure all scores are calculated._
Command used and terminal output
Command I ran:
export NXF_ANSI_LOG=false
export NXF_OPTS="-Xms500M -Xmx2G"
module load nextflow
module load singularity/3.7.0
nextflow run pgscatalog/pgsc_calc -r main -latest \
-profile singularity \
-resume \
--scorefile downloads/PGP000604/*.txt.gz \
--genotypes_cache genotypes_cache \
--input samplesheet.csv \
--min_overlap 0.1 \
--target_build GRCh37 \
--run_ancestry ~/pgs_calc/pgsc_HGDP+1kGP_v1.tar.zst \
-c ~/configs/pgs_calc_specific_slurm.config
Command reported in the score report:
nextflow run pgscatalog/pgsc_calc -r main -latest -profile singularity -resume \
--scorefile downloads/PGP000604/PGS004701_hmPOS_GRCh37.txt.gz \
downloads/PGP000604/PGS004705_hmPOS_GRCh37.txt.gz \
downloads/PGP000604/PGS004707_hmPOS_GRCh37.txt.gz \
downloads/PGP000604/PGS004711_hmPOS_GRCh37.txt.gz \
downloads/PGP000604/PGS004715_hmPOS_GRCh37.txt.gz \
downloads/PGP000604/PGS004719_hmPOS_GRCh37.txt.gz \
downloads/PGP000604/PGS004723_hmPOS_GRCh37.txt.gz \
downloads/PGP000604/PGS004727_hmPOS_GRCh37.txt.gz \
downloads/PGP000604/PGS004731_hmPOS_GRCh37.txt.gz \
downloads/PGP000604/PGS004733_hmPOS_GRCh37.txt.gz \
downloads/PGP000604/PGS004737_hmPOS_GRCh37.txt.gz \
downloads/PGP000604/PGS004739_hmPOS_GRCh37.txt.gz \
downloads/PGP000604/PGS004743_hmPOS_GRCh37.txt.gz \
downloads/PGP000604/PGS004747_hmPOS_GRCh37.txt.gz \
downloads/PGP000604/PGS004751_hmPOS_GRCh37.txt.gz \
downloads/PGP000604/PGS004755_hmPOS_GRCh37.txt.gz \
downloads/PGP000604/PGS004759_hmPOS_GRCh37.txt.gz \
downloads/PGP000604/PGS004761_hmPOS_GRCh37.txt.gz \
downloads/PGP000604/PGS004765_hmPOS_GRCh37.txt.gz \
downloads/PGP000604/PGS004767_hmPOS_GRCh37.txt.gz \
downloads/PGP000604/PGS004769_hmPOS_GRCh37.txt.gz \
downloads/PGP000604/PGS004771_hmPOS_GRCh37.txt.gz \
downloads/PGP000604/PGS004775_hmPOS_GRCh37.txt.gz \
downloads/PGP000604/PGS004779_hmPOS_GRCh37.txt.gz \
downloads/PGP000604/PGS004783_hmPOS_GRCh37.txt.gz \
downloads/PGP000604/PGS004785_hmPOS_GRCh37.txt.gz \
downloads/PGP000604/PGS004789_hmPOS_GRCh37.txt.gz \
downloads/PGP000604/PGS004791_hmPOS_GRCh37.txt.gz \
downloads/PGP000604/PGS004795_hmPOS_GRCh37.txt.gz \
downloads/PGP000604/PGS004797_hmPOS_GRCh37.txt.gz \
downloads/PGP000604/PGS004801_hmPOS_GRCh37.txt.gz \
downloads/PGP000604/PGS004805_hmPOS_GRCh37.txt.gz \
downloads/PGP000604/PGS004809_hmPOS_GRCh37.txt.gz \
downloads/PGP000604/PGS004811_hmPOS_GRCh37.txt.gz \
downloads/PGP000604/PGS004815_hmPOS_GRCh37.txt.gz \
downloads/PGP000604/PGS004817_hmPOS_GRCh37.txt.gz \
downloads/PGP000604/PGS004821_hmPOS_GRCh37.txt.gz \
downloads/PGP000604/PGS004825_hmPOS_GRCh37.txt.gz \
downloads/PGP000604/PGS004829_hmPOS_GRCh37.txt.gz \
downloads/PGP000604/PGS004833_hmPOS_GRCh37.txt.gz \
downloads/PGP000604/PGS004835_hmPOS_GRCh37.txt.gz \
downloads/PGP000604/PGS004837_hmPOS_GRCh37.txt.gz \
downloads/PGP000604/PGS004841_hmPOS_GRCh37.txt.gz \
downloads/PGP000604/PGS004845_hmPOS_GRCh37.txt.gz \
downloads/PGP000604/PGS004851_hmPOS_GRCh37.txt.gz \
downloads/PGP000604/PGS004853_hmPOS_GRCh37.txt.gz \
downloads/PGP000604/PGS004855_hmPOS_GRCh37.txt.gz --genotypes_cache \
genotypes_cache --input samplesheet.csv --min_overlap 0.1 --target_build GRCh37 \
--run_ancestry /home/user/pgs_calc/pgsc_HGDP+1kGP_v1.tar.zst -c \
/home/user/configs/pgs_calc_specific_slurm.config
Relevant files
Content of results/PROFILE2024/match/PROFILE2024_summary.csv:
dataset,accession,score_pass,match_status,ambiguous,is_multiallelic,duplicate_best_match,duplicate_ID,match_flipped,match_IDs,count,percent
PROFILE2024,PGS004701_hmPOS_GRCh37,true,matched,false,false,false,false,false,true,823218,74.30254881413667
PROFILE2024,PGS004701_hmPOS_GRCh37,true,unmatched,,,,,,,284709,25.69745118586333
Score report: failed_parallel_score_report.html.zip
System information
nextflow: nextflow/24.04.2.5914
Hardware: HPC
Executor: slurm
container engine: singularity
OS: CentOS 7
Version of pgsc_calc: 2.0.0-beta.2
You need to wrap the path in " characters when using a wildcard, e.g.:
--scorefile "downloads/PGP000604/*.txt.gz"
Without quotes, your shell expands the wildcard into a list of file paths before nextflow ever sees it, which stops multiple scoring files from being detected correctly.
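
To see what the shell is doing here, a small sketch like the one below may help. It does not call nextflow at all: `count_args` is a hypothetical stand-in that just reports how many arguments it was handed, and the `PGS_A`/`PGS_B`/`PGS_C` file names are made up for the demo. Assuming a bash-like shell, the unquoted wildcard arrives as one argument per matching file, while the quoted wildcard arrives as a single literal pattern that the pipeline can then expand itself.

```shell
#!/usr/bin/env bash
# Sketch only: shows how a wildcard reaches a program, quoted vs unquoted.
set -euo pipefail

# Hypothetical scoring files in a throwaway directory.
demo_dir="$(mktemp -d)"
touch "$demo_dir/PGS_A.txt.gz" "$demo_dir/PGS_B.txt.gz" "$demo_dir/PGS_C.txt.gz"

# Hypothetical stand-in for the real command: prints its argument count.
count_args() { echo "$#"; }

# Unquoted wildcard: the shell expands it, so each file is its own argument.
unquoted_count="$(count_args --scorefile "$demo_dir"/*.txt.gz)"

# Quoted wildcard: the pattern is passed through untouched as one argument.
quoted_count="$(count_args --scorefile "$demo_dir/*.txt.gz")"

echo "unquoted: $unquoted_count arguments"
echo "quoted:   $quoted_count arguments"

rm -rf "$demo_dir"
```

In the unquoted case only the first path directly follows --scorefile, which is consistent with the single score seen in the output above, although how the remaining paths are treated is up to nextflow and is not shown by this sketch.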