pgsc_calc
Problematic harmonized scorefile for PGS004700 crashes pgsc_calc 2.0.0-alpha5
Discussed in https://github.com/PGScatalog/pgsc_calc/discussions/278
Originally posted by bgulko April 16, 2024
Running pgsc_calc with --trait_efo EFO_0005140 accesses PGS004700 (grch38/singularity), which generates an error and halts the pipeline in combine_scorefiles.
When I run using the scorefile IDs directly, this completes, so long as I leave PGS004700 out.
I have implemented a workaround: manually download the PGS IDs for each EFO, filter out known problematic PGS files before scoring, then use the --pgs_id flag with batches of no more than 100 scores. It might be helpful for those using --trait_efo if this were addressed.
I have retained the diagnostics and am happy to provide more info as needed.
--Brad
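The filter-and-batch workaround described above can be sketched roughly as follows. This is a minimal illustration, not the poster's actual script: the helper name `batch_pgs_ids` and the placeholder IDs are hypothetical, and it assumes that `--pgs_id` accepts a comma-separated list of accessions. Fetching the IDs for an EFO term is left out.

```python
from typing import Iterable, Iterator


def batch_pgs_ids(
    pgs_ids: Iterable[str],
    excluded: set[str],
    batch_size: int = 100,
) -> Iterator[str]:
    """Yield comma-separated batches of PGS IDs, skipping excluded ones.

    Hypothetical helper: not part of pgsc_calc or pgscatalog_utils.
    """
    # Deduplicate while preserving order, then drop known-problematic scores.
    kept = [pgs_id for pgs_id in dict.fromkeys(pgs_ids) if pgs_id not in excluded]
    # Slice the remaining IDs into chunks of at most batch_size.
    for start in range(0, len(kept), batch_size):
        yield ",".join(kept[start:start + batch_size])


if __name__ == "__main__":
    # Placeholder IDs for illustration only; PGS004700 is the score from this report.
    ids = ["PGS000001", "PGS004700", "PGS000002", "PGS000003"]
    for batch in batch_pgs_ids(ids, excluded={"PGS004700"}, batch_size=2):
        # Assumption: each batch is passed to a separate pipeline run via --pgs_id.
        print(f"--pgs_id {batch}")
```

Each yielded string would then be used for one pipeline run, so a single bad score only affects the batch it would have been in.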
pgscatalog_utils.scorefile.write: 2024-04-15 20:33:11 INFO Writing PGS004700_hmPOS_GRCh38 variants
pgscatalog_utils.scorefile.qc: 2024-04-15 20:33:11 CRITICAL Bad effect type setting: ScoreVariant(effect_allele='HLA-DRB1*15:01',effect_weight='1.37',accession='PGS004700_hmPOS_GRCh38',row_nr=0,chr_name='',chr_position=None,rsID='.',other_allele='',hm_chr='',hm_pos=None,hm_inferOtherAllele='',hm_source='Unknown',is_dominant=None,is_recessive=False,hm_rsID='',hm_match_chr=None,hm_match_pos=None,is_duplicated=False,effect_type=EffectType.ADDITIVE,is_complex=True)
Traceback (most recent call last):
  File "/venv/bin/combine_scorefiles", line 8, in <module>
    sys.exit(combine_scorefiles())
  File "/venv/lib/python3.10/site-packages/pgscatalog_utils/scorefile/combine_scorefiles.py", line 50, in combine_scorefiles
    logs: dict[str, int] = write_combined(sfs, args.outfile)
  File "/venv/lib/python3.10/site-packages/pgscatalog_utils/scorefile/write.py", line 166, in write_combined
    batch = list(islice(scoring_file.variants, Config.batch_size))
  File "/venv/lib/python3.10/site-packages/pgscatalog_utils/scorefile/qc.py", line 62, in check_duplicates
    for variant in variants:
  File "/venv/lib/python3.10/site-packages/pgscatalog_utils/scorefile/qc.py", line 215, in detect_complex
    for variant in variants:
  File "/venv/lib/python3.10/site-packages/pgscatalog_utils/scorefile/qc.py", line 197, in check_effect_allele
    for variant in variants:
  File "/venv/lib/python3.10/site-packages/pgscatalog_utils/scorefile/qc.py", line 123, in assign_other_allele
    for variant in variants:
  File "/venv/lib/python3.10/site-packages/pgscatalog_utils/scorefile/qc.py", line 110, in check_effect_weight
    for variant in variants:
  File "/venv/lib/python3.10/site-packages/pgscatalog_utils/scorefile/qc.py", line 148, in assign_effect_type
    raise Exception
Exception
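For anyone screening scorefiles before a run, the logged variant suggests one heuristic: its effect allele is an HLA allele name (HLA-DRB1*15:01) rather than a nucleotide string, and it has no position. The sketch below flags such rows in an already-opened scoring file. It is an assumption-laden illustration: the function name is hypothetical, it assumes a tab-separated file with `#` header comments and an `effect_allele` column, and the log does not confirm that a non-nucleotide allele is the actual trigger of the exception.

```python
import csv
import io
import re
from typing import Iterable

# Heuristic: a "plain" effect allele consists only of A, C, G, T.
NUCLEOTIDE_ALLELE = re.compile(r"^[ACGT]+$")


def find_non_nucleotide_alleles(lines: Iterable[str]) -> list[tuple[int, str]]:
    """Return (row_number, effect_allele) for rows whose allele is not plain ACGT.

    Hypothetical helper: not part of pgscatalog_utils.
    """
    # Skip the '#'-prefixed metadata header (assumed file layout).
    data_lines = (line for line in lines if not line.startswith("#"))
    reader = csv.DictReader(data_lines, delimiter="\t")
    flagged = []
    for row_nr, row in enumerate(reader):
        allele = row.get("effect_allele") or ""
        if not NUCLEOTIDE_ALLELE.match(allele):
            flagged.append((row_nr, allele))
    return flagged


if __name__ == "__main__":
    # Tiny in-memory example; the first data row mirrors the logged variant.
    example = io.StringIO(
        "#pgs_id=PGS_EXAMPLE\n"
        "rsID\teffect_allele\teffect_weight\n"
        ".\tHLA-DRB1*15:01\t1.37\n"
        "rs123\tA\t0.5\n"
    )
    print(find_non_nucleotide_alleles(example))
```

For a real gzipped scorefile, the same function could be given a handle from `gzip.open(path, "rt")`. Flagged rows are candidates for review, not a definitive list of variants the pipeline will reject.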
Pretty sure this was fixed in https://github.com/PGScatalog/pygscatalog/pull/15, but we haven't integrated it yet.
@nebfield Is there an approximate release date for the alpha 6? I'm trying to execute the pipeline for multiple PGS, and I keep getting this error.
@lemieuxl released now https://github.com/PGScatalog/pgsc_calc/releases/tag/v2.0.0-alpha.6
sorry for the delay, the update ended up including more things than expected
@nebfield Awesome, thank you for the great tool!