
Problematic harmonized scorefile for PGS004700 crashes pgsc_calc 2.0.0-alpha5

smlmbrt opened this issue

Discussed in https://github.com/PGScatalog/pgsc_calc/discussions/278

Originally posted by bgulko, April 16, 2024: Running pgsc_calc with --trait_efo EFO_0005140 pulls in PGS004700 (GRCh38, Singularity), which raises an error and halts the pipeline in combine_scorefiles.

When I run with the score IDs directly, the pipeline completes, as long as I leave PGS004700 out.

I have implemented a workaround: I manually download the PGS IDs for each EFO term, filter out known problematic scoring files, and then run with the --pgs_id flag in batches of no more than 100 scores. Still, it would help people using --trait_efo if this were fixed.
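The filtering and batching part of this workaround can be sketched in Python. This is a minimal sketch that assumes you already have the list of PGS IDs for the trait (fetching them is out of scope) and that --pgs_id accepts a comma-separated list of IDs; `batch_pgs_ids` is a hypothetical helper name, not part of pgsc_calc.

```python
from typing import Iterable, Iterator


def batch_pgs_ids(
    pgs_ids: Iterable[str], exclude: set[str], batch_size: int = 100
) -> Iterator[str]:
    """Yield comma-separated batches of PGS IDs, skipping excluded ones."""
    # dict.fromkeys drops duplicate IDs while keeping their original order
    kept = [pgs_id for pgs_id in dict.fromkeys(pgs_ids) if pgs_id not in exclude]
    for start in range(0, len(kept), batch_size):
        yield ",".join(kept[start:start + batch_size])


if __name__ == "__main__":
    # Illustrative IDs only; in practice these come from the EFO lookup
    trait_ids = [f"PGS{n:06d}" for n in range(1, 251)] + ["PGS004700"]
    known_bad = {"PGS004700"}
    for i, ids in enumerate(batch_pgs_ids(trait_ids, known_bad), start=1):
        # Each string is meant to be the value of --pgs_id for a separate run
        print(f"batch {i}: {len(ids.split(','))} scores")
```

Keeping the exclusion list as a set makes the "known problematic" list easy to grow without touching the batching logic.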

I have kept the diagnostics and am happy to provide more information if it helps.

--Brad


```
pgscatalog_utils.scorefile.write: 2024-04-15 20:33:11 INFO     Writing PGS004700_hmPOS_GRCh38 variants
pgscatalog_utils.scorefile.qc: 2024-04-15 20:33:11 CRITICAL Bad effect type setting: ScoreVariant(effect_allele='HLA-DRB1*15:01',effect_weight='1.37',accession='PGS004700_hmPOS_GRCh38',row_nr=0,chr_name='',chr_position=None,rsID='.',other_allele='',hm_chr='',hm_pos=None,hm_inferOtherAllele='',hm_source='Unknown',is_dominant=None,is_recessive=False,hm_rsID='',hm_match_chr=None,hm_match_pos=None,is_duplicated=False,effect_type=EffectType.ADDITIVE,is_complex=True)
Traceback (most recent call last):
  File "/venv/bin/combine_scorefiles", line 8, in <module>
    sys.exit(combine_scorefiles())
  File "/venv/lib/python3.10/site-packages/pgscatalog_utils/scorefile/combine_scorefiles.py", line 50, in combine_scorefiles
    logs: dict[str, int] = write_combined(sfs, args.outfile)
  File "/venv/lib/python3.10/site-packages/pgscatalog_utils/scorefile/write.py", line 166, in write_combined
    batch = list(islice(scoring_file.variants, Config.batch_size))
  File "/venv/lib/python3.10/site-packages/pgscatalog_utils/scorefile/qc.py", line 62, in check_duplicates
    for variant in variants:
  File "/venv/lib/python3.10/site-packages/pgscatalog_utils/scorefile/qc.py", line 215, in detect_complex
    for variant in variants:
  File "/venv/lib/python3.10/site-packages/pgscatalog_utils/scorefile/qc.py", line 197, in check_effect_allele
    for variant in variants:
  File "/venv/lib/python3.10/site-packages/pgscatalog_utils/scorefile/qc.py", line 123, in assign_other_allele
    for variant in variants:
  File "/venv/lib/python3.10/site-packages/pgscatalog_utils/scorefile/qc.py", line 110, in check_effect_weight
    for variant in variants:
  File "/venv/lib/python3.10/site-packages/pgscatalog_utils/scorefile/qc.py", line 148, in assign_effect_type
    raise Exception
Exception
```
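One plausible reading of the CRITICAL line is that the harmonized record for the complex HLA-DRB1*15:01 allele carries is_dominant=None instead of False, and the effect-type check only accepts explicit booleans. The sketch below is a hypothetical, simplified model of that situation, not pgscatalog_utils code: `Variant`, `assign_effect_type_strict` and `assign_effect_type_tolerant` are illustrative names. It also shows why the traceback passes through every QC stage: the checks are chained generators, so nothing runs until `write_combined` pulls a batch with `islice`.

```python
from dataclasses import dataclass
from enum import Enum
from itertools import islice
from typing import Iterable, Iterator, Optional


class EffectType(Enum):
    ADDITIVE = "additive"
    DOMINANT = "dominant"
    RECESSIVE = "recessive"


@dataclass
class Variant:
    # Hypothetical, trimmed-down stand-in for the ScoreVariant in the log
    effect_allele: str
    is_dominant: Optional[bool]
    is_recessive: Optional[bool]
    effect_type: EffectType = EffectType.ADDITIVE


def assign_effect_type_strict(variants: Iterable[Variant]) -> Iterator[Variant]:
    # Accepts only explicit True/False flags; anything else is rejected
    for v in variants:
        flags = (v.is_dominant, v.is_recessive)
        if flags == (False, False):
            v.effect_type = EffectType.ADDITIVE
        elif flags == (True, False):
            v.effect_type = EffectType.DOMINANT
        elif flags == (False, True):
            v.effect_type = EffectType.RECESSIVE
        else:
            # (None, False), as in the log, ends up here
            raise ValueError(f"Bad effect type setting: {v}")
        yield v


def assign_effect_type_tolerant(variants: Iterable[Variant]) -> Iterator[Variant]:
    # Treats a missing (None) flag as False before deciding
    for v in variants:
        dominant, recessive = bool(v.is_dominant), bool(v.is_recessive)
        if dominant and recessive:
            raise ValueError(f"Variant cannot be both dominant and recessive: {v}")
        if dominant:
            v.effect_type = EffectType.DOMINANT
        elif recessive:
            v.effect_type = EffectType.RECESSIVE
        else:
            v.effect_type = EffectType.ADDITIVE
        yield v


hla = Variant("HLA-DRB1*15:01", is_dominant=None, is_recessive=False)

# Building the generator raises nothing: the check only runs when a batch is pulled
pipeline = assign_effect_type_strict([hla])
try:
    batch = list(islice(pipeline, 100))
except ValueError as err:
    print("strict:", err)

print("tolerant:", next(assign_effect_type_tolerant([hla])).effect_type)
```

Treating None as False is only one possible design choice; this thread does not say how the fix referenced below actually handles it.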

(smlmbrt, Apr 16, 2024)

Pretty sure this was fixed in https://github.com/PGScatalog/pygscatalog/pull/15, but we haven't integrated it into pgsc_calc yet.

(nebfield, Apr 16, 2024)

@nebfield Is there an approximate release date for alpha 6? I'm trying to run the pipeline for multiple PGS, and I keep getting this error.

(lemieuxl, Apr 23, 2024)

@lemieuxl It's released now: https://github.com/PGScatalog/pgsc_calc/releases/tag/v2.0.0-alpha.6

Sorry for the delay; the update ended up including more changes than expected.

(nebfield, May 24, 2024)

@nebfield Awesome, thank you for the great tool!

(lemieuxl, May 24, 2024)