
yaml metadata wrong - '66b35d68f8167b0001a83512'

Open earlEBI opened this issue 3 months ago • 40 comments

Some of the metadata in the yaml file for these two studies does not match submitted data: GCST90446169 GCST90446168

yamls: GCST90446168-yaml.txt GCST90446169-yaml.txt

E.g., the author notes differ between the two yamls and in both cases should be empty (as in the template). Some other fields also look wrong, including date_metadata_last_modified and adjusted_covariates, while other fields, such as reported trait, look right.

Please investigate the cause and correct these yamls and the harmonised yamls.


  • [x] Investigate the problem - https://github.com/EBISPOT/goci/issues/1490#issuecomment-2500884881
  • [x] Fix the problem
  • [x] Deploy to sandbox
  • [x] Test in sandbox
  • [x] Read the md5sum and metadata-last-updated fields from the existing yamls into a .csv file (see /hps/nobackup/parkinso/spot/gwas/scratch/goci-1490/scripts)
  • [x] Use that .csv file while generating all yamls
  • [x] Increase the number of metadata yaml generation attempts (with a large volume of data, the sync takes some time)
  • [x] Generate all metadata yaml in sandbox
  • [x] Test how long it takes
  • [x] Deploy to prod
  • [x] Generate all yaml files
  • [x] Check file names: Yue noticed the following incorrect file names.
-rw-rw-r-- 1 spotbot spot 742 Jan  6 13:00 /nfs/production/parkinso/spot/gwas/prod/data/summary_statistics/GCST004001-GCST005000/GCST004062/harmonised/GCST004062.running.log-meta.yaml
-rw-rw-r-- 1 spotbot spot 740 Jan  3 13:10 /nfs/production/parkinso/spot/gwas/prod/data/summary_statistics/GCST90005001-GCST90006000/GCST90005034/GCST90005034_buildGRCh37.tsv-meta.yaml-meta.yaml
  • [x] Investigate weird file extensions
  • [x] Fix the following file extensions:
    • [x] Delete wrong files
    • [x] Gather GCST IDs for wrong metadata files at gcst_ids_all.txt
.csv-meta.yaml-meta.yaml
.csv-meta.yaml-meta.yaml-meta.yaml
.f.tsv.gz-meta.yaml-meta.yaml
.h.tsv.gz-meta.yaml-meta.yaml
.h.tsv.gz-meta.yaml-meta.yaml-meta.yaml
.h.tsv.gz.tbi-meta.yaml
.README-meta.yaml
.running.log-meta.yaml
..tsv.gz-meta.yaml
..tsv.gz-meta.yaml-meta.yaml
.tsv.gz-meta.yaml-meta.yaml
..tsv.gz-meta.yaml-meta.yaml-meta.yaml
.tsv.gz-meta.yaml-meta.yaml-meta.yaml
.tsv.gz-meta.yaml-meta.yaml-meta.yaml-meta.yaml
.tsv.gz-meta.yaml-meta.yaml-meta.yaml-meta.yaml-meta.yaml
.tsv.gz-meta.yaml.save-meta.yaml
.tsv-meta.yaml-meta.yaml
.tsv-meta.yaml-meta.yaml-meta.yaml
.tsv-meta.yaml-meta.yaml-meta.yaml-meta.yaml
.txt.gz-meta.yaml-meta.yaml
.txt.gz-meta.yaml-meta.yaml-meta.yaml
.txt-meta.yaml-meta.yaml
.txt-meta.yaml-meta.yaml-meta.yaml
meta.yaml-meta.yaml
.tbi-meta.yaml
  • [x] Push fix to sandbox
  • [x] Push fix to prod
  • [x] Mark pending at /hps/nobackup/parkinso/spot/gwas/scratch/goci-1490/scripts/gcst_ids_all.txt
  • [x] Push again to prod once codon cluster is stable
  • [x] Make sure again there are no weird file extensions
  • [x] Depo sync complete?
  • [x] ftp sync complete?
  • [x] Delete weird file extensions from public ftp because ftp-sync does not delete files
  • [x] Revert the changes for comparing md5sums from csvs in gwas-sumstats-service - https://github.com/EBISPOT/gwas-sumstats-service/pull/376
  • [x] Deploy to prod when yaml generation load is low
  • [ ] Check UKB GCSTs
  • [ ] Check GCSTs at /homes/yueji/raw_file_also_in_randome_name.tsv from Yue
  • [ ] Check failed cases
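
For the "make sure there are no weird file extensions" checks above, a minimal sketch of how such file names could be flagged. This is not the script used for this issue: the function names, the `DATA_ENDINGS` allow-list, and the three rules (stacked `-meta.yaml` suffix, double dot, non-data base file) are assumptions inferred from the extension list in the checklist.

```python
from pathlib import PurePosixPath

META_SUFFIX = "-meta.yaml"

# Assumption: only files with these endings should have a metadata yaml.
DATA_ENDINGS = (".tsv", ".tsv.gz", ".txt", ".txt.gz", ".csv", ".csv.gz")


def is_suspect(path: str) -> bool:
    """Return True if a metadata yaml file name looks wrongly generated."""
    name = PurePosixPath(path).name
    if not name.endswith(META_SUFFIX):
        return False  # not a metadata yaml, nothing to check
    base = name[: -len(META_SUFFIX)]
    # Rule 1: metadata generated for another metadata yaml (stacked suffix).
    if "meta.yaml" in base:
        return True
    # Rule 2: doubled dot before the extension, e.g. "..tsv.gz-meta.yaml".
    if ".." in base:
        return True
    # Rule 3: metadata generated for a non-data file (.tbi, .log, README).
    return not base.endswith(DATA_ENDINGS)


def find_suspects(paths):
    """Keep only the paths whose file name looks wrong."""
    return [p for p in paths if is_suspect(p)]
```

Usage: pass the output of a recursive listing of the summary_statistics directory to `find_suspects` and review the result before deleting anything. The two paths Yue reported are both flagged, while a name such as `GCST90005034_buildGRCh37.tsv-meta.yaml` is not.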

earlEBI avatar Nov 19 '24 12:11 earlEBI