goci
yaml metadata wrong - '66b35d68f8167b0001a83512'
Some of the metadata in the yaml files for these two studies does not match the submitted data: GCST90446169, GCST90446168
yamls: GCST90446168-yaml.txt GCST90446169-yaml.txt
E.g., the author notes differ between the two yamls, and in both cases they should be empty (as in the template). Some other fields also look wrong, including date_metadata_last_modified and adjusted_covariates, while other fields, like reported trait, look right.
Please investigate the cause and correct these yamls and the harmonised yamls.
- [x] Investigate the problem - https://github.com/EBISPOT/goci/issues/1490#issuecomment-2500884881
- [x] Fix the problem
- [x] Deploy to sandbox
- [x] Test in sandbox
- [x] Read md5sum and metadata last updated fields from existing yamls into a .csv file (found at /hps/nobackup/parkinso/spot/gwas/scratch/goci-1490/scripts)
- [x] Use that .csv file while generating all yamls
- [x] Increase the number of metadata yaml generation attempts (with a huge volume of data, the sync takes some time)
- [x] Generate all metadata yaml in sandbox
- [x] Test how long it takes
- [x] Deploy to prod
- [x] Generate all yaml files
- [x] Check file names: Yue noticed the following incorrect file names.
-rw-rw-r-- 1 spotbot spot 742 Jan 6 13:00 /nfs/production/parkinso/spot/gwas/prod/data/summary_statistics/GCST004001-GCST005000/GCST004062/harmonised/GCST004062.running.log-meta.yaml
-rw-rw-r-- 1 spotbot spot 740 Jan 3 13:10 /nfs/production/parkinso/spot/gwas/prod/data/summary_statistics/GCST90005001-GCST90006000/GCST90005034/GCST90005034_buildGRCh37.tsv-meta.yaml-meta.yaml
- [x] Investigate weird file extensions
- [x] Fix the following file extensions:
- [x] Delete wrong files
- [x] Gather GCST IDs for wrong metadata files at gcst_ids_all.txt
.csv-meta.yaml-meta.yaml
.csv-meta.yaml-meta.yaml-meta.yaml
.f.tsv.gz-meta.yaml-meta.yaml
.h.tsv.gz-meta.yaml-meta.yaml
.h.tsv.gz-meta.yaml-meta.yaml-meta.yaml
.h.tsv.gz.tbi-meta.yaml
.README-meta.yaml
.running.log-meta.yaml
..tsv.gz-meta.yaml
..tsv.gz-meta.yaml-meta.yaml
.tsv.gz-meta.yaml-meta.yaml
..tsv.gz-meta.yaml-meta.yaml-meta.yaml
.tsv.gz-meta.yaml-meta.yaml-meta.yaml
.tsv.gz-meta.yaml-meta.yaml-meta.yaml-meta.yaml
.tsv.gz-meta.yaml-meta.yaml-meta.yaml-meta.yaml-meta.yaml
.tsv.gz-meta.yaml.save-meta.yaml
.tsv-meta.yaml-meta.yaml
.tsv-meta.yaml-meta.yaml-meta.yaml
.tsv-meta.yaml-meta.yaml-meta.yaml-meta.yaml
.txt.gz-meta.yaml-meta.yaml
.txt.gz-meta.yaml-meta.yaml-meta.yaml
.txt-meta.yaml-meta.yaml
.txt-meta.yaml-meta.yaml-meta.yaml
meta.yaml-meta.yaml
.tbi-meta.yaml
- [x] Push fix to sandbox
- [x] Push fix to prod
- [x] Mark pending at /hps/nobackup/parkinso/spot/gwas/scratch/goci-1490/scripts/gcst_ids_all.txt
- [x] Push again to prod once codon cluster is stable
- [x] Make sure again there are no weird file extensions
- [x] Depo sync complete?
- [x] ftp sync complete?
- [x] Delete weird file extensions from public ftp because ftp-sync does not delete files
- [x] Revert the changes for comparing md5sums from csvs in gwas-sumstats-service - https://github.com/EBISPOT/gwas-sumstats-service/pull/376
- [x] Deploy to prod when yaml generation load is low
- [ ] Check UKB GCSTs
- [ ] Check GCSTs at /homes/yueji/raw_file_also_in_randome_name.tsv (from Yue)
- [ ] Check failed cases
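
For the "Make sure again there are no weird file extensions" step, a check along the following lines could be re-run after each generation pass. This is only a minimal sketch, not the script that was actually used: the helper names `is_suspicious_meta_name` and `find_suspicious` are hypothetical, and the list of valid data-file endings in `DATA_SUFFIXES` is an assumption inferred from the extensions listed in this issue.

```python
from pathlib import Path

META_SUFFIX = "-meta.yaml"
# Assumption: only these data-file endings should ever carry a metadata yaml.
DATA_SUFFIXES = (".tsv", ".tsv.gz", ".txt", ".txt.gz", ".csv", ".csv.gz")


def is_suspicious_meta_name(name: str) -> bool:
    """Return True if a metadata yaml file name looks malformed."""
    if not name.endswith(META_SUFFIX):
        return False  # not a metadata yaml at all
    stem = name[: -len(META_SUFFIX)]
    if "meta.yaml" in stem:
        return True  # metadata generated for another metadata file
    if ".." in stem:
        return True  # doubled dot, e.g. "..tsv.gz-meta.yaml"
    # metadata attached to a non-data file (.tbi, .README, .running.log, ...)
    return not stem.endswith(DATA_SUFFIXES)


def find_suspicious(root: str) -> list[str]:
    """List malformed metadata yaml paths under root, sorted."""
    return sorted(
        str(p)
        for p in Path(root).rglob("*" + META_SUFFIX)
        if p.is_file() and is_suspicious_meta_name(p.name)
    )
```

Pointing `find_suspicious` at the summary_statistics directory (and at the public ftp copy, since ftp-sync does not delete files) should return an empty list once the cleanup is complete.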