goci
Yamls (and sumstats) not being created
Several issues possibly connected:
🟡 It seems that all GCSTs created on staging since 05.04.24 have no yamls, either on staging or public FTP.
-- ✅ Generate the missing YAML files
---- ✅ List all GCSTs created since Apr 5
------ ✅ Send the list to Yue too
---- ✅ Publish RabbitMQ message for them using https://app.zenhub.com/workspaces/gwas-59df823c4a6feb3786810391/issues/gh/ebispot/gwas-sumstats-service/327
---- ✅ Check if the missing yamls are synced to the public ftp
-- 🔴 Clean up test submissions from prod db
~🔴 One particular submission with more than 6,000 studies only has md5sums in GCST directories on staging and released to public. No sumstats files or yamls have been created. Submitted on 31.03.24. https://www.ebi.ac.uk/gwas/deposition/submission/6608c665db8d9f000198b901~ Moved to https://app.zenhub.com/workspaces/gwas-59df823c4a6feb3786810391/issues/gh/ebispot/goci/1304
~🔴 There seem to be folders being created on staging for GCSTs which do not have sumstats. Previously, folders were only created on staging for those GCSTs with full p-value set ticked (i.e. have sumstats) - eg. GCST90321079, GCST90310292, GCST90397904~ Moved to https://app.zenhub.com/workspaces/gwas-59df823c4a6feb3786810391/issues/gh/ebispot/goci/1306
Item 1 should be fixed now.
Created a separate issue for item 2, as it's a different case: https://app.zenhub.com/workspaces/gwas-59df823c4a6feb3786810391/issues/gh/ebispot/goci/1304
Created a separate issue for item 3, as it's a different case: https://app.zenhub.com/workspaces/gwas-59df823c4a6feb3786810391/issues/gh/ebispot/goci/1306
List of all GCSTs created since Apr 5 is at /hps/nobackup/parkinso/spot/gwas/scratch/goci1292/gcst_ids.txt
Published them to RabbitMQ.
I've updated the crontab entries for depo-sync. There are now two entries: one begins at 13:00 and the other at 20:00. Each runs for 3 hours and 55 minutes to prevent overlap with ftp-sync, which starts at midnight.
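As a quick sanity check of the schedule described above, here is a minimal sketch that computes the end time of each depo-sync run and confirms that neither run overlaps the other or reaches midnight. The start times and run length come from the comment; the variable names and the calendar date are illustrative only, and this is not the actual crontab.

```python
from datetime import datetime, timedelta

# Values taken from the comment above; names are illustrative only.
RUN_LENGTH = timedelta(hours=3, minutes=55)
DEPO_SYNC_STARTS = ["13:00", "20:00"]

# Any calendar day works; the date itself is arbitrary.
DAY = datetime(2024, 5, 1)
NEXT_MIDNIGHT = DAY + timedelta(days=1)  # when ftp-sync starts


def window(start_hhmm):
    """Return (start, end) datetimes for one depo-sync run on DAY."""
    clock = datetime.strptime(start_hhmm, "%H:%M").time()
    start = datetime.combine(DAY.date(), clock)
    return start, start + RUN_LENGTH


windows = [window(s) for s in DEPO_SYNC_STARTS]

for start, end in windows:
    print(f"depo-sync {start:%H:%M} -> {end:%H:%M}")
# depo-sync 13:00 -> 16:55
# depo-sync 20:00 -> 23:55

# The first run ends before the second starts, and the second
# ends before ftp-sync starts at midnight.
assert windows[0][1] <= windows[1][0]
assert windows[1][1] <= NEXT_MIDNIGHT
```

The second run finishes at 23:55, which leaves a five-minute gap before ftp-sync.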
The missing yamls seem to be synced to the public ftp. Please validate @earlEBI
@eks-ebi pls help confirm ...
@karatugo There are still several GCSTs on public without yamls, eg: http://ftp.ebi.ac.uk/pub/databases/gwas/summary_statistics/GCST90428001-GCST90429000/GCST90428117/ http://ftp.ebi.ac.uk/pub/databases/gwas/summary_statistics/GCST90428001-GCST90429000/GCST90428431/
These are from separate submissions, 1 is published, the other is not.
This is not an exhaustive list of GCST directories on public FTP without yamls, just 2 I found randomly.
Hmm, this is because the list of GCSTs created since Apr 5th was generated on May 1st, so it doesn't include the ones published after May 1st. I'll run the list generation and yaml generation again now, but I think moving forward we can do this on an ad hoc basis. This is because our system cannot generate yamls for studies which are not on the public ftp yet.
Ok, so, say, once a week, all the sumstats released in the previous week will have yamls generated in one go?
The list of studies created after Apr 5th and without yaml files: /hps/nobackup/parkinso/spot/gwas/scratch/goci1292/gcst_ids_no_meta_file2.txt
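For reference, a list like this could be produced with something along the following lines. This is only a sketch under assumptions, not the script that was actually used: it assumes a directory layout of `<root>/<range folder>/<GCST ID>/` and treats any `*.yaml` file in a GCST directory as its metadata file. It does not filter by creation date. The function name and the demo IDs and file names are made up.

```python
from pathlib import Path
import tempfile


def gcsts_without_yaml(root):
    """Return sorted GCST IDs whose directory under `root` holds no *.yaml file.

    Assumed (hypothetical) layout: root/<range folder>/<GCST ID>/<files>.
    """
    missing = []
    for gcst_dir in Path(root).glob("GCST*-GCST*/GCST*"):
        if gcst_dir.is_dir() and not any(gcst_dir.glob("*.yaml")):
            missing.append(gcst_dir.name)
    return sorted(missing)


# Demo on a throwaway directory tree with made-up IDs and file names.
with tempfile.TemporaryDirectory() as tmp:
    bucket = Path(tmp) / "GCST00000001-GCST00001000"
    (bucket / "GCST00000001").mkdir(parents=True)
    (bucket / "GCST00000001" / "md5sum.txt").write_text("")
    (bucket / "GCST00000002").mkdir()
    (bucket / "GCST00000002" / "example.tsv-meta.yaml").write_text("")
    print(gcsts_without_yaml(tmp))  # ['GCST00000001']
```

Writing the returned IDs one per line would give a file in the same shape as the list referenced above.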
Published the list to RabbitMQ using https://app.zenhub.com/workspaces/gwas-59df823c4a6feb3786810391/issues/gh/ebispot/gwas-sumstats-service/327
> Ok, so, say, once a week, all the sumstats released in the previous week will have yamls generated in one go?
@earlEBI The submissions since the fix date (Apr 24th) will have their yamls already generated. I meant the submissions between Apr 5th and the fix date which are not in the public ftp yet.
@karatugo ok I think I understand, thanks.
After merge, use https://github.com/EBISPOT/gwas-sumstats-service/pull/328 for publishing again. This is needed for the edge cases, e.g., GCST003001-GCST004000/GCST003898 and GCST90427001-GCST90428000/GCST90428000
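The folder names quoted in this thread are consistent with GCST IDs being grouped into range folders of 1,000, where an ID that is an exact multiple of 1,000 belongs to the lower range. The sketch below derives the range folder from an ID under that assumption. The bucket size and the zero-padding rule (same digit width as the input ID) are inferred from the examples here, not taken from the service code, and `range_folder` is a hypothetical helper name.

```python
import re

BUCKET_SIZE = 1000  # inferred from the folder names quoted in this thread


def range_folder(gcst_id):
    """Return the assumed range folder name for a GCST ID."""
    match = re.fullmatch(r"GCST(\d+)", gcst_id)
    if match is None or int(match.group(1)) < 1:
        raise ValueError(f"not a valid GCST ID: {gcst_id!r}")
    digits = match.group(1)
    number = int(digits)
    # Subtracting 1 before the floor division puts exact multiples of
    # BUCKET_SIZE (e.g. ...000) at the top of the lower range.
    lower = (number - 1) // BUCKET_SIZE * BUCKET_SIZE + 1
    upper = lower + BUCKET_SIZE - 1
    width = len(digits)  # assumption: pad to the input ID's digit width
    return f"GCST{lower:0{width}d}-GCST{upper:0{width}d}"


print(range_folder("GCST003898"))    # GCST003001-GCST004000
print(range_folder("GCST90428000"))  # GCST90427001-GCST90428000
print(range_folder("GCST90428117"))  # GCST90428001-GCST90429000
```

All three results match paths that appear in this thread.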
@karatugo will fix a few bugs discovered yesterday.
Fixed the bugs discovered. https://github.com/EBISPOT/gwas-sumstats-service/pull/337
Published the GCST IDs to RabbitMQ after the fix. Expect the files in the pub ftp in 2 days (except GCST90422150 which I posted in Slack).
Published GCST ID: GCST003898
Published GCST ID: GCST90384000
Published GCST ID: GCST90385000
Published GCST ID: GCST90386000
Published GCST ID: GCST90387000
Published GCST ID: GCST90388000
Published GCST ID: GCST90389000
Published GCST ID: GCST90390000
Published GCST ID: GCST90422000
Published GCST ID: GCST90422150
Published GCST ID: GCST90423000
Published GCST ID: GCST90424000
Published GCST ID: GCST90425000
Published GCST ID: GCST90426000
Published GCST ID: GCST90427000
Published GCST ID: GCST90428000
GCST90422150 has synced to the pub ftp, and the yaml has been generated for it as well.
Created clean-up ticket. https://github.com/EBISPOT/goci/issues/1325
Check on Monday May 20 if the files are on the pub ftp for the following.
Published GCST ID: GCST003898
Published GCST ID: GCST90384000
Published GCST ID: GCST90385000
Published GCST ID: GCST90386000
Published GCST ID: GCST90387000
Published GCST ID: GCST90388000
Published GCST ID: GCST90389000
Published GCST ID: GCST90390000
Published GCST ID: GCST90422000
Published GCST ID: GCST90422150
Published GCST ID: GCST90423000
Published GCST ID: GCST90424000
Published GCST ID: GCST90425000
Published GCST ID: GCST90426000
Published GCST ID: GCST90427000
Published GCST ID: GCST90428000
Yamls for the published GCST IDs are present in the pub ftp.
@earlEBI Please confirm if this ticket can be closed.
@karatugo Sample info and ontology mapping are missing for http://ftp.ebi.ac.uk/pub/databases/gwas/summary_statistics/GCST003001-GCST004000/GCST003898/
Ontology mapping is missing for http://ftp.ebi.ac.uk/pub/databases/gwas/summary_statistics/GCST90427001-GCST90428000/GCST90428000/ , etc.
@sajo-ebi Sample info and ontology mapping data (key: efo_trait) are missing from the ingest api response. See for example:
https://www.ebi.ac.uk/gwas/ingest/api/v2/studies/GCST003898 https://www.ebi.ac.uk/gwas/ingest/api/v2/studies/GCST003898/samples https://www.ebi.ac.uk/gwas/ingest/api/v2/studies/GCST90428000
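A check like this could be scripted roughly as follows. This is a sketch only: it assumes the study endpoint returns a JSON object with top-level keys and needs no authentication, which this thread does not confirm. The `efo_trait` key and the base URL come from the comment above; `empty_keys` and `fetch_study` are hypothetical helper names.

```python
import json
from urllib.request import urlopen

# Base URL taken from the links above.
INGEST_API = "https://www.ebi.ac.uk/gwas/ingest/api/v2/studies"


def empty_keys(record, keys=("efo_trait",)):
    """Return the keys that are absent from `record` or hold an empty value."""
    return [k for k in keys if record.get(k) in (None, "", [], {})]


def fetch_study(gcst_id):
    """Fetch one study record (assumes an unauthenticated JSON response)."""
    with urlopen(f"{INGEST_API}/{gcst_id}", timeout=30) as response:
        return json.load(response)


# Offline demo with made-up records; no network call is made here.
print(empty_keys({"efo_trait": None}))      # ['efo_trait']
print(empty_keys({"efo_trait": "asthma"}))  # []
```

With network access, `empty_keys(fetch_study("GCST003898"))` would report whether the key is missing or empty for that study, provided the response shape matches the assumption.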
@karatugo I checked both GCSTs in the DB, and the EFO trait information is missing for both studies. Sample information is also missing for GCST003898. This can be verified by logging in to the submissions below in the Deposition app and clicking on 'Download Study Accessions': 65fc3dcab73c7400017c34b2 65f9de3adb8d9f0001966a07
Hi @earlEBI
> Ontology mapping is missing for http://ftp.ebi.ac.uk/pub/databases/gwas/summary_statistics/GCST90427001-GCST90428000/GCST90428000/ , etc.
we think these studies shouldn't have EFO as it's a prepub submission, could you just confirm that is correct please?
@ljwh2 That makes sense. I assumed the field just wouldn't exist for pre-pub submissions but if it's supposed to just appear empty, that's okay.
The sample is still empty for GCST003898, and file_type is empty for GCST90428000 and others.
> And file_type is empty for GCST90428000 and others.
Published GCST IDs (GCST90428000 and other studies of this huge submission) to RabbitMQ. Expect them in the public ftp in 2 days.