
Yamls (and sumstats) not being created

Open earlEBI opened this issue 10 months ago • 16 comments

Several issues possibly connected:

🟡 It seems that all GCSTs created on staging since 05.04.24 have no yamls, either on staging or public FTP.
-- ✅ Generate the missing YAML files
---- ✅ List all GCSTs created since Apr 5
------ ✅ Send the list to Yue too
---- ✅ Publish RabbitMQ message for them using https://app.zenhub.com/workspaces/gwas-59df823c4a6feb3786810391/issues/gh/ebispot/gwas-sumstats-service/327
---- ✅ Check if the missing yamls are synced to the public ftp
-- 🔴 Clean up test submissions from prod db

~🔴 One particular submission with more than 6,000 studies only has md5sums in GCST directories on staging and released to public. No sumstats files or yamls have been created. Submitted on 31.03.24. https://www.ebi.ac.uk/gwas/deposition/submission/6608c665db8d9f000198b901~ Moved to https://app.zenhub.com/workspaces/gwas-59df823c4a6feb3786810391/issues/gh/ebispot/goci/1304

~🔴 There seem to be folders being created on staging for GCSTs which do not have sumstats. Previously, folders were only created on staging for those GCSTs with full p-value set ticked (i.e. have sumstats) - eg. GCST90321079, GCST90310292, GCST90397904~ Moved to https://app.zenhub.com/workspaces/gwas-59df823c4a6feb3786810391/issues/gh/ebispot/goci/1306

earlEBI avatar Apr 16 '24 16:04 earlEBI

Item 1 should be fixed now.

karatugo avatar Apr 23 '24 23:04 karatugo

created another issue for the 2nd issue as it's a different case. https://app.zenhub.com/workspaces/gwas-59df823c4a6feb3786810391/issues/gh/ebispot/goci/1304

karatugo avatar Apr 26 '24 14:04 karatugo

created another issue for the 3rd issue as it's a different case. https://app.zenhub.com/workspaces/gwas-59df823c4a6feb3786810391/issues/gh/ebispot/goci/1306

karatugo avatar Apr 29 '24 10:04 karatugo

List of all GCSTs created since Apr 5 is at /hps/nobackup/parkinso/spot/gwas/scratch/goci1292/gcst_ids.txt

karatugo avatar May 01 '24 15:05 karatugo

Published them to RabbitMQ.

karatugo avatar May 03 '24 14:05 karatugo

I've updated the crontab entries for depo-sync. There are now two entries: one begins at 13:00 and the other at 20:00. Each runs for 3 hours and 55 minutes to prevent overlap with ftp-sync, which starts at midnight.
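
The arithmetic behind these two windows can be checked with a short sketch. It assumes only what the comment states (starts at 13:00 and 20:00, a run length of 3 hours 55 minutes, ftp-sync starting at midnight); the function names are illustrative and are not part of depo-sync.

```python
from datetime import datetime, timedelta

RUN_LENGTH = timedelta(hours=3, minutes=55)  # run length stated in the comment
DEPO_SYNC_STARTS = ["13:00", "20:00"]        # start times stated in the comment


def run_window(start_hhmm, length=RUN_LENGTH):
    """Return (start, end) datetimes for one run on an arbitrary reference day."""
    start = datetime.strptime(start_hhmm, "%H:%M")
    return start, start + length


def schedule_is_safe(starts=DEPO_SYNC_STARTS, length=RUN_LENGTH):
    """True if the runs do not overlap each other and all finish by midnight."""
    windows = sorted(run_window(s, length) for s in starts)
    midnight = windows[0][0].replace(hour=0, minute=0) + timedelta(days=1)
    no_self_overlap = all(a[1] <= b[0] for a, b in zip(windows, windows[1:]))
    done_before_midnight = windows[-1][1] <= midnight
    return no_self_overlap and done_before_midnight


for start, end in map(run_window, DEPO_SYNC_STARTS):
    print(f"{start:%H:%M} -> {end:%H:%M}")
print(schedule_is_safe())
```

The runs end at 16:55 and 23:55, so the later one leaves a five-minute gap before ftp-sync.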

karatugo avatar May 03 '24 14:05 karatugo

The missing yamls seem to be synced to the public ftp. Please validate @earlEBI

karatugo avatar May 07 '24 14:05 karatugo

@eks-ebi pls help confirm ...

sprintell avatar May 08 '24 09:05 sprintell

@karatugo There are still several GCSTs on public without yamls, eg:
http://ftp.ebi.ac.uk/pub/databases/gwas/summary_statistics/GCST90428001-GCST90429000/GCST90428117/
http://ftp.ebi.ac.uk/pub/databases/gwas/summary_statistics/GCST90428001-GCST90429000/GCST90428431/

These are from separate submissions; one is published, the other is not.

This is not an exhaustive list of GCST directories on public FTP without yamls, just 2 I found randomly.

earlEBI avatar May 13 '24 08:05 earlEBI

Hmm, this is because the list of GCSTs created since Apr 5th was generated on May 1st, so it doesn't include the ones published after May 1st. I'll run the list generation and yaml generation again now, but I think moving forward we can do this on an ad hoc basis. This is because our system cannot generate yamls for studies which are not on the public ftp yet.

karatugo avatar May 13 '24 10:05 karatugo

Ok, so, say, once a week, all the sumstats released in the previous week will have yamls generated in one go?

earlEBI avatar May 13 '24 10:05 earlEBI

The list of studies created after Apr 5th and without yaml files: /hps/nobackup/parkinso/spot/gwas/scratch/goci1292/gcst_ids_no_meta_file2.txt
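
A list like this could be produced by scanning a local copy of the summary statistics tree. This is a rough sketch, not the script that was actually used: it assumes the `GCSTxxxx-GCSTyyyy/GCSTzzzz/` layout seen in the public FTP URLs in this thread, and it assumes that a yaml is any file ending in `.yaml` inside the study directory. The function name is hypothetical.

```python
from pathlib import Path


def gcst_dirs_without_yaml(root):
    """Return the names of GCST study directories under root that hold no *.yaml file.

    Assumes the layout <root>/GCSTxxxx-GCSTyyyy/GCSTzzzz/ (an assumption based on
    the public FTP URLs quoted in this thread).
    """
    missing = []
    for study_dir in sorted(Path(root).glob("GCST*-GCST*/GCST*")):
        if not study_dir.is_dir():
            continue  # skip stray files that happen to match the pattern
        if not any(study_dir.glob("*.yaml")):
            missing.append(study_dir.name)
    return missing
```

Usage would be something like `gcst_dirs_without_yaml("/path/to/summary_statistics")`, with the result written one ID per line.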

karatugo avatar May 13 '24 11:05 karatugo

Published the list to RabbitMQ using https://app.zenhub.com/workspaces/gwas-59df823c4a6feb3786810391/issues/gh/ebispot/gwas-sumstats-service/327

karatugo avatar May 13 '24 11:05 karatugo

> Ok, so, say, once a week, all the sumstats released in the previous week will have yamls generated in one go?

@earlEBI The submissions since the fix date (Apr 24th) will have their yamls already generated. I meant the submissions between Apr 5th and the fix date which are not in the public ftp yet.

karatugo avatar May 13 '24 15:05 karatugo

@karatugo ok I think I understand, thanks.

earlEBI avatar May 13 '24 15:05 earlEBI

After merge, use https://github.com/EBISPOT/gwas-sumstats-service/pull/328 for publishing again. This is needed for the edge cases, e.g., GCST003001-GCST004000/GCST003898 and GCST90427001-GCST90428000/GCST90428000
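
Both edge cases involve the mapping from a GCST ID to its range directory: one has a six-digit, zero-padded ID, and the other ends in 000 and therefore sits at the top of its range rather than the bottom of the next. The sketch below reproduces that mapping as inferred from the paths quoted in this thread. It is not taken from gwas-sumstats-service, and `range_dir` is a hypothetical name.

```python
def range_dir(gcst_id, size=1000):
    """Return the range directory name for a GCST ID, e.g. 'GCST003001-GCST004000'.

    Inferred from paths in this thread: ranges run from N*1000+1 to (N+1)*1000,
    and the numeric part keeps the zero-padded width of the original ID.
    """
    prefix = "GCST"
    if not gcst_id.startswith(prefix) or not gcst_id[len(prefix):].isdigit():
        raise ValueError(f"not a GCST ID: {gcst_id!r}")
    digits = gcst_id[len(prefix):]
    number = int(digits)
    # Subtract 1 before dividing so that IDs ending in 000 stay in the lower range.
    lower = ((number - 1) // size) * size + 1
    upper = lower + size - 1
    width = len(digits)
    return f"{prefix}{lower:0{width}d}-{prefix}{upper:0{width}d}"


for gcst in ("GCST003898", "GCST90428000", "GCST90428117"):
    print(gcst, "->", range_dir(gcst))
```

Computing the lower bound as `number // 1000 * 1000 + 1` without the `- 1` would place GCST90428000 in `GCST90428001-GCST90429000`, which is the kind of off-by-one this boundary invites.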

karatugo avatar May 14 '24 15:05 karatugo

@karatugo will fix a few bugs discovered yesterday.

sprintell avatar May 15 '24 09:05 sprintell

Fixed the bugs discovered. https://github.com/EBISPOT/gwas-sumstats-service/pull/337

karatugo avatar May 17 '24 11:05 karatugo

Published the GCST IDs to RabbitMQ after the fix. Expect the files in the pub ftp in 2 days (except GCST90422150 which I posted in Slack).

Published GCST ID: GCST003898
Published GCST ID: GCST90384000
Published GCST ID: GCST90385000
Published GCST ID: GCST90386000
Published GCST ID: GCST90387000
Published GCST ID: GCST90388000
Published GCST ID: GCST90389000
Published GCST ID: GCST90390000
Published GCST ID: GCST90422000
Published GCST ID: GCST90422150
Published GCST ID: GCST90423000
Published GCST ID: GCST90424000
Published GCST ID: GCST90425000
Published GCST ID: GCST90426000
Published GCST ID: GCST90427000
Published GCST ID: GCST90428000

karatugo avatar May 17 '24 11:05 karatugo

GCST90422150 has synced to the pub ftp, and the yaml has been generated for it as well.

karatugo avatar May 17 '24 11:05 karatugo

Created clean-up ticket. https://github.com/EBISPOT/goci/issues/1325

karatugo avatar May 17 '24 13:05 karatugo

Check on Monday May 20 if the files are on the pub ftp for the following.

Published GCST ID: GCST003898
Published GCST ID: GCST90384000
Published GCST ID: GCST90385000
Published GCST ID: GCST90386000
Published GCST ID: GCST90387000
Published GCST ID: GCST90388000
Published GCST ID: GCST90389000
Published GCST ID: GCST90390000
Published GCST ID: GCST90422000
Published GCST ID: GCST90422150
Published GCST ID: GCST90423000
Published GCST ID: GCST90424000
Published GCST ID: GCST90425000
Published GCST ID: GCST90426000
Published GCST ID: GCST90427000
Published GCST ID: GCST90428000

karatugo avatar May 17 '24 13:05 karatugo

Yamls for the published GCST IDs are present in the pub ftp.

karatugo avatar May 20 '24 12:05 karatugo

@earlEBI Please confirm if this ticket can be closed.

karatugo avatar May 20 '24 12:05 karatugo

@karatugo Sample info and ontology mapping are missing for http://ftp.ebi.ac.uk/pub/databases/gwas/summary_statistics/GCST003001-GCST004000/GCST003898/

Ontology mapping is missing for http://ftp.ebi.ac.uk/pub/databases/gwas/summary_statistics/GCST90427001-GCST90428000/GCST90428000/ , etc.

earlEBI avatar May 20 '24 12:05 earlEBI

@sajo-ebi Sample info and ontology mapping data (key: efo_trait) are missing from the ingest api response. See for example:

https://www.ebi.ac.uk/gwas/ingest/api/v2/studies/GCST003898
https://www.ebi.ac.uk/gwas/ingest/api/v2/studies/GCST003898/samples
https://www.ebi.ac.uk/gwas/ingest/api/v2/studies/GCST90428000
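
A check like this can be scripted once the JSON response has been parsed into a dict. The sketch below is only illustrative: `efo_trait` is the key named above, while the `accession` key and the shape of the sample record are assumptions and not the documented ingest API schema.

```python
def empty_or_missing(record, keys):
    """Return the keys that are absent from record or hold an empty value."""
    empty_values = (None, "", [], {})
    return [k for k in keys if k not in record or record[k] in empty_values]


# Hypothetical parsed response; only "efo_trait" is a key named in this thread.
study = {"accession": "GCST003898", "efo_trait": None}
print(empty_or_missing(study, ["accession", "efo_trait"]))
```

Treating an empty value the same as a missing key matters here, because later comments in this thread suggest a field may be present but empty for pre-pub submissions.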

karatugo avatar May 20 '24 12:05 karatugo

@karatugo I checked both GCSTs in the DB; the EFO trait information is missing for both studies. Sample information is also missing for GCST003898. This can be verified by logging in to the submissions below in the Deposition app and clicking 'Download Study Accessions': 65fc3dcab73c7400017c34b2 65f9de3adb8d9f0001966a07

sajo-ebi avatar May 23 '24 15:05 sajo-ebi

Hi @earlEBI

> Ontology mapping is missing for http://ftp.ebi.ac.uk/pub/databases/gwas/summary_statistics/GCST90427001-GCST90428000/GCST90428000/ , etc.

We think these studies shouldn't have EFO as it's a prepub submission. Could you just confirm that is correct, please?

ljwh2 avatar May 29 '24 09:05 ljwh2

@ljwh2 That makes sense. I assumed the field just wouldn't exist for pre-pub submissions but if it's supposed to just appear empty, that's okay.

The sample is still empty for GCST003898.
And file_type is empty for GCST90428000 and others.

earlEBI avatar May 29 '24 09:05 earlEBI

> And file_type is empty for GCST90428000 and others.

Published GCST IDs (GCST90428000 and other studies of this huge submission) to RabbitMQ. Expect them in the public ftp in 2 days.

karatugo avatar May 29 '24 14:05 karatugo