goci
Investigate and restart timed-out submissions
Several submissions have been stuck validating and need to be restarted:
✅ 6608c665db8d9f000198b901
✅ 65f1f8330f82060001a1b6fd
🟡 65f9de3adb8d9f0001966a07
-- 🔴 Generate metadata for 65f9de3adb8d9f0001966a07 once the sync is complete.
✅ 66073fe60f82060001a48676
🟡 65d67ba1b73c7400016aa81a
-- 🔴 Generate metadata for 65d67ba1b73c7400016aa81a once the sync is complete.
This has been happening more often recently, so there may be an underlying issue that needs to be investigated.
✅ 6608c665db8d9f000198b901
- All files are valid, but the status update failed. I have force-pushed to bypass validation; they should be marked as valid shortly.
✅ 65f1f8330f82060001a1b6fd
- The previous failure was due to a bug, now resolved (see https://github.com/EBISPOT/gwas-sumstats-service/issues/308). I have restarted the validation, but encountered a new error: 'template invalid'.
❌ 65f9de3adb8d9f0001966a07
- It appears to be a network error. I have restarted the validation and will follow up.
❌ 66073fe60f82060001a48676
- The studies are across two different folders, causing the validation pipeline to fail in locating them. Is there a policy requiring files to be stored in a single, flat folder? Here are the contents of the specified directory for reference:
ls -ltr <private ftp folder name>
total 64
drwxrws--- 2 gwas_cat spot 1940 Apr 2 23:34 EBIsumstats
drwxrws--- 2 gwas_cat spot 1940 Apr 11 14:38 finalv
Update: ✅ The user fixed the file upload and validation is successful now.
🟡 65f9de3adb8d9f0001966a07
- Increased the memory and time limits and resubmitted for validation.
🟡 65f9de3adb8d9f0001966a07
- Validation passed; waiting for the DB status update.
For future reference, I decompressed and recompressed the file X10647.18.tsv.gz in the private FTP. The reason was that the file description was missing the word 'gzip', and our file extension finder logic depends on that keyword. Details and the ticket for further investigation: https://github.com/EBISPOT/gwas-sumstats-service/issues/317
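One possible way to avoid depending on the 'gzip' keyword in the file description is to check the gzip magic number (the first two bytes, 0x1f 0x8b) directly. The sketch below is only an illustration of that idea, not the service's actual code; `is_gzip`, `demo`, and the file names and column headers are hypothetical.

```python
import gzip
import tempfile
from pathlib import Path

# Every gzip stream starts with these two bytes (RFC 1952).
GZIP_MAGIC = b"\x1f\x8b"


def is_gzip(path):
    """Return True if the file at `path` starts with the gzip magic number."""
    with open(path, "rb") as handle:
        return handle.read(2) == GZIP_MAGIC


def demo():
    """Write one gzip file and one plain file to a temp dir and classify both."""
    with tempfile.TemporaryDirectory() as tmp:
        zipped = Path(tmp) / "example.tsv.gz"  # placeholder file name
        plain = Path(tmp) / "example.tsv"      # placeholder file name
        with gzip.open(zipped, "wt") as handle:
            handle.write("col_a\tcol_b\n")     # placeholder header row
        plain.write_text("col_a\tcol_b\n")
        return is_gzip(zipped), is_gzip(plain)
```

A check like this looks only at the file contents, so it would not be affected by how the file description is worded. Whether it fits the existing extension finder logic is something to settle in the linked ticket.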
There is another submission showing VALIDATING: 65d67ba1b73c7400016aa81a. @karatugo, can you please look into this?
🟡 65f9de3adb8d9f0001966a07
- Ran with skipValidation again; hopefully the DB update won't break this time.
Update: the DB update worked, but the files were not moved to the public FTP.
This is also linked to this ticket here https://app.zenhub.com/workspaces/gwas-59df823c4a6feb3786810391/issues/gh/ebispot/gwasdepo-deposition-service/161
🟡 65f9de3adb8d9f0001966a07
- expect the study files in the public ftp ~tomorrow~ in a few days. I'll regenerate the metadata files later.
@karatugo
The submission (65f9de3adb8d9f0001966a07) is under embargo, so the sumstats files are not in public FTP.
In production, only some GCSTs have files in the folder.
When I checked, the files were there for GCST90421033-GCST90421797 (I did not check all GCSTs between these, but most of them have sumstats files). GCST90421798-GCST90428040 has empty folders.
@Santhi1901 I thought we could process all 7,008 files in one sync, but it turns out our system can only process about 700 files each night. It will take several more days to fully sync. If this is too slow, I can explore other solutions, such as initiating a manual sync.
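As a rough back-of-the-envelope check of the sync time, the sketch below counts the accessions in the ranges quoted above and divides by the nightly throughput. It assumes one file per GCST accession and a steady rate of about 700 files per night, which may not hold exactly; the function names are made up for illustration.

```python
import math

FILES_PER_NIGHT = 700  # approximate nightly throughput quoted in the thread


def gcst_count(first, last):
    """Number of accessions in an inclusive GCST range (numeric parts only)."""
    return last - first + 1


def nights_needed(total_files, per_night=FILES_PER_NIGHT):
    """Whole nights needed to sync `total_files` at `per_night` files per night."""
    return math.ceil(total_files / per_night)


# Full range for the submission: GCST90421033 to GCST90428040.
total = gcst_count(90421033, 90428040)

# Range that still had empty folders: GCST90421798 to GCST90428040.
remaining = gcst_count(90421798, 90428040)

print(total, nights_needed(total))          # 7008 accessions, 11 nights
print(remaining, nights_needed(remaining))  # 6243 accessions, 9 nights
```

Under those assumptions the empty folders would need roughly nine more nightly runs, which is consistent with "several more days".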
🟡 65d67ba1b73c7400016aa81a
- expect the study files in the public ftp in a few days. I'll regenerate the metadata files later.
65d67ba1b73c7400016aa81a and 65f9de3adb8d9f0001966a07 are not in the public FTP yet.
65f9de3adb8d9f0001966a07 is now in the public FTP. The YAML files for its studies (GCST90421033 to GCST90428040) should be in the public FTP tomorrow.
Santhi reported that 65d67ba1b73c7400016aa81a is showing validating again.
✅ 65d67ba1b73c7400016aa81a is showing submission complete now.
I updated the sumstats meta table because the studies did not exist for this submission. The update was done with a script using PyMongo. For details, see /hps/nobackup/parkinso/spot/gwas/scratch/goci1285
All the submissions have been restarted and validated, but some of them are not in the public FTP yet, so we cannot generate the missing YAML files. We can either wait for them to reach the public FTP, or close them and handle them case by case.
We will leave it as it is; a new YAML will be generated when a curator edits the template.