
Investigate and restart timed-out submissions

Open eks-ebi opened this issue 10 months ago • 17 comments

Several submissions have been stuck validating and need to be restarted:

6608c665db8d9f000198b901

65f1f8330f82060001a1b6fd

🟡 65f9de3adb8d9f0001966a07 -- 🔴 Generate metadata for 65f9de3adb8d9f0001966a07 once the sync is complete.

66073fe60f82060001a48676

🟡 65d67ba1b73c7400016aa81a -- 🔴 Generate metadata for 65d67ba1b73c7400016aa81a once the sync is complete.

This has been happening more often recently, so there may be an underlying issue that needs to be investigated.

eks-ebi avatar Apr 11 '24 13:04 eks-ebi

6608c665db8d9f000198b901 - All files are valid, but the status update failed. I have force-pushed to bypass validation; they should be marked as valid shortly.

karatugo avatar Apr 12 '24 11:04 karatugo

65f1f8330f82060001a1b6fd - The previous failure was due to a bug, now resolved (see https://github.com/EBISPOT/gwas-sumstats-service/issues/308). I have restarted the validation, but encountered a new error: 'template invalid'.

karatugo avatar Apr 12 '24 12:04 karatugo

65f9de3adb8d9f0001966a07 - It appears to be a network error. I have restarted the validation and will follow up on it.

karatugo avatar Apr 12 '24 13:04 karatugo

66073fe60f82060001a48676 - The study files are spread across two different folders, so the validation pipeline fails to locate them. Is there a policy requiring files to be stored in a single, flat folder? Here are the contents of the specified directory for reference:

ls -ltr <private ftp folder name>
total 64
drwxrws--- 2 gwas_cat spot 1940 Apr 2 23:34 EBIsumstats
drwxrws--- 2 gwas_cat spot 1940 Apr 11 14:38 finalv

Update: ✅ The user fixed the file upload and validation is successful now.
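For reference, a pre-check along these lines could flag uploads that are not in a single flat folder before validation starts. This is only a sketch, not the pipeline's actual logic: the function name `find_nested_files` is hypothetical, and it assumes the upload folder is reachable as a local path.

```python
from pathlib import Path


def find_nested_files(upload_root: str) -> list[Path]:
    """Return every file that sits below the top level of upload_root.

    Hypothetical helper: an empty result means the upload is flat,
    i.e. all files are direct children of the upload folder.
    """
    root = Path(upload_root)
    return sorted(
        p for p in root.rglob("*")
        if p.is_file() and p.parent != root
    )
```

If the returned list is non-empty, the submitter could be asked to move the listed files to the top level before validation is restarted.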

karatugo avatar Apr 12 '24 15:04 karatugo

🟡 65f9de3adb8d9f0001966a07 - Increased mem and time constraints and submitted again for validation.

karatugo avatar Apr 15 '24 10:04 karatugo

🟡 65f9de3adb8d9f0001966a07 - validation passed, waiting for db status update.

For future reference, I decompressed and recompressed the file X10647.18.tsv.gz in the private FTP. Its file description was missing the word 'gzip', and our file extension finder logic depends on that keyword. Details and the ticket for further investigation: https://github.com/EBISPOT/gwas-sumstats-service/issues/317
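One possible alternative to matching on the description text is to check the gzip magic bytes directly. The sketch below is an assumption about how such a check might look, not the service's current code; `is_gzip` is a hypothetical name, and whether it would have classified X10647.18.tsv.gz correctly is not known from this thread.

```python
GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip member (RFC 1952)


def is_gzip(path: str) -> bool:
    """Return True if the file starts with the gzip magic number.

    Hypothetical helper: it only inspects the first two bytes and
    does not verify that the rest of the stream is valid.
    """
    with open(path, "rb") as fh:
        return fh.read(2) == GZIP_MAGIC
```

A check like this does not depend on how an external tool words its file description, which is the failure mode described above.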

karatugo avatar Apr 17 '24 12:04 karatugo

There is another submission still showing VALIDATING: 65d67ba1b73c7400016aa81a. @karatugo, can you please look into this?

Santhi1901 avatar Apr 22 '24 08:04 Santhi1901

🟡 65f9de3adb8d9f0001966a07 - Ran with skipValidation again; hopefully the DB update won't break this time.

Update: The DB update worked, but the files have not been moved to the public FTP.

karatugo avatar Apr 22 '24 16:04 karatugo

This is also linked to this ticket here https://app.zenhub.com/workspaces/gwas-59df823c4a6feb3786810391/issues/gh/ebispot/gwasdepo-deposition-service/161

sprintell avatar Apr 24 '24 09:04 sprintell

🟡 65f9de3adb8d9f0001966a07 - expect the study files in the public ftp ~tomorrow~ in a few days. I'll regenerate the metadata files later.

karatugo avatar Apr 24 '24 17:04 karatugo

@karatugo

The submission (65f9de3adb8d9f0001966a07) is under embargo, so the sumstats files are not in public FTP.

In production, only some GCSTs have files in the folder.

When I checked, the files were there for GCST90421033-GCST90421797 (I did not check all GCSTs between these, but most of them have sumstats files). GCST90421798-GCST90428040 has empty folders.

Santhi1901 avatar Apr 25 '24 09:04 Santhi1901

@Santhi1901 I thought we could process all 7,008 files in one sync, but it turns out our system can only process about 700 files each night. It will take several more days to fully sync. If this is too slow, I can explore other solutions, such as initiating a manual sync.
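As a rough estimate of "several more days", taking the figures above at face value (7,008 files, about 700 per night; the real nightly throughput is approximate):

```python
import math


def nights_to_sync(total_files: int, files_per_night: int) -> int:
    """Number of nightly runs needed to sync every file (rounded up)."""
    return math.ceil(total_files / files_per_night)


print(nights_to_sync(7008, 700))  # 11
```

So a full sync would take about 11 nightly runs from the start, assuming the rate stays constant and no files need to be retried.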

karatugo avatar Apr 26 '24 11:04 karatugo

🟡 65d67ba1b73c7400016aa81a - expect the study files in the public ftp in a few days. I'll regenerate the metadata files later.

karatugo avatar Apr 26 '24 11:04 karatugo

65d67ba1b73c7400016aa81a and 65f9de3adb8d9f0001966a07 are not in the public ftp yet.

karatugo avatar May 03 '24 15:05 karatugo

65f9de3adb8d9f0001966a07 is now in the public ftp. Their yamls should be in the public ftp tomorrow (from GCST90421033 to GCST90428040).

karatugo avatar May 07 '24 14:05 karatugo

Santhi reported that 65d67ba1b73c7400016aa81a is showing validating again.

karatugo avatar May 09 '24 09:05 karatugo

65d67ba1b73c7400016aa81a is showing submission complete now.

I updated the sumstats meta table because the studies did not exist there for this submission. The table was updated with a script using PyMongo. For details, see /hps/nobackup/parkinso/spot/gwas/scratch/goci1285

karatugo avatar May 09 '24 13:05 karatugo

All the submissions have been restarted and validated, but some of them are not in the public FTP yet, so we cannot generate the missing YAML files. We either wait for them to reach the public FTP, or close this and handle them case by case.

We will leave it as it is; new YAML files will be generated when a curator edits the template.

sprintell avatar May 15 '24 09:05 sprintell