goci
Slurm Cluster Migration for Python Infrastructure
- [ ] gwas-sumstats-harmoniser
- [x] Summary statistics with HDF5
- [x] Summary Statistics File Validator
- [x] gwas-sumstats-tools
- [x] sum-stats-formatter
- [x] eQTL-SumStats
- [x] gwas-template-services
- [x] gwas-sumstats-service
- [x] gwas-utils
- [x] gwas-curation-utils
- [x] gwas-ebi-search-index
- [x] gwas-solr-slim
@karatugo should have a session with @jdhayhurst before commencing this
Do harmoniser last so Yue can have time to complete her work
Repo | PR | Status | Notes |
---|---|---|---|
gwas-sumstats-harmoniser | https://github.com/EBISPOT/gwas-sumstats-harmoniser/pull/82 & https://github.com/EBISPOT/gwas-sumstats-harmoniser/pull/83 & https://github.com/EBISPOT/gwas-utils/pull/159 | Done. Release needed for harmoniser & PRE_GWAS-SSF harmoniser. | 1) Yue suggested a 48h time limit in SLURM. 2) Development done, testing done in sandbox by Yue, pull requests for harmoniser and pre-gwas-ssf harmoniser merged to their respective main branches. Glue scripts migrated to SLURM and added to GitHub for better tracking. |
Summary statistics with HDF5 | | Skipped | Discussed with Yomi and we agreed not to invest time in this as it will be replaced by another technology soon. |
Summary Statistics File Validator | | Skipped | Skipped as it was deprecated |
gwas-sumstats-tools | | Done | No LSF usage was found |
sum-stats-formatter | https://github.com/EBISPOT/sum-stats-formatter/pull/86 | Done | Merged with the temporary sbatch script file implementation and created the following backlog item: https://github.com/EBISPOT/sum-stats-formatter/issues/88 |
eQTL-SumStats | | Skipped | Postponed. Will check in the next release cycle if it needs an update. |
gwas-template-services | | Done | No LSF usage was found |
gwas-sumstats-service | https://github.com/EBISPOT/gwas-sumstats-service/pull/273 & https://github.com/EBISPOT/gwas-sumstats-service/pull/274 & https://github.com/EBISPOT/gwas-sumstats-service/pull/275 & https://github.com/EBISPOT/gwas-sumstats-service/pull/276 | Done. Need to do tag release for the migration. | Test OK for Celery workers start and refresh with scrontab. Created new START_CELERY_WORKERS_SLURM.sh in dev and prod. Also new start_celery_worker_slurm.sh script in dev and prod. Tested OK in the sandbox env. |
gwas-utils | https://github.com/EBISPOT/gwas-utils/pull/158 | Done | LSF is not used anymore, cleaned up the old LSF code |
gwas-curation-utils | | Done | No LSF usage was found |
gwas-ebi-search-index | | Done | No LSF usage was found |
gwas-solr-slim | https://github.com/EBISPOT/gwas-solr-slim/pull/52 | Done. AFAIK no releases used but the new script start_slurm.sh. | Test OK in dev. Also, created ${bamboo.sw_dir}/${bamboo.env_dir}/scripts/gwas-solr-slim/start_slurm.sh. |
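For reference, the 48h time limit noted for the harmoniser can be expressed with the standard `sbatch --time` option. The snippet below is only a minimal sketch of how a glue script might assemble such a submission command; the function name, the job name, and the script name `start_harmonisation_slurm.sh` are hypothetical placeholders, not the actual scripts from the PRs above.

```python
import shlex


def build_sbatch_command(script_path, time_limit="48:00:00", job_name="harmonisation"):
    """Return the argv list for submitting `script_path` with a wall-clock limit.

    `--time` takes hours:minutes:seconds, so "48:00:00" is the 48h limit.
    The job name and script path are illustrative assumptions.
    """
    return [
        "sbatch",
        f"--time={time_limit}",
        f"--job-name={job_name}",
        script_path,
    ]


# Hypothetical script name, used only to show the resulting command line.
cmd = build_sbatch_command("start_harmonisation_slurm.sh")
print(shlex.join(cmd))
```

Building the command as a list (rather than one string) means it can be passed straight to `subprocess.run` without shell quoting issues.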
All done. Releases needed for the migration to SLURM.
This will be released with the metadata YAML update feature.
Error in SLURM - waiting for input from TSC
Released https://github.com/EBISPOT/gwas-sumstats-harmoniser/releases/tag/v1.0.5 and https://github.com/EBISPOT/gwas-sumstats-harmoniser/releases/tag/v1.1.4
Prepared scrontab entries for harmoniser.
- [ ] Enable them before deployment
- [ ] Disable crontab entries also
For gwas-sumstats-harmoniser migration:
- Moved crontab items to scrontab
- Released https://github.com/EBISPOT/gwas-sumstats-harmoniser/releases/tag/v1.0.5 and https://github.com/EBISPOT/gwas-sumstats-harmoniser/releases/tag/v1.1.4
- Updated harmonisation scripts https://github.com/EBISPOT/gwas-utils/pull/167
- Updated harmonisation wrappers https://github.com/EBISPOT/gwas-utils/pull/168
- Created new container configs
- Updated NXF asset values in scripts
For gwas-sumstats-harmoniser migration:
Test submitted to codon-slurm but failed. @jiyue1214 is helping me to investigate the problem.
Released https://github.com/EBISPOT/gwas-sumstats-harmoniser/releases/tag/v1.1.5 and https://github.com/EBISPOT/gwas-sumstats-harmoniser/releases/tag/v1.0.6 and submitted the test files to codon-slurm again.
For gwas-sumstats-harmoniser migration:
Test submitted to codon-slurm and it was successful. There's one small mistake in the meta.yaml files. @jiyue1214 is helping me to investigate the problem.
Thanks to @jiyue1214's fix, released v1.1.7 and v1.0.7 and now testing again in codon-slurm.
[gwas_lsf@codon-dm-06 cron]$ ./start_harmonisation_slurm_test_goci1179.sh
Submitted batch job 65232999
For gwas-sumstats-harmoniser migration:
I compared the output of the harmonisation pipeline in SLURM and LSF.
- .h.tsv.gz, .h.tsv.gz.tbi, and md5sum.txt files are identical.
- In running.log, we have a higher percentage of sites that are carried forward.
- In meta yaml, @jiyue1214 fixed a few bugs (coordinate system and samples). (thanks @jiyue1214 !)
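A comparison like the one above can be sketched in Python as follows. This is a rough illustration, not the procedure actually used: the function names and the two-directory layout (one output folder per scheduler) are assumptions; only the file suffixes come from the list above.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def compare_outputs(lsf_dir, slurm_dir,
                    suffixes=(".h.tsv.gz", ".h.tsv.gz.tbi", "md5sum.txt")):
    """Map each matching file name in `lsf_dir` to True if the file with the
    same name in `slurm_dir` exists and has an identical MD5 digest."""
    results = {}
    for lsf_file in sorted(Path(lsf_dir).iterdir()):
        if not lsf_file.name.endswith(suffixes):
            continue  # skip logs and other files that are not compared here
        slurm_file = Path(slurm_dir) / lsf_file.name
        results[lsf_file.name] = (
            slurm_file.is_file()
            and md5_of_file(lsf_file) == md5_of_file(slurm_file)
        )
    return results
```

Calling `compare_outputs("lsf_out", "slurm_out")` (hypothetical directory names) returns a dictionary such as `{"GCST.h.tsv.gz": True, ...}`, so any `False` entry points at a file that differs or is missing on the Slurm side.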
I suggest we deploy this after the Easter long weekend. I'll coordinate it with Yue.
This is waiting for final update from @jiyue1214
Issue: In running.log, we have a higher percentage of sites that are carried forward.
Primary investigation:
Percentage of sites that are carried forward = Carried forward variants / ( Carried forward variants + Unmapped variants).
Based on the log file, the number of sites that are carried forward is the same, which means the difference is caused by the unmapped variants. To investigate why the unmapped variants differ, I need to rerun the pipeline and keep the intermediate files.
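To make the reasoning concrete, here is a small sketch of the formula above. The counts are made-up illustrative numbers, not values from running.log; they only show that with an identical carried-forward count, a different unmapped count alone shifts the percentage.

```python
def carried_forward_percentage(carried_forward, unmapped):
    """Percentage of sites carried forward:
    carried_forward / (carried_forward + unmapped) * 100."""
    total = carried_forward + unmapped
    if total == 0:
        raise ValueError("no carried-forward or unmapped variants to compare")
    return 100.0 * carried_forward / total


# Hypothetical counts: same carried-forward variants, different unmapped variants.
lsf_pct = carried_forward_percentage(900, 100)
slurm_pct = carried_forward_percentage(900, 50)

print(round(lsf_pct, 2))    # 90.0
print(round(slurm_pct, 2))  # 94.74
```

A smaller unmapped count gives a higher percentage, which matches the direction of the difference seen between the two runs.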
I reran the pipeline with the intermediate files and found:
- Their intermediate files are the same (md5sum of two unmapped files are identical)
- I can reproduce the slight difference between LSF and Slurm, but the Slurm result is the correct number.
- In LSF, Nextflow read GCST90293086's unmapped file into the GCST90293085 log work folder. In Slurm, it read the correct one.
The code difference between the two versions does not explain this. However, to double-check it, Yue can switch the LSF code to Slurm (changing only the executor).
I confirm the Slurm result is correct. We can close this ticket. As for the cause of the problem on LSF (the harmonisation result is correct; only the unmapped file did not match the GCST), I will create another ticket to look into it in more detail.
@karatugo to release
@jiyue1214 added an additional feature; waiting for Yue before releasing.
- All scripts are ready and will start to run today via crontab
- One small action: I will activate scrontab instead of crontab, based on the ITSC info
The Nextflow pipeline is running on Slurm and can be monitored daily via Nextflow Tower.
Question: @karatugo, according to the scrontab, we have not activated the entries to refresh the harmonisation queue, queue GWAS-SSF files for harmonisation, and queue pre-GWAS-SSF files for harmonisation. Should we activate them as well?
We migrated all crontab jobs to Slurm this morning. This ticket can be moved to Done. We just need to double-check tomorrow that they are running successfully.
This has been released; the ticket is due to be closed at the end of the sprint.