Verify container execution on contemporary GPU compute capabilities
Description of feature
As discussed in the dev meeting, in the lead-up to v2 we should verify the multi-hardware compatibility of the current quay.io/nf-core/proteinfold containers. This compatibility matrix could also be useful as a public reference. Hopefully most containers work, but some upstream repos are unmaintained (e.g. ESMFold, which led to #212).
The UNSW cluster is a rolling cluster, so we have access to a range of hardware nodes and to both CUDA 11 and CUDA 12. My proposal is that we verify on what are currently the 'last gen', 'current' and 'new gen' GPUs (Volta, Ampere, Hopper); we have all of those on the UNSW cluster. We don't have H100s, so we will use the H200 as a proxy (same chip and compute capability).
| Mode | RTX 2080 Ti (CC 7.5) | V100 (CC 7.0) | A100 (CC 8.0) | H100 (CC 9.0) | H200 (CC 9.0) | MI250X (AMD) | RP6000 (CC 12.0) |
|---|---|---|---|---|---|---|---|
| AlphaFold2 | YES (@JoseEspinosa) | YES | YES | YES (@JoseEspinosa) | YES | - | NO |
| Boltz | NO (@JoseEspinosa) | YES | YES | YES (@JoseEspinosa) | YES | - | NO |
| ColabFold | - | YES | YES | - | YES | - | NO |
| ESMFold | YES (@JoseEspinosa) | YES | YES | YES (@JoseEspinosa) | NO ([#212](https://github.com/nf-core/proteinfold/issues/212)) | - | NO |
| RosettaFold2NA | YES (@JoseEspinosa) | - | - | YES (@JoseEspinosa) | - | - | - |
| RosettaFold-AA | YES (@JoseEspinosa) | YES | YES | YES (@JoseEspinosa) | YES | - | NO |
| AlphaFold3 | NO (@JoseEspinosa) | - | - | YES (@JoseEspinosa) | - | - | - |
| HelixFold3 | - | YES | YES | NO (@JoseEspinosa, #349) | NO (@jscgh, #349) | - | NO |
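Since the matrix is keyed by GPU model but the goal is coverage of compute capabilities, it can help to group the tested NVIDIA columns by CUDA compute capability (CC). The sketch below is an illustrative helper, not part of proteinfold: the CC values are NVIDIA's published figures, the RP6000 value is taken from its column header, and the AMD MI250X is left out because it is not a CUDA device. `COMPUTE_CAPABILITY` and `group_by_capability` are hypothetical names.

```python
from collections import defaultdict

# CUDA compute capability (major, minor) for each NVIDIA column in the matrix.
# The RP6000 value comes from the table header; MI250X is AMD, so not listed.
COMPUTE_CAPABILITY = {
    "RTX 2080 Ti": (7, 5),  # Turing
    "V100": (7, 0),         # Volta
    "A100": (8, 0),         # Ampere
    "H100": (9, 0),         # Hopper
    "H200": (9, 0),         # Hopper, same chip as H100
    "RP6000": (12, 0),      # Blackwell (CC 12.0 per the table header)
}


def group_by_capability(caps):
    """Return {compute capability: sorted GPU names}, ordered by capability."""
    groups = defaultdict(list)
    for gpu, cc in caps.items():
        groups[cc].append(gpu)
    return {cc: sorted(groups[cc]) for cc in sorted(groups)}


if __name__ == "__main__":
    for (major, minor), gpus in group_by_capability(COMPUTE_CAPABILITY).items():
        print(f"CC {major}.{minor}: {', '.join(gpus)}")
```

Grouping this way makes it explicit that the H200 results stand in for CC 9.0 as a whole, which is why it works as an H100 proxy.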
Thanks for opening the issue, @keiran-rowell-unsw! We have H100s at the CRG, so I will give it a try.
I also added the RTX 2080. Although these cards are not aimed at HPC systems like the others, we had them in the old cluster and they are still accessible in the new one. The exact GPU is the GeForce RTX 2080 Ti.
Awesome, added! This also extends testing to compute capability 7.5. I can imagine a few labs will have workstation setups, and I have a lot of respect for the value for money of consumer Ti GPUs. I flogged a GTX 780 Ti to death way back when.
Just had a chat with @SarahBeecroft from Pawsey, 'Australia's friendliest supercomputer'. They have a fleet of AMD MI250X GPUs, and Sarah has been building containers that run natively on that AMD hardware.
Pawsey will check which containers work, and @jscgh has been integrating them into the pipeline, so we'll make sure the Pawsey containers can run on Pawsey via the proteinfold Nextflow workflow.
Added RP6000 (CUDA Compute Capability 12.0) tests. No current images are compatible.
AlphaFold3 fails on the RTX 2080 Ti with the error below (documenting it here for reference):
File "/alphafold3_venv/lib/python3.11/site-packages/alphafold3/jax/attention/attention.py", line 127, in dot_product_attention
raise ValueError(
ValueError: implementation='triton' is unsupported on this GPU generation.
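The error indicates that AlphaFold3's Triton attention path rejects the RTX 2080 Ti (CC 7.5). A minimal sketch of the kind of guard a wrapper could apply before launching a run follows. It assumes, without having checked the AlphaFold3 source, that the Triton kernel needs CC 8.0 (Ampere) or newer, which is consistent with the CC 7.5 failure above, and that `xla` is an acceptable fallback implementation. `choose_attention_implementation` and `MIN_TRITON_CC` are hypothetical names, not AlphaFold3 APIs.

```python
# Hypothetical helper, not part of AlphaFold3.
# ASSUMPTION: the Triton attention kernel requires compute capability >= 8.0.
# This matches the failure on the CC 7.5 RTX 2080 Ti but is not verified here.
MIN_TRITON_CC = (8, 0)


def choose_attention_implementation(compute_capability: tuple[int, int]) -> str:
    """Return 'triton' on GPUs assumed to support it, otherwise 'xla'."""
    if compute_capability >= MIN_TRITON_CC:  # tuples compare (major, minor) in order
        return "triton"
    # ASSUMPTION: 'xla' is accepted as a slower but more portable alternative.
    return "xla"


if __name__ == "__main__":
    for name, cc in [("RTX 2080 Ti", (7, 5)), ("A100", (8, 0)), ("H100", (9, 0))]:
        print(name, choose_attention_implementation(cc))
```

Checking the capability up front would turn the runtime `ValueError` into an explicit, logged choice of backend.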