
Verify container execution on contemporary GPU compute capabilities

Open keiran-rowell-unsw opened this issue 5 months ago • 6 comments

Description of feature

As discussed in the dev meeting, in the lead-up to v2 we should verify the multi-hardware compatibility of the current quay.io/nf-core/proteinfold containers. This compatibility matrix could be useful as a public reference. Hopefully most containers work, but some upstream repos are unmaintained (e.g. ESMFold, leading to #212).

The cluster at UNSW is a rolling cluster, so we have access to a range of hardware nodes and to both CUDA 11 and 12. My proposal is that we verify on what are currently the 'last gen', 'current' and 'new gen' GPUs (Volta, Ampere, Hopper); we have all of those on the UNSW cluster. We don't have H100s, so we will use the H200 as a proxy (they're the same chip/CC).
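To record which compute capability (CC) a given test node actually exposes, something like the following could be run alongside each test. This is a minimal sketch, not part of the pipeline: the helper names are hypothetical, and it assumes an NVIDIA driver recent enough for `nvidia-smi` to support the `compute_cap` query field. The CC values in the lookup table are NVIDIA's published figures for the cards named above.

```python
import subprocess

# Published compute capabilities for the NVIDIA GPUs named in this issue.
KNOWN_COMPUTE_CAPS = {
    "V100": "7.0",  # Volta
    "A100": "8.0",  # Ampere
    "H100": "9.0",  # Hopper
    "H200": "9.0",  # Hopper, same CC as the H100
}


def parse_gpu_query(csv_text):
    """Parse 'name, compute_cap' CSV rows into (name, compute_cap) tuples."""
    gpus = []
    for line in csv_text.strip().splitlines():
        if not line.strip():
            continue
        # Split on the last comma only, in case a GPU name contains one.
        name, cap = (field.strip() for field in line.rsplit(",", 1))
        gpus.append((name, cap))
    return gpus


def query_local_gpus():
    """Ask nvidia-smi for the GPU names and compute capabilities on this node.

    Assumes nvidia-smi is on PATH and supports the compute_cap field.
    """
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,compute_cap", "--format=csv,noheader"],
        check=True,
        capture_output=True,
        text=True,
    ).stdout
    return parse_gpu_query(out)
```

On a GPU node, `query_local_gpus()` would return one tuple per visible device, which can be pasted next to the test result.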

| Mode | RTX 2080 Ti | V100 | A100 | H100 | H200 | MI250X | RP6000 (compute 12.0) |
|---|---|---|---|---|---|---|---|
| AlphaFold2 | YES (@JoseEspinosa) | YES | YES | YES (@JoseEspinosa) | YES | - | NO |
| Boltz | NO (@JoseEspinosa) | YES | YES | YES (@JoseEspinosa) | YES | - | NO |
| ColabFold | - | YES | YES | - | YES | - | NO |
| ESMFold | YES (@JoseEspinosa) | YES | YES | YES (@JoseEspinosa) | NO (https://github.com/nf-core/proteinfold/issues/212) | - | NO |
| RosettaFold2NA | YES (@JoseEspinosa) | - | - | YES (@JoseEspinosa) | - | - | - |
| RosettaFold-AA | YES (@JoseEspinosa) | YES | YES | YES (@JoseEspinosa) | YES | - | NO |
| AlphaFold3 | NO (@JoseEspinosa) | - | - | YES (@JoseEspinosa) | - | - | - |
| HelixFold3 | - | YES | YES | NO (@JoseEspinosa, #349) | NO (@jscgh, #349) | - | NO |
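If the matrix is to be reused as a public reference, it may help to keep it in a machine-readable form as well. The sketch below simply transcribes the table as it stands in this issue; the variable and function names are made up for illustration, `None` stands for the dashed cells where no result is recorded, and the first column is written as RTX 2080 Ti, following the comment below that names the exact GPU.

```python
# GPU columns, in the same order as the table.
GPUS = ["RTX 2080 Ti", "V100", "A100", "H100", "H200", "MI250X", "RP6000"]

# One row per mode, in GPUS order.
# True = reported working, False = reported failing, None = no result recorded.
MATRIX = {
    "AlphaFold2":     [True,  True, True, True,  True,  None, False],
    "Boltz":          [False, True, True, True,  True,  None, False],
    "ColabFold":      [None,  True, True, None,  True,  None, False],
    "ESMFold":        [True,  True, True, True,  False, None, False],
    "RosettaFold2NA": [True,  None, None, True,  None,  None, None],
    "RosettaFold-AA": [True,  True, True, True,  True,  None, False],
    "AlphaFold3":     [False, None, None, True,  None,  None, None],
    "HelixFold3":     [None,  True, True, False, False, None, False],
}


def modes_with_status(gpu, status):
    """List the modes whose recorded result on `gpu` is exactly `status`."""
    col = GPUS.index(gpu)
    return [mode for mode, row in MATRIX.items() if row[col] is status]
```

For example, `modes_with_status("H200", False)` lists the modes currently reported as failing on the H200, and `modes_with_status("V100", None)` lists the ones still lacking a result there.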

keiran-rowell-unsw avatar Jul 30 '25 06:07 keiran-rowell-unsw

Thanks for opening the issue @keiran-rowell-unsw We have H100 at the CRG, so I will give it a try.

JoseEspinosa avatar Jul 30 '25 10:07 JoseEspinosa

I also added the RTX 2080. Although these cards are not meant for HPC systems like the others, we had them in the old cluster and they are now accessible in the new one too. The exact GPU is the GeForce RTX 2080 Ti.

JoseEspinosa avatar Jul 30 '25 13:07 JoseEspinosa

> I also added the RTX 2080. Although these cards are not meant for HPC systems like the others, we had them in the old cluster and they are now accessible in the new one too. The exact GPU is the GeForce RTX 2080 Ti.

Awesome, added! This also brings testing to compute capability 7.5. I can imagine a few labs will have workstation setups. I have a lot of respect for the dollar value of consumer Ti GPUs; I flogged a GTX 780 Ti to death way back when.

keiran-rowell-unsw avatar Jul 30 '25 21:07 keiran-rowell-unsw

Just had a chat with @SarahBeecroft from Pawsey, 'Australia's friendliest supercomputer'. They have a fleet of AMD MI250X GPUs, and Sarah has been building containers that execute natively on their AMD hardware.

Pawsey will check which containers work, and @jscgh has been integrating them with the workflow pipeline, so we can make sure the Pawsey containers execute on Pawsey via proteinfold Nextflow.

keiran-rowell-unsw avatar Jul 31 '25 04:07 keiran-rowell-unsw

Added RP6000 (CUDA Compute Capability 12.0) tests. No current images are compatible.
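The thread does not record why the images fail at compute capability 12.0. One common cause with new GPU generations, offered here only as an assumption to check, is that the CUDA code inside an image was built before that architecture existed. The sketch below encodes the general CUDA compatibility rule: a native `sm_XY` binary runs on devices of the same major version with an equal or higher minor version, while embedded `compute_XY` PTX can be JIT-compiled on any device of equal or higher capability. The function names and the example arch list are hypothetical; for a PyTorch-based image, a real list can be obtained from `torch.cuda.get_arch_list()`.

```python
import re


def parse_arch(tag):
    """Split 'sm_90' or 'compute_120' into (kind, major, minor), else None."""
    match = re.fullmatch(r"(sm|compute)_(\d+)(\d)", tag)
    if match is None:
        return None  # unrecognised tags (e.g. suffixed variants) are skipped
    return match.group(1), int(match.group(2)), int(match.group(3))


def can_run(device_cc, arch_list):
    """Return True if a build with `arch_list` could execute on `device_cc`.

    device_cc is a (major, minor) tuple, e.g. (12, 0).
    """
    dev_major, dev_minor = device_cc
    for tag in arch_list:
        parsed = parse_arch(tag)
        if parsed is None:
            continue
        kind, major, minor = parsed
        if kind == "sm" and major == dev_major and minor <= dev_minor:
            return True  # native binary for this major generation
        if kind == "compute" and (major, minor) <= (dev_major, dev_minor):
            return True  # PTX can be JIT-compiled for the newer device
    return False


# Hypothetical arch list for an older build, for illustration only.
EXAMPLE_ARCH_LIST = ["sm_70", "sm_75", "sm_80", "sm_86", "sm_90"]
```

With this example list, `can_run((9, 0), EXAMPLE_ARCH_LIST)` is true but `can_run((12, 0), EXAMPLE_ARCH_LIST)` is false, which is the kind of result that would point to a rebuild against a newer CUDA toolkit rather than a pipeline bug.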

jscgh avatar Oct 08 '25 23:10 jscgh

AlphaFold3 fails on the RTX 2080 Ti with the error below (just to document it):

File "/alphafold3_venv/lib/python3.11/site-packages/alphafold3/jax/attention/attention.py", line 127, in dot_product_attention
      raise ValueError(
  ValueError: implementation='triton' is unsupported on this GPU generation.
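A possible workaround to test: upstream AlphaFold 3's `run_alphafold.py` has a `--flash_attention_implementation` flag that accepts `triton`, `cudnn` or `xla`, and `xla` is the option intended for GPUs without Triton support. The sketch below picks a value from the compute capability. The 8.0 threshold is an assumption (the error message only says "this GPU generation"), the function name is hypothetical, and whether proteinfold currently passes this flag through to the container has not been checked here.

```python
def pick_flash_attention_implementation(compute_cap, triton_min_cc=(8, 0)):
    """Choose an attention backend name from a 'major.minor' CC string.

    triton_min_cc is an assumed cut-off, not a value taken from this thread.
    """
    major, minor = (int(part) for part in compute_cap.split("."))
    return "triton" if (major, minor) >= triton_min_cc else "xla"
```

For the RTX 2080 Ti (compute capability 7.5) this returns `"xla"`, which would be the value to try if the flag can be supplied to the AlphaFold3 process.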

JoseEspinosa avatar Nov 12 '25 11:11 JoseEspinosa