
Add option to predict larger structures using CUDA bfloat16

tomgoddard opened this issue 8 months ago

A major limitation of Boltz is that it uses a lot of GPU memory compared to, say, AlphaFold 3, so predictions run out of memory. This is especially problematic on the consumer GPUs that most biology researchers have in their lab computers. On an Nvidia 4090 GPU with 24 GB of memory, the maximum prediction size is about 1000 residues; using 16-bit floating point model weights and activations (bfloat16) with CUDA raises that to about 1400 residues. In testing on a half dozen PDB structures I have not seen any reduction in accuracy. The bfloat16 type has the same number of exponent bits (8) as float32 but only an 8-bit significand versus 24 bits for float32 (counting the implicit leading bit), so the dynamic range of bfloat16 is the same as float32, just with fewer significant digits. The code change is almost trivial, adding one line to Boltz:

# Run prediction under autocast so CUDA ops compute in bfloat16:
with torch.autocast(device_type='cuda', dtype=torch.bfloat16):
    trainer.predict(...)
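
To illustrate the format trade-off, here is a small standalone check (my own illustration, not part of the proposed change) of the numeric limits PyTorch reports for the two types:

import torch

# Both dtypes have an 8-bit exponent, so their maximum values are nearly
# identical, but bfloat16's machine epsilon is far larger because it has
# fewer significand bits.
for dtype in (torch.float32, torch.bfloat16):
    info = torch.finfo(dtype)
    print(dtype, "max:", info.max, "eps:", info.eps)

# Prints max ~3.39e+38 and eps 0.0078125 for bfloat16,
# versus max ~3.40e+38 and eps ~1.19e-07 for float32.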

I'll make a pull request to use bfloat16 when the option "--use_cuda_bfloat16" is given; the default, of course, will be not to use it. I'm working on Boltz prediction in the ChimeraX molecular visualization program, for release in early June, which runs Boltz on the user's local computer (Mac, Linux, Windows), and I would like to be able to take advantage of this option.
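
Here is a minimal sketch of how the option could gate autocast; the helper name and fallback logic are my assumptions, not the actual pull request code:

import contextlib
import torch

def predict_with_optional_bfloat16(trainer, *args, use_cuda_bfloat16=False, **kwargs):
    # Use bfloat16 autocast only when requested and CUDA is actually available;
    # otherwise run under a no-op context so float32 behavior is unchanged.
    if use_cuda_bfloat16 and torch.cuda.is_available():
        context = torch.autocast(device_type='cuda', dtype=torch.bfloat16)
    else:
        context = contextlib.nullcontext()
    with context:
        trainer.predict(*args, **kwargs)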

In my testing, bfloat16 is not worth using on non-CUDA platforms, so I think it only makes sense to support bfloat16 with CUDA at this time. I tried bfloat16 on Mac with Torch Metal Performance Shaders (MPS): performance was poor, about 20% slower with only 10-20% less memory use. On an Intel CPU (i7-12700K), bfloat16 performance was far worse, about 6 times slower than float32. More details are in this ChimeraX ticket: https://www.rbvi.ucsf.edu/trac/ChimeraX/ticket/17555.

tomgoddard · May 12 '25 23:05