Tensor cores or CUDA cores?
When setting up a new PC to run AlphaFold locally, should I look for a GPU with more tensor cores or CUDA cores? For example, are we better off going with the Nvidia Quadro or GeForce line? Can anyone recommend a sub-$5K GPU that would be the best "bang for the buck"?
Also - now that the Amber relaxation step can use GPU instead of CPU, what is the recommended CPU configuration?
We use an RTX 3090 and are very happy with it. An important consideration is GPU RAM, which is the limiting factor for the size of the monomer or complex. 24 GB is the most we could find for sub-5k€.
Thank you! We're wondering whether a 24 GB GeForce 3090 (or 3090 ti) or a 24 GB Quadro RTX 6000 is the way to go. If you don't mind me asking, do you need a lot of CPU power alongside the 3090?
Yes (and for the RTX 6000 it would be the same requirement). We are using a Threadripper PRO 3975WX with 32 physical cores, 256 GB RAM, the databases on NVMe disks, and the patch https://github.com/deepmind/alphafold/pull/399 . The wallclock time for the T1050 job is 1 hour 6 minutes. More than 128 GB is not needed, I think, and 24 logical cores should be ok. The system can handle sequences of up to almost 2000 residues.
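For reference, a run like the T1050 benchmark above can be timed with the standard `run_docker.py` entry point from the AlphaFold repository; the FASTA path, date, and database directory below are placeholders for illustration, not the exact paths used on this machine.

```shell
# Time a full AlphaFold run (MSA search + prediction + relaxation).
# T1050.fasta and /data/alphafold_databases are placeholder paths.
time python3 docker/run_docker.py \
  --fasta_paths=T1050.fasta \
  --max_template_date=2021-11-01 \
  --data_dir=/data/alphafold_databases
```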
Thank you so much!
By the way, do you experience any overheating issues with the 3090? We're wondering if the built-in fans of the 3090 and standard desktop PC fans will be sufficient or if we'll need to increase the cooling capacity of the computer.
Ours is not a standard desktop PC but rather a Threadripper workstation. It is well cooled and we have not seen a problem.
Thanks!
Tensor cores matter most when you are training models; for inference like this, the CUDA cores are used more heavily. You would need to profile under a debugger with scheduler support to know for sure. Also, do not forget that Amber uses CUDA too.
> When setting up a new PC to run AlphaFold locally, should I look for a GPU with more tensor cores or CUDA cores? For example, are we better off going with the Nvidia Quadro or GeForce line? Can anyone recommend a sub-$5K GPU that would be the best "bang for the buck"?
>
> Also - now that the Amber relaxation step can use GPU instead of CPU, what is the recommended CPU configuration?
Best performing video card. The best performing video card is the RTX A5000. I have two A5000s and one A6000 in my workstation. Despite the A6000 costing twice as much as the A5000, the execution times are very similar on sequences of up to ~2000 residues (the total length of the concatenated chains in the case of multimers). Longer sequences require more than 24 GB of video RAM, and the A6000 (48 GB) is faster in those cases. However: 1) the accuracy of domain placement on sequences longer than ~2000 residues is often low, so there is little point in running long sequences routinely - the entire body of results from longer sequences will be questionable; 2) the current implementation of UVM is pretty good, and the GPU/CPU utilize the entire memory pool very efficiently. So you can run really long sequences on the A5000, but it will take longer than on the A6000: a run that would take 1 day on the A6000 will take 1.5 (maybe 2) days on the A5000. For specific, targeted jobs on longer sequences this is acceptable; obviously not acceptable for routine 24x7 runs, where the differences in execution time accumulate.
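The memory oversubscription mentioned above is controlled by environment variables that the AlphaFold README documents for long targets; a minimal sketch (4.0 is the README's suggested fraction, letting the process request up to 4x the physical GPU memory and spill into host RAM via unified memory):

```shell
# Enable unified memory so a 24 GB card can handle very long sequences.
export TF_FORCE_UNIFIED_MEMORY=1
# Allow allocations up to 4x the physical GPU memory.
export XLA_PYTHON_CLIENT_MEM_FRACTION=4.0
```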
Amber GPU-accelerated relaxation of huge structures takes a negligible amount of time in the entire AF run (e.g. 70 seconds for not very well folded 2000+ residue oligomer on the A5000). For small-to-medium structures, this step is almost instantaneous. So, GPU is the way to go.
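The GPU path for relaxation is selected with a flag on the standard entry points; a minimal sketch, assuming the usual `run_docker.py` invocation (the paths are placeholders):

```shell
# --use_gpu_relax runs the Amber minimization on the GPU instead of the CPU.
python3 docker/run_docker.py \
  --fasta_paths=target.fasta \
  --max_template_date=2021-11-01 \
  --data_dir=/data/alphafold_databases \
  --use_gpu_relax=true
```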
Petr
Hi @Phage-structure-geek would you be willing to share with me the specifications for your A6000 workstation? We did build a PC with a 3090Ti, but now we want to build one that can potentially model higher order multimeric complexes. I was mostly wondering what motherboard, CPU, RAM cards, CPU cooler and case you use. Thanks in advance!
Hi,
Here is my configuration:
Component: Model - Quantity

- GPU: PNY Technologies RTX A5000 Graphics Card (24 GB) - 2
- GPU: PNY Technologies RTX A6000 Graphics Card (48 GB) - 1
- Motherboard: ASUS Pro WS WRX80E-Sage SE WIFI sWRX8 E-ATX - 1
- CPU: AMD Ryzen Threadripper PRO 3975WX 3.5 GHz 32-Core sWRX8 - 1
- CPU cooler: Floe Riing RGB 360 TR4 Edition - 1
- Memory: Samsung DDR4 module, 64 GB, DIMM 288-pin, 3200 MHz / PC4-25600, registered (ECC) - 8
- Case: Fractal Design Meshify 2 XL Full-Tower Case (Black, Light Tint Tempered Glass) - 1
- Power supply: be quiet! Dark Power Pro 12 1500W 80 PLUS Titanium Modular Power Supply - 1
- System and scratch SSDs: Samsung 2TB 980 PRO PCIe 4.0 x4 M.2 - 3
- Database SSDs (RAID 0): Samsung 1TB 980 PRO PCIe 4.0 x4 M.2 - 4
- Data drive HDDs: WD 10TB Ultrastar 7200 rpm SATA 3.5" Internal Data Center HDD - 4
- Monitor: Samsung U28R55 28" 16:9 4K HDR FreeSync IPS Monitor - 1
- Keyboard and mouse: Verbatim Slimline Corded USB Keyboard and Mouse (Black) - 1
I wanted this system to be headless originally, but I run it from the console now. The display is connected to the A6000. The X-server allocates 400 MB of the A6000 GPU memory to itself.
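When a display server is holding memory on one card, the allocation can be checked with `nvidia-smi`, and a compute job can be kept off that card with `CUDA_VISIBLE_DEVICES`; the indices below are an example, not the mapping on this particular machine.

```shell
# Show per-GPU memory use (the X server shows up on its GPU).
nvidia-smi --query-gpu=index,name,memory.used --format=csv
# Restrict a compute job to GPUs 0 and 1, leaving the display GPU alone.
export CUDA_VISIBLE_DEVICES=0,1
```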
AF and its databases and a few other things that use the same databases are on the 4-SSD RAID 0 array. The HDDs are used to store cryoEM images as we use it for cryoEM image processing also.
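For anyone reproducing the storage layout, a 4-SSD RAID 0 array can be sketched with `mdadm`; the device names and mount point are assumptions for illustration, not the exact setup used here.

```shell
# Stripe four NVMe SSDs into a single RAID 0 volume for the databases.
sudo mdadm --create /dev/md0 --level=0 --raid-devices=4 \
    /dev/nvme1n1 /dev/nvme2n1 /dev/nvme3n1 /dev/nvme4n1
sudo mkfs.ext4 /dev/md0
sudo mkdir -p /data/alphafold_databases
sudo mount /dev/md0 /data/alphafold_databases
```

RAID 0 has no redundancy, which is tolerable here because the databases are re-downloadable; keeping a backup on a HDD is still sensible.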
In my tests, the A6000 runtimes are equal to those of the A5000s when the model fits entirely into the A5000 memory. For large structures, where the model does not fit into the A5000's GPU memory, the A5000 is only 20-50% (or something of that sort) slower than the A6000. As I understand it, the 3090 Ti is faster than the A5000, so it might already run similarly to the A6000 on big structures.
Now, the RTX 6000 Ada is much faster than the A6000. Much faster. But it runs hotter and requires 450 W.
Petr
Edited to fix the components list formatting...
@Phage-structure-geek Thank you very much! Do you ever experience overheating issues with the PNY GPUs or have temperatures stayed at acceptable ranges for you? The 3090Ti stays between 50-60C for us, but it comes with 3 cooling fans. Thank you for the heads up regarding the Ada! I will see if my PI wants to hold back on purchasing the A6000 if the Ada version is coming out in the near future.
> @Phage-structure-geek Thank you very much! Do you ever experience overheating issues with the PNY GPUs or have temperatures stayed at acceptable ranges for you?
The three cards sit next to each other, with the two A5000s linked by an NVLink bridge. The A6000 is the top one, and it reaches 87C (at times) when all three are running. This is an acceptable temperature for it. The A5000s top out at 80C.
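For monitoring a stack like this during a run, `nvidia-smi` can poll temperature and power in a loop using its standard query flags (adjust the interval to taste):

```shell
# Print temperature, power draw and fan speed for every GPU each 5 seconds.
nvidia-smi --query-gpu=index,name,temperature.gpu,power.draw,fan.speed \
    --format=csv -l 5
```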
> The 3090Ti stays between 50-60C for us, but it comes with 3 cooling fans. Thank you for the heads up regarding the Ada! I will see if my PI wants to hold back on purchasing the A6000 if the Ada version is coming out in the near future.
That's a good question - when those Adas will become available... Another question is how hot they will get when running in a typical vertical-stack orientation. I mean, would it even be possible to run two Adas on top of each other when they put out 450 W of heat each? Or is this card not supposed to be stacked?
Petr
Hey @Phage-structure-geek, and other folks on this thread. I like what was written on this thread, and recently I have been trying to price up a setup that will run AF2. Keeping in mind the above setup that was recommended, I went on line and found perhaps more recent hardware. If any of you gets a chance, can you take a look and let me know your thoughts:
- GPU: PNY Technologies RTX A5000 Graphics Card (24 GB) - 2 (cost: 2 x $2,500.00)
- GPU: PNY Technologies RTX A6000 Graphics Card (48 GB) - 1 (cost: $4,600.00)
- Motherboard: ASUS Pro WS WRX80E-Sage SE WIFI sWRX8 E-ATX - 1 (cost: $1,000.00)
- CPU: AMD Ryzen Threadripper 3990X 64-Core, 128-Thread Unlocked Desktop Processor - 1 (cost: $4,255.00)
- CPU cooler: Floe Riing RGB 360 TR4 Edition - 1 (cost: $200.00)
- Memory: Samsung DDR4 module, 64 GB, DIMM 288-pin, 3200 MHz / PC4-25600, registered (ECC) - 8 (cost: 8 x $200.00)
- Case: Fractal Design Meshify 2 XL Full-Tower Case (Black, Light Tint Tempered Glass) - 1 (cost: $250.00)
- Power supply: Dark Power Pro 13 1600W 80 PLUS Titanium Modular Power Supply - 1 (cost: $450.00)
- System and scratch SSDs: Samsung 4TB 990 PRO PCIe 4.0 x4 NVMe M.2 - 2 (cost: 2 x $350.00)
- Database SSDs (RAID 1, for the MSA data set): Samsung 4TB 990 PRO PCIe 4.0 x4 M.2 - 2 (cost: 2 x $350.00)
- Data drive HDDs (for results): WD 10TB Ultrastar 7200 rpm SATA 3.5" Internal Data Center HDD - 4 (cost: 4 x $300.00)

TOTAL COST: $19,955
Hi @adalal78,
Your configuration looks great. I do not think you need RAID 1 for the databases. RAID 1 is slower than no RAID (seek time is much longer), and you do not need redundancy in this case because these databases stay unchanged for many months if not years (you can update them from time to time). Use the second SSD for something else; make a backup of the databases to a HDD and let them be. Note that shorter sequences spend a lot of time in the jackhmmer stage, which runs on 4 CPU threads at most. The subsequent 3D prediction and minimization stages are very fast. Longer sequences also spend some time in the jackhmmer stage (also ~4 threads) and then spend a lot of time in the 3D prediction stage, which uses only one CPU thread and all of the GPU (obviously).
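The thread behaviour described above is easy to confirm from a second terminal while a job runs; `pidstat` from the sysstat package reports per-thread CPU use (the `jackhmmer` process name assumes the default AlphaFold MSA pipeline is what is running):

```shell
# Watch per-thread CPU usage of the running jackhmmer stage every 5 seconds.
pidstat -t -p "$(pgrep -d, jackhmmer)" 5
```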
Are you going to do some AI development work on this computer?
The RTX 6000 Ada is available now. It is much faster than the RTX A6000 with the same power requirements and thermals. It is much more expensive...
Petr
That loadout looks like it should run very well, though a few components seem like more than AF2 is likely to really leverage. I would lean towards the 1-GPU option (unless you meant all 3 for some monstrous protein complexes?)
I generally run AF2 on an AWS g5.4xlarge, which has 16 vCPUs, and the load average is generally under 3 with that setup. I don't even see much slowdown when running bzip2 at the same time as AF2. 128 threads seems totally overkill, and it looks like that is ~20% of your cost.
Unless you plan on running monstrously large proteins/protein complexes, you probably don't need over 500GB RAM, so if you want to cut costs a bit further you could run with 2 or 4 instead of 8 memory sticks.
> Hi @adalal78,
>
> Your configuration looks great. I do not think you need RAID 1 for the databases. RAID 1 is slower than no RAID (seek time is much longer) and you do not need redundancy in this case because these databases stay unchanged for many months if not years (you can update them from time to time). Use the second SSD for something else. Make a backup of the databases to a HDD and let them be.
Ah, I was thinking of using the second SSD for the database backup, but backing it up to a HDD sounds better - then I can just get one 4TB SSD to hold the database that AF2 needs.
> Note that shorter sequences spend a lot of time in the jackhmmer stage, which runs on 4 CPU threads at most. The subsequent 3D prediction and minimization stages are very fast. Longer sequences also spend some time in the jackhmmer stage (also ~4 threads) and then spend a lot of time in the 3D prediction stage, which uses only one CPU thread and all of the GPU (obviously).
Good to know, and thank you for sharing. So, along with what @tcoates5 wrote, I think I can go with something much lighter than the 128-thread option. Recommendations welcome...
> Are you going to do some AI development work on this computer?
I had planned to use it for some AI development, but I think using AWS is probably the better idea? This is something I really struggle with: AWS vs. some other option?
> The RTX 6000 Ada is available now. It is much faster than the RTX A6000 with the same power requirements and thermals. It is much more expensive...
Thank you again, will look into this...
> That loadout looks like it should run very well, though a few components seem like more than AF2 is likely to really leverage. I would lean towards the 1-GPU option (unless you meant all 3 for some monstrous protein complexes?)
I agree with the GPU rec. here and thank you!
> I generally run AF2 on an AWS g5.4xlarge, which has 16 vCPUs, and the load average is generally under 3 with that setup. I don't even see much slowdown when running bzip2 at the same time as AF2. 128 threads seems totally overkill, and it looks like that is ~20% of your cost.
If you don't mind sharing, is there a good tutorial on running AF2 on AWS? I did find this one: https://medium.com/proteinqure/alphafold-quickstart-on-aws-9ba20692c98e If you can share your thoughts on it @tcoates5, I would be very thankful...
> Unless you plan on running monstrously large proteins/protein complexes, you probably don't need over 500GB RAM, so if you want to cut costs a bit further you could run with 2 or 4 instead of 8 memory sticks.
Agreed on this as well. This would depend on the use case...
@tcoates5 and anyone else who would like to chime in, I'm deciding between the following CPUs
- AMD Ryzen Threadripper PRO 5955WX, 16-Core, 32-Thread Desktop Processor (cost: $1,050.00)
- AMD Ryzen Threadripper 3960X 24-Core, 48-Thread Unlocked Desktop Processor (cost: $1,300.00)
After considering all that you have written, I think I can get away with the 16-core, 32-thread option, or even less?
Are both CPUs supported by the ASUS Pro WS WRX80E? The Threadripper 39xx and 59xx are socket sTRX4, while the Threadripper PRO 39xx and 59xx are sWRX8, the socket on that motherboard. Can the Threadripper 3960X support the amount of ECC memory you will have on the board?
@Phage-structure-geek brings up good points. As long as you have at least 16 threads, more threads won't generally make much of a difference. Depending a bit on the sequences you run, even having only 8 threads may give a pretty similar runtime. However, you do need to make sure it is compatible with the rest of the hardware, and getting a CPU with a faster clock speed will make a significant difference.
I think you want to have 16 threads per GPU. Although 8 or 12 will work just fine and will be almost as fast as 16 for most jobs. So, for a 3 GPU workstation, you probably want a 36-48 thread CPU.