Gustaf Ahdritz

170 comments by Gustaf Ahdritz

Here's a datapoint in the meantime. Using the right-out-of-the-box setting from the same commit (c4d9f57), with the real dataloader, the slow cache clearing, DeepSpeed stage 2, CPU offloading, and the...
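For reference, here is a minimal sketch of what a DeepSpeed ZeRO stage-2 config with optimizer CPU offloading looks like, expressed as a Python dict (the same keys go in a JSON config file). The key names are standard DeepSpeed fields; the values are illustrative, not the ones from the run described above.

```python
# Sketch of a DeepSpeed ZeRO stage-2 config with optimizer CPU offloading.
# Key names are standard DeepSpeed fields; values are illustrative only.
ds_config = {
    "train_micro_batch_size_per_gpu": 1,  # illustrative
    "zero_optimization": {
        "stage": 2,                        # ZeRO stage 2: shard optimizer state and gradients
        "offload_optimizer": {
            "device": "cpu",               # keep optimizer state in CPU memory
            "pin_memory": True,
        },
    },
}
```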

Hm. I'll try to think of more discrepancies. I think there still have to be more; even if the 6.5-7s A100 time doesn't pan out, we shouldn't be getting essentially...

Sent. Our A100 results were obtained using the following:
- CUDA Driver 465.19.01
- CUDA 11.3 Update 1 (11.3.1.005)
- cuBLAS 11.5.1.109 (part of CUDA 11.3 U1)
- cuDNN 8.2.1.32
- NCCL 2.9.9
- PyTorch 1.9.0a0+c3d40fd...
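If it helps for comparison, a quick way to print the corresponding versions from your own environment (the driver version isn't exposed by PyTorch, so that one still comes from nvidia-smi):

```python
import torch

# Print library versions for comparison with the list above.
print("PyTorch:", torch.__version__)
print("CUDA (build):", torch.version.cuda)
print("cuDNN:", torch.backends.cudnn.version())
print("NCCL:", torch.cuda.nccl.version())
# The CUDA driver version isn't reported by PyTorch; check `nvidia-smi` for that.
```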

The mmCIF cache isn't required, but the template mmCIFs are. I'll send those over now.

Yes, we have tested bfloat16, and it's a lot better than fp16, but you'll need PyTorch 1.10 for that. The test I referenced previously used fp16.
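In case it's useful, a small check (assuming PyTorch 1.10 or newer) for whether bf16 is available and how to request it from autocast; the tensors are just placeholders:

```python
import torch

# Requires PyTorch 1.10+ (assumed minimum version for bf16 autocast support).
print(torch.cuda.is_bf16_supported())  # True on A100-class GPUs

# Ask autocast for bfloat16 instead of the default float16.
with torch.cuda.amp.autocast(dtype=torch.bfloat16):
    x = torch.randn(8, 8, device="cuda")
    y = x @ x  # matmul runs in bf16 under autocast
```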

You won't NaN anymore. Have you updated your DeepSpeed config for bf16 training?
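For concreteness, a hypothetical fragment of the bf16 switch in a DeepSpeed config; the keys are standard DeepSpeed fields, and the rest of the config is omitted.

```python
# Hypothetical DeepSpeed config fragment for bf16 training (expressed as a
# Python dict; the same keys go in a JSON config file).
ds_config = {
    "bf16": {"enabled": True},   # turn bfloat16 training on
    "fp16": {"enabled": False},  # and make sure fp16 is off at the same time
}
```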

Hm. Could you test it with DeepSpeed one time? That's what our test used. I'd repeat the test without DeepSpeed myself, but the A100s we've been using are borrowed and...

That's kind of weird. How much memory do you have on your A100s?

Just 700? That's very odd. Is grad being enabled for validation runs or something?
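To illustrate the suspicion about grad being left on: validation should normally run under torch.no_grad(), otherwise autograd holds onto activations for a backward pass that never happens and memory climbs. A minimal, self-contained sketch (the Linear model is just a stand-in):

```python
import torch

model = torch.nn.Linear(16, 16)  # stand-in for the real model
batch = torch.randn(4, 16)       # stand-in for a validation batch

model.eval()
with torch.no_grad():            # no graph is built, so activations aren't retained
    out = model(batch)

print(out.requires_grad)  # False: grad is disabled for this validation pass
```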