apex
Compiled version runs too slowly
I have successfully compiled apex on A100 hardware, but when I run the fmha test it takes 21 seconds to finish. What could be the cause?
I don't know your setup, but one local run went as follows:
root@a33ccb515d34:/opt/pytorch/apex# python apex/contrib/test/fmha/test_fmha.py
Test s=128 b=32, zero_tensors=False
Test s=128 b=32, zero_tensors=True
.Test s=256 b=32, zero_tensors=False
Test s=256 b=32, zero_tensors=True
.Test s=384 b=32, zero_tensors=False
Test s=384 b=32, zero_tensors=True
.Test s=512 b=32, zero_tensors=False
Test s=512 b=32, zero_tensors=True
Test s=512 b=2, zero_tensors=False
Test s=512 b=2, zero_tensors=True
Test s=512 b=3, zero_tensors=False
Test s=512 b=3, zero_tensors=True
.
----------------------------------------------------------------------
Ran 4 tests in 1.637s
OK
Hi, thank you for your reply. I don't know what configuration you used. In my setup, I built apex from source on an A100 using CUDA 11.4 and Anaconda, and the result was:
(......) ...@...:~/apex/apex/contrib/test/fmha$ python test_fmha.py
Test s=128 b=32
.Test s=256 b=32
.Test s=384 b=32
.Test s=512 b=32
Test s=512 b=2
Test s=512 b=3
.
----------------------------------------------------------------------
Ran 4 tests in 23.213s
OK
Could you please explain the difference in run time?
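The thread does not establish a cause, so the following is only a way to narrow it down, not a diagnosis. The total reported by unittest includes any one-time startup cost (for example CUDA context creation), so it can help to time the first call separately from later calls. The sketch below uses only the standard library. `time_first_vs_rest` and `make_demo_workload` are hypothetical names, and the demo workload simply sleeps on its first call to imitate a startup cost. For real GPU work, the timed function would need to call `torch.cuda.synchronize()` before returning so that asynchronous kernels are included in the measurement.

```python
import time


def time_first_vs_rest(fn, repeats=5):
    """Return (first_call_seconds, mean_seconds_of_later_calls) for fn."""
    start = time.perf_counter()
    fn()
    first = time.perf_counter() - start

    later = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        later.append(time.perf_counter() - start)
    return first, sum(later) / len(later)


def make_demo_workload(startup_delay=0.05):
    """Hypothetical stand-in for a GPU test body.

    The first call sleeps to imitate a one-time startup cost;
    later calls only do a small amount of CPU work.
    """
    state = {"warm": False}

    def workload():
        if not state["warm"]:
            time.sleep(startup_delay)  # assumed one-time cost, for illustration only
            state["warm"] = True
        return sum(range(1000))

    return workload


if __name__ == "__main__":
    first, steady = time_first_vs_rest(make_demo_workload())
    print(f"first call: {first:.4f}s, later calls (mean): {steady:.6f}s")
```

If the first call dominates and later calls are fast, the extra time is startup overhead rather than slow kernels. If every call is slow, the build or environment is the more likely place to look.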