Correct error tolerances for golden values on A100
Issue body
After #9975, error tolerances for shark on A100 have been exceeded for a few models. Here are some numbers:
self = <xlm-roberta-base_tf_test.XLMRobertaModuleTester object at 0x7fe155272770>, dynamic = False, device = 'gpu'
E Mismatched elements: 252531 / 4000032 (6.31%)
E Max absolute difference: 0.10531139
E Max relative difference: 865.5326
E x: array([[[-4.401309, -0.024628, -7.125814, ..., 4.503648, -4.59222 ,
E -1.076694],
E [-2.240759, 0.21017 , -8.47337 , ..., -2.105228, -1.818338,...
E y: array([[[-4.394317, -0.024287, -7.125648, ..., 4.478123, -4.585316,
E -1.076995],
E [-2.24629 , 0.210042, -8.475931, ..., -2.110665, -1.816817,...
self = <roberta-base_tf_test.RobertaBaseModuleTester object at 0x7fe155308f70>, dynamic = False, device = 'gpu'
E Not equal to tolerance rtol=0.01, atol=0.001
E
E Mismatched elements: 46624 / 804240 (5.8%)
E Max absolute difference: 0.04533577
E Max relative difference: 763.70135
E x: array([[[33.55235 , -3.827327, 18.863625, ..., 3.420343, 6.171632,
E 11.648125],
E [-0.598835, -4.141003, 14.904708, ..., -4.515923, -1.790529,...
E y: array([[[33.567413, -3.829913, 18.870962, ..., 3.422938, 6.174327,
E 11.656706],
E [-0.58585 , -4.141752, 14.913631, ..., -4.516505, -1.788759,...
self = <mpnet-base_tf_test.MpNetModuleTester object at 0x7fe15525a080>, dynamic = False, device = 'gpu'
E AssertionError:
E Not equal to tolerance rtol=0.01, atol=0.001
E
E Mismatched elements: 59378 / 488432 (12.2%)
E Max absolute difference: 0.06668603
E Max relative difference: 3304.545
E x: array([[[40.389954, 4.286406, 23.76233 , ..., -1.074989, -0.482307,
E 16.880697],
E [ 2.257942, 0.504233, 8.199037, ..., -1.836042, 0.471555,...
E y: array([[[40.38317 , 4.290402, 23.760578, ..., -1.071989, -0.476889,
E 16.869303],
E [ 2.256348, 0.50376 , 8.193238, ..., -1.834158, 0.474861,...
self = <mobilebert-uncased_tf_test.MobileBertModuleTester object at 0x7fe15525b340>, dynamic = False, device = 'gpu'
E AssertionError:
E Not equal to tolerance rtol=0.01, atol=0.001
E
E Mismatched elements: 99072 / 488352 (20.3%)
E Max absolute difference: 0.2849064
E Max relative difference: 570.024
E x: array([[[-4.563648, -8.917149, -9.508633, ..., -8.859805, -9.35775 ,
E -3.739411],
E [-8.470783, -8.042081, -7.747127, ..., -7.734895, -8.48076 ,...
E y: array([[[-4.561851, -8.916107, -9.508212, ..., -8.85981 , -9.357377,
E -3.738622],
E [-8.470868, -8.044136, -7.747543, ..., -7.735366, -8.477797,...
self = <layoutlm-base-uncased_tf_test.LayoutLMModuleTester object at 0x7fe1552b6ec0>, dynamic = False, device = 'gpu'
E AssertionError:
E Not equal to tolerance rtol=0.01, atol=0.001
E
E Mismatched elements: 145352 / 488352 (29.8%)
E Max absolute difference: 0.0553565
E Max relative difference: 2522.9336
E x: array([[[-0.424161, 1.658019, 0.9119 , ..., 0.691548, 0.414469,
E 0.90081 ],
E [-0.761064, -0.302433, -1.195132, ..., -0.884939, 0.444821,...
E y: array([[[-0.41647 , 1.662008, 0.920087, ..., 0.697769, 0.41865 ,
E 0.905736],
E [-0.751545, -0.297387, -1.189691, ..., -0.874244, 0.443483,...
self = <electra-small-discriminator_tf_test.ElectraModuleTester object at 0x7fe15525abc0>, dynamic = False, device = 'gpu'
E AssertionError:
E Not equal to tolerance rtol=0.01, atol=0.001
E
E Mismatched elements: 58884 / 488352 (12.1%)
E Max absolute difference: 0.01450959
E Max relative difference: 756.9981
E x: array([[[ 1.150137e+00, 1.647311e-01, 1.618423e-01, ...,
E 1.635987e-01, 1.645508e-01, 1.536248e-01],
E [-2.518778e-02, 2.517256e-01, 2.526046e-01, ...,...
E y: array([[[ 1.151324, 0.167032, 0.164161, ..., 0.165891, 0.166873,
E 0.155909],
E [-0.027828, 0.250528, 0.251408, ..., 0.253964, 0.253663,...
Is this expected? Should we revise the tolerances?
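For reference, the statistics in the failures above follow the numpy assertion semantics: an element mismatches when its absolute difference exceeds `atol + rtol * |expected|`. A minimal sketch with hypothetical values (the real tests compare millions of model-output elements):

```python
import numpy as np

# Hypothetical actual (x) and golden (y) values standing in for model outputs.
x = np.array([1.00, 2.000, 0.30])
y = np.array([1.05, 2.001, 0.30])

# Equivalent to np.testing.assert_allclose(x, y, rtol=0.01, atol=0.001):
# an element mismatches when |x - y| > atol + rtol * |y|.
abs_diff = np.abs(x - y)
mismatched = abs_diff > 0.001 + 0.01 * np.abs(y)

print(int(mismatched.sum()))  # number of mismatched elements -> 1
```

Note that the "Max relative difference" can be huge (e.g. 3304.545 above) even for small absolute errors, because it is dominated by elements near zero.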
Can we disable TF32 for both IREE and TF (if the CUDA backend is used as reference) and see if this goes away? I don't think there is much we can do if we want to use TF32.
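On the TF side, the experiment of disabling TF32 can be done with a one-line config toggle (a sketch; the IREE side would need its own flag, which is not shown here):

```python
import tensorflow as tf

# Force full FP32 matmuls/convolutions on Ampere GPUs (A100) so the
# CUDA reference values are not computed with TF32's reduced mantissa.
tf.config.experimental.enable_tensor_float_32_execution(False)
```

If the mismatches disappear with this set, that would confirm TF32 rounding as the source of the tolerance failures.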
@ThomasRaoux @KoolJBlack didn't get a chance to review this in today's sync. Do we have a priority for this?
This was discussed on the chat. There is not much we can do within IREE as long as we want to use TF32.
@dan-garvey can you update the bug with your plan?
@dan-garvey Can you provide an update today?
Yeah, we're going to relax tolerance when TF32 is enabled. Any changes would be on the NVIDIA side, so nothing required from the IREE side afaik. Thanks for the support!
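A sketch of what relaxing the tolerance could look like (the flag name and widened values here are illustrative, not the actual shark test code):

```python
import numpy as np

# Hypothetical switch; in practice this would be detected from the
# backend/device configuration (TF32 is the default on A100).
TF32_ENABLED = True

# Keep the strict tolerances for exact-FP32 backends, widen them under TF32.
rtol, atol = (0.1, 0.01) if TF32_ENABLED else (0.01, 0.001)

# Example values taken from the roberta-base failure above.
x = np.array([33.55235, -3.827327])
y = np.array([33.567413, -3.829913])
np.testing.assert_allclose(x, y, rtol=rtol, atol=atol)  # passes with relaxed tolerance
```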
Great! Can we close this?