borg323 comments

Results: 4 comments of borg323

Not all CUDA operators support bfloat16 that should

This was a theoretical exercise: we haven't run anything with bfloat16 yet, but we are only missing `ReduceMean`. This was based on https://github.com/microsoft/onnxruntime/blob/main/docs/OperatorKernels.md#cudaexecutionprovider, which shows:

> T = tensor(double), tensor(float),...
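A check like the one described above, comparing the operators a model needs against the documented type constraints, could be sketched roughly as follows. This is only an illustration: the helper names `supports_dtype` and `missing_ops`, the operator names `OpA`/`OpB`/`OpC`, and the constraint strings are all hypothetical and not copied from `OperatorKernels.md`. It assumes constraints are written in the `T = tensor(...), tensor(...)` form quoted above.

```python
import re


def supports_dtype(constraint: str, dtype: str = "bfloat16") -> bool:
    """Return True if a constraint string lists tensor(<dtype>).

    Assumes the 'T = tensor(double), tensor(float)' style shown in the docs.
    """
    listed = set(re.findall(r"tensor\(([^)]+)\)", constraint))
    return dtype in listed


def missing_ops(needed_ops, documented):
    """Return the needed operators whose documented constraint lacks bfloat16.

    Operators absent from `documented` are treated as unsupported.
    """
    return sorted(
        op for op in needed_ops if not supports_dtype(documented.get(op, ""))
    )


# Hypothetical data for illustration only (not real documentation rows).
documented = {
    "OpA": "T = tensor(bfloat16), tensor(float)",
    "OpB": "T = tensor(double), tensor(float)",
}
print(missing_ops(["OpA", "OpB", "OpC"], documented))
```

As the later comments in this thread suggest, the documentation may lag behind the code, so a result from a check like this is only as reliable as the table it reads.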

Not all CUDA operators support bfloat16 that should

~~We also have an alternative code path that uses `Exp` and `Greater`. It seems `Exp` is also supported in the code and similarly not documented, but `Greater` is not. This...

Not all CUDA operators support bfloat16 that should

Looking through the code, it seems at least `ReduceMean`, `ReduceProd`, `ReduceLogSum`, `ReduceSumSquare`, `ReduceLogSumExp`, `ReduceL1`, `ReduceL2`, `Abs`, `Sign`, `Exp`, `Greater`, `GreaterOrEqual`, `Less`, `LessOrEqual`, `Equal`, `LeakyRelu` and `PRelu` support bfloat16 without this...

Invalid EP results in bad binpack conversion

I have fixed the code that resulted in bad EP fields, so this isn't urgent for me.