Vincent Wang
Some models use QuickGelu(x) = x * sigmoid(1.702x), which takes 3 Ops in the forward pass and 5 Ops in the backward pass. This PR fuses it into a single Op named QuickGelu and its gradient...
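As a reference for the formula above, here is a minimal NumPy sketch of the QuickGelu forward pass and its analytic gradient. The function names (`quick_gelu`, `quick_gelu_grad`) and the `alpha` parameter are illustrative assumptions, not the fused ONNX Runtime kernels from the PR.

```python
import numpy as np

ALPHA = 1.702  # scaling constant taken from the formula in the PR description


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def quick_gelu(x, alpha=ALPHA):
    # Forward: x * sigmoid(alpha * x)
    return x * sigmoid(alpha * x)


def quick_gelu_grad(dy, x, alpha=ALPHA):
    # With s = sigmoid(alpha * x):
    # d/dx [x * s] = s + alpha * x * s * (1 - s) = s * (1 + alpha * x * (1 - s))
    s = sigmoid(alpha * x)
    return dy * s * (1.0 + alpha * x * (1.0 - s))
```

Computing both in one function each is the point of a fusion like this: the intermediate `sigmoid(alpha * x)` is evaluated once and reused rather than passed between separate nodes.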
This PR fixes https://github.com/microsoft/onnxruntime/issues/12930 and https://github.com/microsoft/onnxruntime/issues/12579. In detail: - For the CPU EP, the current implementation of SimplifiedLayerNormalization doesn't support input and scale having different data types, so if...
#19218 tried to fuse Gather/Slice into Split, but the logic has a problem. A scalar indices value and a 1-dim indices value in a Gather node produce different results: a scalar value will produce...
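For context, the ONNX Gather operator produces an output of rank rank(data) + rank(indices) - 1, so a scalar index and a 1-dim index yield outputs of different rank. The sketch below uses NumPy's `np.take` as a stand-in, on the assumption that it follows the same shape rule; it is not the ONNX Runtime fusion code.

```python
import numpy as np

data = np.arange(12).reshape(3, 4)

# Scalar index: the gathered axis is dropped from the output.
scalar_out = np.take(data, 1, axis=0)

# 1-dim index with one element: the gathered axis is kept with size 1.
one_dim_out = np.take(data, [1], axis=0)

print(scalar_out.shape)   # (4,)
print(one_dim_out.shape)  # (1, 4)
```

The values are the same in both cases; only the shape differs, which matters for any fusion that replaces Gather with an op that preserves the axis.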