Ted Themistokleous
Updated this to be a separate pass. Seeing some odd behavior on the added tests when this gets converted to kernels on the gpu. Need to sort out that last...
There appears to have been a bug in how we were parsing, which was causing the failures in the verify tests when that datatype is moved from uint8_t to int8_t....
Updated things here for the failing UTs; turns out it was flagging overflow with dot (y,y) in some test cases. Alternatively, it looks like I can't get this to...
@pfultz2 you were right, I needed to use insert here instead of add, along with looking at the quantizelinear as part of the update. Removed any changes to...
Split out changes for parse_dynamicquantizelinear for this one. Retargeted branch to that PR until that changeset gets in. Will revisit once that's merged.
DQL fix + Pass changes
```
Summary:
gpu::code_object::reduce_min_min_kernel: 2.04166ms / 49 = 0.0416666ms, 13%
gpu::code_object::reduce_max_max_sub_mul_kernel: 2.03947ms / 49 = 0.0416219ms, 13%
gpu::code_object::mul_quantizelinear_kernel: 1.66411ms / 48 = 0.0346689ms, 10%
gpu::code_object::mlir_quant_dot: 1.35967ms...
```
Not required, but can still be used as a perf improvement now that 2903 has been added. Will need to determine speedup after rebase. Fixes were pulled out of here for some...
- [ ] Handle and remove shift from MatMulInteger Parser
- [ ] Handle and remove shift from ConvInteger Parser
- [ ] Handle and remove shift from DynamicQuantizeLinear parser...
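
For context on the "shift" in the checklist above, here is a minimal sketch of the idea as I understand it. It assumes the shift means re-expressing uint8 data as int8 by subtracting 128 from both the values and the zero point; that reading is an assumption, not something confirmed in this thread. The helper names `shift_uint8_to_int8` and `int_dot` are hypothetical and are not MIGraphX APIs.

```python
def shift_uint8_to_int8(values, zero_point):
    """Re-express uint8 quantized data as int8.

    Assumption: the "shift" subtracts 128 from every value and from the
    zero point, so that (value - zero_point) is unchanged.
    """
    shifted = [v - 128 for v in values]
    shifted_zp = zero_point - 128
    # Every shifted value must fit in the int8 range.
    assert all(-128 <= v <= 127 for v in shifted)
    return shifted, shifted_zp


def int_dot(a, a_zp, b, b_zp):
    """Integer dot product with zero points removed (hypothetical helper)."""
    return sum((x - a_zp) * (y - b_zp) for x, y in zip(a, b))


# uint8 operand with its zero point, and an int8 operand with its zero point.
a_u8, a_zp_u8 = [0, 17, 128, 255], 130
b_i8, b_zp_i8 = [-5, 3, 7, -128], 2

a_i8, a_zp_i8 = shift_uint8_to_int8(a_u8, a_zp_u8)

# The dot product is the same before and after the shift.
same = int_dot(a_u8, a_zp_u8, b_i8, b_zp_i8) == int_dot(a_i8, a_zp_i8, b_i8, b_zp_i8)
```

Because the values and the zero point move by the same amount, the result is unchanged. Under that assumption, the shift could be handled in one place, such as a separate pass, rather than in each parser.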
Relevant doc/spec - https://github.com/onnx/onnx/blob/main/docs/Operators.md#softmaxcrossentropyloss