Paul Fultz II
> Make sure MIGraphX parses those models correctly, recognizes the patterns, and inserts "Pack" after the "clip" to make it a packed int4 weight. The "Clip" operator is just for...
> let's say Client is using same fake-quantized int4 model

This is very unlikely. A fake-quantized model implies that the weights can be computed with a simple scale and shift...
> Need a way to remove "pack" and "unpack" though for the "Ref" run.

Why? It will still run with those operators in there.
A couple more tasks that need to be addressed with onnx support:

- [ ] Support signed integers in pack/unpack, solved by https://github.com/ROCm/AMDMIGraphX/pull/3359
- [ ] Add clipping to pack...
> That clip would still work in int8, however.

quantizelinear already does clipping, so it will clip it for int8, and then we just need to update pack to clip...
The tasks still needed are:

- [x] Enable fusing unpack_int4 and dequantizelinear operators on the weights with mlir.
- [ ] Improve constant propagation so it doesn't convert unpack_int4 or...
To get constant propagation working, I think we can just skip over aliases (and reshape, which is almost an alias):

```cpp
bool skip_propagate(instruction_ref ins)
{
    if(contains({"contiguous", "dequantizelinear", "reshape"}, ins->name()))
        return skip_propagate(ins->inputs().front());
    auto...
```
This should be done in the symboldatabase either before or during exprids.
Tokens can have more than one known value, but it's not very common.
In the future, I would like to introduce dynamic attributes, as some of these attributes are only used on rare occasions, and it doesn't make sense to always have...