Paul Fultz II
> Make sure MIGraphX parses those models correctly, recognizes the patterns, and inserts "Pack" after the "clip" to make it a packed int4 weight. The "Clip" operator is just for...
> let's say Client is using same fake-quantized int4 model

This is very unlikely. A fake-quantized model implies that the weights can be computed with a simple scale and shift...
> Need a way to remove "pack" and "unpack" though for the "Ref" run.

Why? It will still run with those operators in there.
A couple more tasks that need to be addressed with onnx support:

- [ ] Support signed integers in pack/unpack, solved by https://github.com/ROCm/AMDMIGraphX/pull/3359
- [ ] Add clipping to pack...
> That clip would still work in int8, however.

quantizelinear already does clipping, so it will clip it for int8, and then we just need to update pack to clip...
The tasks still needed are:

- [x] Enable fusing unpack_int4 and dequantizelinear operators on the weights with mlir.
- [ ] Improve constant propagation so it doesn't convert unpack_int4 or...
To get constant propagation working, I think we can just skip over aliases (and reshape, which is almost an alias):

```cpp
bool skip_propagate(instruction_ref ins)
{
    if(contains({"contiguous", "dequantizelinear", "reshape"}, ins->name()))
        return skip_propagate(ins->inputs().front());
    auto...
```
This should be done in the symboldatabase either before or during exprids.
Tokens can have more than one known value, but it's not very common.
In the future, I would like to introduce dynamic attributes, as some of these attributes are only used on rare occasions, and it doesn't make sense to always have...