Han-Chung Wang
I found that the max error was high, around 6.36. According to the comment, a high max error is expected because of quantized Softmax issues: https://github.com/iree-org/iree/blob/dbeec8eadabedb442659503bc0a76d87ce7c5069/integrations/tensorflow/test/python/iree_tfl_tests/mobilebert_tf2_quant_test.py#L38-L43 The error is...
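For reference, the "max error" the harness reports is an elementwise comparison; here is a minimal sketch of that metric, assuming a plain absolute difference against the reference outputs (the exact logic lives in test_util.py and may differ):

```python
import numpy as np

def max_error(reference, actual):
    """Largest elementwise absolute difference between two outputs.

    A sketch of the reported metric, assuming a plain |a - b| comparison
    (the exact logic in test_util.py may differ).
    """
    return float(np.max(np.abs(np.asarray(reference, dtype=np.float32) -
                               np.asarray(actual, dtype=np.float32))))
```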
The artifacts can be downloaded from https://storage.googleapis.com/iree-shared-files/nod-perf/hanchung/issue_9796.zip

To repro:

```
$ iree-compile --iree-mlir-to-vm-bytecode-module --iree-hal-target-backends=dylib-llvm-aot -iree-input-type=tosa ~/mobilebert_quant_tosa.mlir -o /tmp/a.vmfb
$ iree-run-module --module_file=/tmp/a.vmfb --device=local-sync --entry_function=main --function_input=@/tmp/mobilebert_quant/download/input0.npy --function_input=@/tmp/mobilebert_quant/download/input1.npy --function_input=@/tmp/mobilebert_quant/download/input2.npy
```
The error goes down after some integrates. New error range:

```
I0720 01:12:07.200919 140737350333696 test_util.py:94] Max error (0): 6.603168
I0720 01:12:07.201174 140737350333696 test_util.py:94] Max error (1): 8.559662
```
Thanks for the input! Totally agree that we should extend `vector.gather` to serve our needs here. RE: adding support for multi-dimensional vectors. It looks like the op already supports multi-dimensional...
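To make the semantics concrete, here is a minimal NumPy sketch of what `vector.gather` computes (a masked indexed load with a pass-through fallback). The shapes are hypothetical, chosen only to show that an n-D index vector naturally yields an n-D result:

```python
import numpy as np

# Hypothetical shapes; the point is that a 2-D index vector yields a
# 2-D result: result[i, j] = base[indices[i, j]] if mask[i, j]
#                            else pass_thru[i, j].
base = np.arange(16, dtype=np.float32)          # memory to gather from
indices = np.array([[0, 4], [8, 12]])           # vector<2x2xindex>
mask = np.array([[True, True], [True, False]])  # vector<2x2xi1>
pass_thru = np.zeros((2, 2), dtype=np.float32)  # vector<2x2xf32>

# NOTE: np.where evaluates both branches; a real gather must not touch
# memory for masked-off lanes.
result = np.where(mask, base[indices], pass_thru)
print(result)  # [[0. 4.]
               #  [8. 0.]]
```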
Yes, the backend is able to handle it, but it does not work with the padding prototype. The padding works in top-down order. In the CodegenDriver approach (and the current [Sandbox approach](https://github.com/google/iree-llvm-sandbox/blob/807e12acf61aa902ee1987bea800dbdb500875f7/python/examples/fusion/test.py#L47-L71)), we have...
Yes, I'll help look into it!
The issue is that some values are offsets during computation. For example, in the first stage of the FFT:

```
// The last 16 values from CPU output
...
0 2...
```
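For reference, here is a minimal NumPy sketch of a radix-2 decimation-in-frequency first stage, which pairs elements at offset n/2; this is an assumption about the decomposition, not necessarily what the kernel under discussion does:

```python
import numpy as np

def fft_first_stage_dif(x):
    """Radix-2 decimation-in-frequency first stage (a sketch, assuming
    a standard DIF decomposition; the actual kernel may differ).

    Each butterfly pairs x[k] with x[k + n/2], so a wrong offset here
    shows up as whole blocks of the output being shifted or incorrect.
    """
    x = np.asarray(x, dtype=np.complex128)
    n = x.size
    half = n // 2
    w = np.exp(-2j * np.pi * np.arange(half) / n)  # stage-1 twiddles
    out = np.empty_like(x)
    out[:half] = x[:half] + x[half:]
    out[half:] = (x[:half] - x[half:]) * w
    return out
```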
This is the IR before RemoveSingleIterationLoopPass: https://gist.githubusercontent.com/hanhanW/6bb2252e8465ba8f97ee2a34ef2a8dd0/raw

The IR looks good to me. I might have set the configurations wrong... `#iree_codegen.translation_info, workgroup_size = [4 : index, 1 : index, 1 : index]}`...
> I do suspect that we only need the tensor version of the operation eventually, but we can make that determination later.

Yes, I think eventually we only need tensor...
I hit the same issue when profiling deeplabv3 on Pixel phones. The dispatch dominates the e2e performance in this case (19%). There are two issues:

1. The first-level tiling sizes...