Han-Chung Wang
I found that the max error was high, around 6.36. According to the comment, a high max error is expected because of quantized Softmax issues: https://github.com/iree-org/iree/blob/dbeec8eadabedb442659503bc0a76d87ce7c5069/integrations/tensorflow/test/python/iree_tfl_tests/mobilebert_tf2_quant_test.py#L38-L43 The error is...
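For reference, the "max error" the harness reports is an elementwise comparison; here is a minimal sketch of that metric, assuming a plain absolute difference against the reference outputs (the exact logic lives in test_util.py and may differ):

```python
import numpy as np

def max_error(reference, actual):
    """Largest elementwise absolute difference between two outputs.

    A sketch of the reported metric, assuming a plain |a - b| comparison
    (the exact logic in test_util.py may differ).
    """
    return float(np.max(np.abs(np.asarray(reference, dtype=np.float32) -
                               np.asarray(actual, dtype=np.float32))))
```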
The artifacts can be downloaded from https://storage.googleapis.com/iree-shared-files/nod-perf/hanchung/issue_9796.zip

To repro:

```
$ iree-compile --iree-mlir-to-vm-bytecode-module --iree-hal-target-backends=dylib-llvm-aot -iree-input-type=tosa ~/mobilebert_quant_tosa.mlir -o /tmp/a.vmfb
$ iree-run-module --module_file=/tmp/a.vmfb --device=local-sync --entry_function=main --function_input=@/tmp/mobilebert_quant/download/input0.npy --function_input=@/tmp/mobilebert_quant/download/input1.npy --function_input=@/tmp/mobilebert_quant/download/input2.npy
```
The error goes down after some integrates. New error range:

```
I0720 01:12:07.200919 140737350333696 test_util.py:94] Max error (0): 6.603168
I0720 01:12:07.201174 140737350333696 test_util.py:94] Max error (1): 8.559662
```
Thanks for the input! Totally agree that we should extend `vector.gather` to serve our needs here. RE: adding support for multi-dimensional vectors. It looks like the op already supports multi-dimensional...
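To make the semantics concrete, here is a minimal NumPy sketch of what `vector.gather` computes (a masked indexed load with a pass-through fallback). The shapes are hypothetical, chosen only to show that an n-D index vector naturally yields an n-D result:

```python
import numpy as np

# Hypothetical shapes; the point is that a 2-D index vector yields a
# 2-D result: result[i, j] = base[indices[i, j]] if mask[i, j]
#                            else pass_thru[i, j].
base = np.arange(16, dtype=np.float32)          # memory to gather from
indices = np.array([[0, 4], [8, 12]])           # vector<2x2xindex>
mask = np.array([[True, True], [True, False]])  # vector<2x2xi1>
pass_thru = np.zeros((2, 2), dtype=np.float32)  # vector<2x2xf32>

# NOTE: np.where evaluates both branches; a real gather must not touch
# memory for masked-off lanes.
result = np.where(mask, base[indices], pass_thru)
print(result)  # [[0. 4.]
               #  [8. 0.]]
```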
Yes, the backend is able to handle it, but it does not work with the padding prototype. The padding works in top-down order. In the CodegenDriver approach (and the current [Sandbox approach](https://github.com/google/iree-llvm-sandbox/blob/807e12acf61aa902ee1987bea800dbdb500875f7/python/examples/fusion/test.py#L47-L71)), we have...
Yes, I'll help look into it!
The issue is that some values are offsets during computation. For example, in the first stage of the FFT:

```
// The last 16 values from CPU output
...
0 2...
```
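For reference, here is a minimal NumPy sketch of a radix-2 decimation-in-frequency first stage, which pairs elements at offset n/2; this is an assumption about the decomposition, not necessarily what the kernel under discussion does:

```python
import numpy as np

def fft_first_stage_dif(x):
    """Radix-2 decimation-in-frequency first stage (a sketch, assuming
    a standard DIF decomposition; the actual kernel may differ).

    Each butterfly pairs x[k] with x[k + n/2], so a wrong offset here
    shows up as whole blocks of the output being shifted or incorrect.
    """
    x = np.asarray(x, dtype=np.complex128)
    n = x.size
    half = n // 2
    w = np.exp(-2j * np.pi * np.arange(half) / n)  # stage-1 twiddles
    out = np.empty_like(x)
    out[:half] = x[:half] + x[half:]
    out[half:] = (x[:half] - x[half:]) * w
    return out
```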
This is the IR before RemoveSingleIterationLoopPass: https://gist.githubusercontent.com/hanhanW/6bb2252e8465ba8f97ee2a34ef2a8dd0/raw

The IR looks good to me. I might have set the configurations wrong... `#iree_codegen.translation_info, workgroup_size = [4 : index, 1 : index, 1 : index]}`...
> I do suspect that we only need the tensor version of the operation eventually, but we can make that determination later.

Yes, I think eventually we only need tensor...
I hit the same issue when profiling deeplabv3 on Pixel phones. The dispatch dominates the e2e performance in this case (19%). There are two issues:

1. The first-level tiling sizes...