[DNNL][BYOC] Enable Altering Dense Weight Layout
The patch includes four parts:
- Enable nn.contrib_dense_pack to be partitioned and offloaded by DNNL BYOC.
- Enable the alter dense layout function when building a Relay model, by introducing nn.contrib_dense_pack.
- Make some minor fixes in tensor_requisite.h.
- Add unit tests for packed dense.
Thanks for contributing to TVM! Please refer to the guideline at https://tvm.apache.org/docs/contribute/ for useful information and tips. After the pull request is submitted, please request code reviews from Reviewers by @-mentioning them in the pull request thread.
@apeskov @masahi @yangulei @crazydemo @linlifan @Qianshui-Jiang Please take a look :-)
Thanks @billishyahao, nice patch!
You've touched common tvm::relay code a little to enhance layout support for the packed dense op. There is one delicate nuance here that I would like to highlight.
"weight_layout" is an arbitrary string like "NC", "CN", "CN8c", "CN16n4c", and others. It should match the regex (NC|CN)([[:digit:]]+(c|n))*. It would be perfect to support all of these possible cases.
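As a quick check of that grammar, here is a minimal Python sketch. Python's re module has no POSIX [[:digit:]] class, so \d+ stands in for it; the function name is hypothetical and not part of TVM.

```python
import re

# Dense weight layout grammar from the comment: "NC" or "CN", followed by any
# number of sub-blocks such as "8c" or "16n". \d+ stands in for [[:digit:]]+.
DENSE_WEIGHT_LAYOUT = re.compile(r"(NC|CN)(\d+[cn])*")


def is_valid_dense_weight_layout(layout: str) -> bool:
    # fullmatch rejects strings with trailing junk, e.g. "NC8" (no axis letter) or "NCHW"
    return DENSE_WEIGHT_LAYOUT.fullmatch(layout) is not None


for layout in ["NC", "CN", "CN8c", "CN16n4c", "NC8", "NCHW"]:
    print(layout, is_valid_dense_weight_layout(layout))
# prints True for the first four layouts and False for NC8 and NCHW
```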
Let's take a closer look at the following example: a dense op with data_shape [128, 10], weight_shape [17, 10], and weight_layout NC, so output_shape will be [128, 17]. Assume that we apply alter op layout and change the weight layout to NC8n. The weight shape will be changed to [3, 10, 8], with some additional padding. Unexpectedly, the output shape will also be changed to [128, 24]. A weight layout conversion that changes the output shape is very strange behaviour. I know the count attribute should keep the original size of the output channels, but it can be None.
So I recommend taking the count size into account in the "MakeDensePack" implementation and propagating the output shape correctly.
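To make the arithmetic concrete, here is a pure-Python sketch of the shape computation, assuming the packing splits the output-channel axis N into blocks of 8 (weight [N, C] becomes [ceil(N/8), C, 8]). It is not TVM's MakeDensePack or its type relation; the helper names are hypothetical, and `count` simply mirrors the attribute named in this comment.

```python
from typing import Optional, Tuple


def blocked_weight_shape(weight_shape: Tuple[int, int], block: int) -> Tuple[int, int, int]:
    # NC weight [N, C] packed to NC{block}n: N is split into ceil(N / block) blocks,
    # zero-padding the last block if N is not a multiple of block
    n, c = weight_shape
    return ((n + block - 1) // block, c, block)


def dense_pack_out_shape(
    data_shape: Tuple[int, int],
    packed_weight_shape: Tuple[int, int, int],
    count: Optional[int] = None,
) -> Tuple[int, int]:
    batch = data_shape[0]
    n_blocks, _, block = packed_weight_shape
    padded_n = n_blocks * block
    if count is None:
        # Only the padded channel count is visible from the packed weight
        return (batch, padded_n)
    # The logical number of output channels must fit inside the padded blocks
    assert 0 < count <= padded_n, "count must lie within the padded channel range"
    return (batch, count)


packed = blocked_weight_shape((17, 10), 8)
print(packed)                                             # (3, 10, 8)
print(dense_pack_out_shape((128, 10), packed))            # (128, 24)
print(dense_pack_out_shape((128, 10), packed, count=17))  # (128, 17)
```

Falling back to the padded size when count is None reproduces the [128, 24] surprise; the sketch only recovers [128, 17] when the logical size is available.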
Hi @apeskov, what you mentioned is a common issue with blocked layouts.
If additional padding is applied when transforming from a plain layout to a blocked layout, cropping must also be applied when transforming back to the plain layout. A bijective transformation should ensure origin = backward(forward(origin)), but this is not guaranteed so far.
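A minimal NumPy sketch of that round trip, assuming a packing that splits N into blocks of 8 (so a [17, 10] weight becomes [3, 10, 8]): forward pads N up to a multiple of the block, and backward can only restore the original when it is told the logical size. The function names are illustrative, not TVM APIs.

```python
import numpy as np


def forward(weight: np.ndarray, block: int) -> np.ndarray:
    """Plain NC [N, C] -> blocked [ceil(N/block), C, block], zero-padding N."""
    n, c = weight.shape
    n_blocks = (n + block - 1) // block
    padded = np.zeros((n_blocks * block, c), dtype=weight.dtype)
    padded[:n] = weight
    # [n_blocks*block, C] -> [n_blocks, block, C] -> [n_blocks, C, block]
    return padded.reshape(n_blocks, block, c).transpose(0, 2, 1)


def backward(packed: np.ndarray, logical_n=None) -> np.ndarray:
    """Blocked -> plain NC; the padding is cropped only if logical_n is known."""
    n_blocks, c, block = packed.shape
    plain = packed.transpose(0, 2, 1).reshape(n_blocks * block, c)
    return plain if logical_n is None else plain[:logical_n]


origin = np.arange(17 * 10, dtype=np.float32).reshape(17, 10)
packed = forward(origin, 8)
print(packed.shape)                                             # (3, 10, 8)
print(backward(packed).shape)                                   # (24, 10): padding leaks out
print(np.array_equal(backward(packed, logical_n=17), origin))   # True
```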
Padding is natural, while cropping needs extra information. We use the extra information from the definition of Conv to solve this problem for blocked weights, but that is a workaround rather than a general solution. I think we need both a logical shape and a concrete shape for each tensor, just like the dims and padded dims in the DNNL memory descriptor.
Maybe we need a pass that infers the original logical shapes and saves them as attributes for later use. Do you have any ideas about this?
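One way to picture the logical/concrete split: a descriptor that carries both shapes, so the crop size is never lost when an axis is blocked. This is a toy Python sketch in the spirit of the DNNL memory descriptor's dims and padded dims; TensorDesc and block_axis are hypothetical names, not TVM or oneDNN APIs.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class TensorDesc:
    dims: Tuple[int, ...]         # logical shape: what shape inference should propagate
    padded_dims: Tuple[int, ...]  # physical shape after rounding axes up to block sizes


def block_axis(desc: TensorDesc, axis: int, block: int) -> TensorDesc:
    """Pad one axis up to a multiple of block while keeping the logical size."""
    padded = list(desc.padded_dims)
    padded[axis] = (padded[axis] + block - 1) // block * block
    return TensorDesc(desc.dims, tuple(padded))


weight = TensorDesc(dims=(17, 10), padded_dims=(17, 10))
blocked = block_axis(weight, axis=0, block=8)
print(blocked)  # TensorDesc(dims=(17, 10), padded_dims=(24, 10))
# Cropping back to the plain layout is then just slicing padded_dims down to dims.
```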
Hi @masahi, could you shed some light on the CI failure? I could not find a way to reproduce it in my local environment. Thanks!
Sorry, I couldn't tell what the error was either. cc @driazati @areusch
Sorry, it's unclear from the logs; we really should aggregate common error phrases automatically. If you search through the logs for "Fatal Python error: Aborted", you can see the failed test (e.g. in https://ci.tlcpack.ai/blue/rest/organizations/jenkins/pipelines/tvm/branches/PR-11966/runs/10/nodes/384/steps/1167/log/?start=0 it's this one:
[2022-08-01T04:22:42.467Z] tests/python/frontend/pytorch/test_forward.py::test_convert_torch_script_with_input_types free(): invalid pointer
[2022-08-01T04:22:42.467Z] Fatal Python error: Aborted
[2022-08-01T04:22:42.467Z]
[2022-08-01T04:22:42.467Z] Thread 0x00007fbdec1c2700 (most recent call first):
[2022-08-01T04:22:42.467Z] File "/usr/lib/python3.7/threading.py", line 300 in wait
[2022-08-01T04:22:42.467Z] File "/usr/lib/python3.7/threading.py", line 552 in wait
[2022-08-01T04:22:42.467Z] File "/usr/local/lib/python3.7/dist-packages/tqdm/_monitor.py", line 60 in run
[2022-08-01T04:22:42.467Z] File "/usr/lib/python3.7/threading.py", line 926 in _bootstrap_inner
[2022-08-01T04:22:42.467Z] File "/usr/lib/python3.7/threading.py", line 890 in _bootstrap
[2022-08-01T04:22:42.467Z]
[2022-08-01T04:22:42.467Z] Thread 0x00007fbe37727700 (most recent call first):
[2022-08-01T04:22:42.467Z] File "/usr/lib/python3.7/socket.py", line 212 in accept
[2022-08-01T04:22:42.467Z] File "/usr/local/lib/python3.7/dist-packages/pytest_rerunfailures.py", line 429 in run_server
[2022-08-01T04:22:42.467Z] File "/usr/lib/python3.7/threading.py", line 870 in run
[2022-08-01T04:22:42.467Z] File "/usr/lib/python3.7/threading.py", line 926 in _bootstrap_inner
[2022-08-01T04:22:42.467Z] File "/usr/lib/python3.7/threading.py", line 890 in _bootstrap
[2022-08-01T04:22:42.467Z]
[2022-08-01T04:22:42.467Z] Current thread 0x00007fbe6c265740 (most recent call first):
[2022-08-01T04:22:42.467Z] File "/usr/local/lib/python3.7/dist-packages/torch/jit/_serialization.py", line 162 in load
[2022-08-01T04:22:42.467Z] File "/workspace/tests/python/frontend/pytorch/test_forward.py", line 4077 in test_convert_torch_script_with_input_types
[2022-08-01T04:22:42.467Z] File "/usr/local/lib/python3.7/dist-packages/_pytest/python.py", line 192 in pytest_pyfunc_call
[2022-08-01T04:22:42.467Z] File "/usr/local/lib/python3.7/dist-packages/pluggy/_callers.py", line 39 in _multicall
[2022-08-01T04:22:42.467Z] File "/usr/local/lib/python3.7/dist-packages/pluggy/_manager.py", line 80 in _hookexec
[2022-08-01T04:22:42.467Z] File "/usr/local/lib/python3.7/dist-packages/pluggy/_hooks.py", line 265 in __call__
[2022-08-01T04:22:42.467Z] File "/usr/local/lib/python3.7/dist-packages/_pytest/python.py", line 1761 in runtest
[2022-08-01T04:22:42.467Z] File "/usr/local/lib/python3.7/dist-packages/_pytest/runner.py", line 166 in pytest_runtest_call
[2022-08-01T04:22:42.467Z] File "/usr/local/lib/python3.7/dist-packages/pluggy/_callers.py", line 39 in _multicall
[2022-08-01T04:22:42.467Z] File "/usr/local/lib/python3.7/dist-packages/pluggy/_manager.py", line 80 in _hookexec
[2022-08-01T04:22:42.467Z] File "/usr/local/lib/python3.7/dist-packages/pluggy/_hooks.py", line 265 in __call__
[2022-08-01T04:22:42.467Z] File "/usr/local/lib/python3.7/dist-packages/_pytest/runner.py", line 259 in <lambda>
[2022-08-01T04:22:42.467Z] File "/usr/local/lib/python3.7/dist-packages/_pytest/runner.py", line 338 in from_call
[2022-08-01T04:22:42.467Z] File "/usr/local/lib/python3.7/dist-packages/_pytest/runner.py", line 259 in call_runtest_hook
[2022-08-01T04:22:42.467Z] File "/usr/local/lib/python3.7/dist-packages/_pytest/runner.py", line 219 in call_and_report
[2022-08-01T04:22:42.467Z] File "/usr/local/lib/python3.7/dist-packages/_pytest/runner.py", line 130 in runtestprotocol
[2022-08-01T04:22:42.467Z] File "/usr/local/lib/python3.7/dist-packages/pytest_rerunfailures.py", line 497 in pytest_runtest_protocol
[2022-08-01T04:22:42.467Z] File "/usr/local/lib/python3.7/dist-packages/pluggy/_callers.py", line 39 in _multicall
[2022-08-01T04:22:42.467Z] File "/usr/local/lib/python3.7/dist-packages/pluggy/_manager.py", line 80 in _hookexec
[2022-08-01T04:22:42.467Z] File "/usr/local/lib/python3.7/dist-packages/pluggy/_hooks.py", line 265 in __call__
[2022-08-01T04:22:42.467Z] File "/usr/local/lib/python3.7/dist-packages/_pytest/main.py", line 347 in pytest_runtestloop
[2022-08-01T04:22:42.467Z] File "/usr/local/lib/python3.7/dist-packages/pluggy/_callers.py", line 39 in _multicall
[2022-08-01T04:22:42.467Z] File "/usr/local/lib/python3.7/dist-packages/pluggy/_manager.py", line 80 in _hookexec
[2022-08-01T04:22:42.467Z] File "/usr/local/lib/python3.7/dist-packages/pluggy/_hooks.py", line 265 in __call__
[2022-08-01T04:22:42.467Z] File "/usr/local/lib/python3.7/dist-packages/_pytest/main.py", line 322 in _main
[2022-08-01T04:22:42.467Z] File "/usr/local/lib/python3.7/dist-packages/_pytest/main.py", line 268 in wrap_session
[2022-08-01T04:22:42.467Z] File "/usr/local/lib/python3.7/dist-packages/_pytest/main.py", line 315 in pytest_cmdline_main
[2022-08-01T04:22:42.467Z] File "/usr/local/lib/python3.7/dist-packages/pluggy/_callers.py", line 39 in _multicall
[2022-08-01T04:22:42.467Z] File "/usr/local/lib/python3.7/dist-packages/pluggy/_manager.py", line 80 in _hookexec
[2022-08-01T04:22:42.467Z] File "/usr/local/lib/python3.7/dist-packages/pluggy/_hooks.py", line 265 in __call__
[2022-08-01T04:22:42.467Z] File "/usr/local/lib/python3.7/dist-packages/_pytest/config/__init__.py", line 165 in main
[2022-08-01T04:22:42.467Z] File "/usr/local/lib/python3.7/dist-packages/_pytest/config/__init__.py", line 187 in console_main
[2022-08-01T04:22:42.467Z] File "/usr/local/lib/python3.7/dist-packages/pytest/__main__.py", line 5 in <module>
[2022-08-01T04:22:42.467Z] File "/usr/lib/python3.7/runpy.py", line 85 in _run_code
[2022-08-01T04:22:42.467Z] File "/usr/lib/python3.7/runpy.py", line 193 in _run_module_as_main
[2022-08-01T04:22:42.722Z] tests/scripts/setup-pytest-env.sh: line 49: 28733 Aborted TVM_FFI=${ffi_type} python3 -m pytest -o "junit_suite_name=${suite_name}" "--junit-xml=${TVM_PYTEST_RESULT_DIR}/${suite_name}.xml" "--junit-prefix=${ffi_type}" "${extra_args[@]}"
[2022-08-01T04:22:42.722Z] + exit_code=134
)
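The manual search described above could be scripted along these lines. This is a hypothetical helper, not part of TVM's CI tooling: it takes raw log text (for example, saved from the Jenkins log URL) and, for each "Fatal Python error: Aborted" line, walks back to the nearest line naming a pytest node ID.

```python
import re
from typing import List

# A pytest node ID such as tests/python/.../test_forward.py::test_name
TEST_ID_RE = re.compile(r"(tests/\S+\.py::\S+)")


def find_aborted_tests(log_text: str) -> List[str]:
    lines = log_text.splitlines()
    failed = []
    for i, line in enumerate(lines):
        if "Fatal Python error: Aborted" in line:
            # Check the current line first (flattened logs put both on one line),
            # then walk backwards to the nearest line that names a test
            for prev in reversed(lines[: i + 1]):
                match = TEST_ID_RE.search(prev)
                if match:
                    failed.append(match.group(1))
                    break
    return failed
```

For a log saved to a file, find_aborted_tests(open("ci.log").read()) would return the node IDs of the tests that aborted.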
That surely looks unrelated to this PR (it fails in PyTorch). The same issue is reported in https://github.com/apache/tvm/issues/12276. The error looks similar to the one in https://github.com/apache/tvm/issues/9362, but I don't know why we've started getting it now...
@tvm-bot rerun
Hi @masahi, thanks for the quick response. I found another test case that failed, in this log:
https://ci.tlcpack.ai/blue/rest/organizations/jenkins/pipelines/tvm/branches/PR-11966/runs/11/nodes/382/steps/1159/log/?start=0
[2022-08-03T01:24:59.821Z] tests/python/frontend/pytorch/qnn_test.py::test_serialized_modules free(): invalid pointer
[2022-08-03T01:24:59.821Z] Fatal Python error: Aborted
Is it a random issue?
Yeah, that also looks unrelated and flaky. It's strange: I opened a PR yesterday and hit none of these issues. https://github.com/apache/tvm/pull/12263
@tvm-bot rerun