Mutable buffer fails to lower to QNN backend
When I lower this toy example to the QNN backend, the linear operator causes an error.
class MutableStateModule(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.register_buffer("state", torch.tensor(torch.zeros(1, 1)))

    def forward(self, x):
        self.state.add_(torch.ones(1, 1))
        return x @ self.state.T
Here is the error message:
Traceback (most recent call last):
File "/home/lei/llama_mobile/python/android/test2.py", line 89, in <module>
build_executorch_binary(
File "/home/lei/llama_mobile/python/android/utils.py", line 215, in build_executorch_binary
edge_prog.exported_program = to_backend(edge_prog.exported_program, qnn_partitioner)
File "/home/lei/miniconda3/envs/executorch/lib/python3.10/functools.py", line 878, in wrapper
return dispatch(args[0].__class__)(*args, **kw)
File "/home/lei/llama_mobile/executorch/exir/backend/backend_api.py", line 363, in _
partitioner_result = partitioner_instance(fake_edge_program)
File "/home/lei/llama_mobile/executorch/exir/backend/partitioner.py", line 64, in __call__
return self.partition(exported_program)
File "/home/lei/llama_mobile/executorch/backends/qualcomm/partition/qnn_partitioner.py", line 138, in partition
partitions = self.generate_partitions(edge_program)
File "/home/lei/llama_mobile/executorch/backends/qualcomm/partition/qnn_partitioner.py", line 124, in generate_partitions
return generate_partitions_from_list_of_nodes(
File "/home/lei/llama_mobile/executorch/exir/backend/canonical_partitioners/pattern_op_partitioner.py", line 50, in generate_partitions_from_list_of_nodes
partition_list = capability_partitioner.propose_partitions()
File "/home/lei/miniconda3/envs/executorch/lib/python3.10/site-packages/torch/fx/passes/infra/partitioner.py", line 201, in propose_partitions
if self.__is_node_supported(node) and node not in assignment:
File "/home/lei/miniconda3/envs/executorch/lib/python3.10/site-packages/torch/fx/passes/infra/partitioner.py", line 79, in __is_node_supported
self.operator_support.is_node_supported(dict(self.graph_module.named_modules()), node)
File "/home/lei/llama_mobile/executorch/backends/qualcomm/partition/qnn_partitioner.py", line 79, in is_node_supported
op_wrapper = self.node_visitors[node.target.__name__].define_node(
File "/home/lei/llama_mobile/executorch/backends/qualcomm/builders/op_linear.py", line 52, in define_node
weight_tensor_wrapper = self.define_tensor(
File "/home/lei/llama_mobile/executorch/backends/qualcomm/builders/node_visitor.py", line 301, in define_tensor
dims = [1] if len(tensor.size()) == 0 else tensor.size()
AttributeError: 'NoneType' object has no attribute 'size'
If I replace the linear operation x @ self.state.T with the addition operation x + self.state.T, it works.
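For reference, a minimal sketch of the working variant (the class name here is mine; the only change from the failing model is swapping the matmul for an add):

import torch

class MutableStateModuleAdd(torch.nn.Module):  # hypothetical name for the variant
    def __init__(self):
        super().__init__()
        self.register_buffer("state", torch.zeros(1, 1))

    def forward(self, x):
        self.state.add_(torch.ones(1, 1))
        # x @ self.state.T fails to lower; x + self.state.T lowers fine
        return x + self.state.T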
@cccclai for QNN lowering
What does the graph look like for this toy model?
The graph looks like this:
def forward(self, b_state, x):
    aten_full_default = executorch_exir_dialects_edge__ops_aten_full_default([1, 1], 1, dtype = torch.float32, layout = torch.strided, device = device(type='cpu'), pin_memory = False)
    aten_add_tensor = executorch_exir_dialects_edge__ops_aten_add_Tensor(b_state, aten_full_default);  b_state = aten_full_default = None
    aten_linear_default = executorch_exir_dialects_edge__ops_aten_linear_default(x, aten_add_tensor);  x = None
    return (aten_add_tensor, aten_linear_default)
Is it after torch.export or to_edge? Mind sharing the repro script?
The op names show it is the edge dialect graph, i.e. after to_edge (capture_program produces it). Here is my script:
import torch

from executorch.backends.qualcomm.partition.qnn_partitioner import QnnPartitioner
from executorch.backends.qualcomm.utils.utils import (
    capture_program,
    generate_htp_compiler_spec,
    generate_qnn_executorch_compiler_spec,
)
from executorch.backends.qualcomm.serialization.qnn_compile_spec_schema import (
    QcomChipset,
)
from executorch.exir.backend.backend_api import to_backend


class MutableStateModule(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.register_buffer("state", torch.tensor(torch.zeros(1, 1)))

    def forward(self, x):
        self.state.add_(torch.ones(1, 1))
        return x @ self.state.T


model = MutableStateModule()
inputs = (torch.zeros(1, 1),)

edge_prog = capture_program(model, inputs)

qnn_partitioner = QnnPartitioner(
    generate_qnn_executorch_compiler_spec(
        soc_model=QcomChipset.SM8650,
        backend_options=generate_htp_compiler_spec(use_fp16=True),
        debug=False,
        saver=False,
        shared_buffer=False,
    ),
)

edge_prog.exported_program = to_backend(edge_prog.exported_program, qnn_partitioner)
On my end, I need to set environment variables such as $EXECUTORCH_ROOT and PYTHONPATH before running the script.
Here is the reference:
https://pytorch.org/executorch/stable/build-run-qualcomm-ai-engine-direct-backend.html#setting-up-your-developer-environment
Are there any updates on this issue? Thanks!
Not yet - I think it's similar to this issue: https://github.com/pytorch/executorch/issues/4042
The graph looks like this:
def forward(self, b_state, x):
    aten_full_default = executorch_exir_dialects_edge__ops_aten_full_default([1, 1], 1, dtype = torch.float32, layout = torch.strided, device = device(type='cpu'), pin_memory = False)
    aten_add_tensor = executorch_exir_dialects_edge__ops_aten_add_Tensor(b_state, aten_full_default);  b_state = aten_full_default = None
    aten_linear_default = executorch_exir_dialects_edge__ops_aten_linear_default(x, aten_add_tensor);  x = None
    return (aten_add_tensor, aten_linear_default)
As shown in the graph, the linear op takes the output from the add op as its weight tensor.
In the code below, weight_node is aten_add_tensor, so the parameter lookup returns None.
https://github.com/pytorch/executorch/blob/5584b9e3c865edca239ec5df6346f1d1aabb0276/backends/qualcomm/builders/op_linear.py#L51
The weight_tensor_wrapper then needs the value of the weight tensor, which is what triggers the AttributeError above.
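A rough way to see why the value is None with plain torch.export (illustrative only, not the QNN builder code; it assumes the MutableStateModule defined at the top of this issue): only the lifted buffer placeholder is backed by a concrete tensor, while the add output that the linear op consumes as its weight is an ordinary call_function node with no stored value.

import torch

ep = torch.export.export(MutableStateModule(), (torch.zeros(1, 1),))
for node in ep.graph.nodes:
    # Only placeholders listed in inputs_to_buffers map back to a real tensor;
    # the add output (what the linear op uses as its weight) is not among them.
    backed_by_buffer = node.name in ep.graph_signature.inputs_to_buffers
    print(node.op, node.name, "buffer-backed:", backed_by_buffer)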
I replaced the get_parameter call with get_tensor, and it seems to work:
weight_tensor = self.get_tensor(weight_node, node)
Is this an okay bypass? Thank you.
Hmm, does it work at runtime? I sort of doubt it...
@leigao97 Hey, just following up on this: are you still blocked on this issue?
Yes, the modification above was not correct. The reason I ran into this issue is that I want to run an int8 weight-only quantized model on the QNN backend, following this procedure:
https://github.com/pytorch/executorch/blob/b448254a88edc6c30d8926abcebf5a1871d675cf/examples/models/llama2/source_transformation/quantize.py#L355
I found that the root cause is that any operation performed on the buffer inserts an operator into the graph, and the output of that operator has no parameter value, so the linear op fails. In the quantized linear forward function above, the weight buffer is cast to floating point, which also triggers this issue.
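For illustration, a hedged sketch of that pattern (names and shapes are mine, loosely modeled on the WeightOnlyInt8Linear linked above, not copied from it): the cast inside forward makes the linear weight the output of a graph node instead of a parameter/buffer.

import torch
import torch.nn.functional as F

class ToyWeightOnlyInt8Linear(torch.nn.Module):  # hypothetical toy version
    def __init__(self, in_features: int, out_features: int):
        super().__init__()
        self.register_buffer(
            "weight", torch.zeros(out_features, in_features, dtype=torch.int8)
        )
        self.register_buffer("scales", torch.ones(out_features))

    def forward(self, x):
        # The .to(...) cast is the "operation on the buffer" described above:
        # after export, the linear op's weight input is the cast node's output,
        # which has no parameter value for the QNN op_linear builder to read.
        return F.linear(x, self.weight.to(dtype=x.dtype)) * self.scales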
For now, I am using XNN backend instead.
Is my understanding correct that the QNN backend assumes the linear op's weight input will always be a parameter and not a "variable" coming from the graph itself? Is that the issue?