
prefill model

Open cccclai opened this issue 1 year ago • 43 comments

Summary:

repro command

python -m executorch.examples.models.llama2.export_llama --disable_dynamic_shape --qnn --pt2e_quantize qnn_16a4w

Passes with QNN 2.25 but fails with QNN 2.26.

Segfault stack trace:

[INFO] [Qnn ExecuTorch]: Initialize Qnn backend parameters for Qnn executorch backend type 2
[INFO] [Qnn ExecuTorch]: Caching: Caching is in SAVE MODE.
[WARNING] [Qnn ExecuTorch]: Qnn API version 2.19.0 is used. The version is tested against 2.18.0.
[INFO] [Qnn ExecuTorch]: Running level=3 optimization.
AddressSanitizer:DEADLYSIGNAL
=================================================================
==1523599==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000020 (pc 0x7f1585ee38e2 bp 0x7f16d5ab8800 sp 0x7ffed19ab8b0 T0)
==1523599==The signal is caused by a READ memory access.
==1523599==Hint: address points to the zero page.
SCARINESS: 10 (null-deref)
    #0 0x7f1585ee38e2  (/home/chenlai/fbsource/third-party/qualcomm/qnn/qnn-2.26/lib/x86_64-linux-clang/libQnnHtp.so+0x2ce38e2) (BuildId: bc3ab8ddc89a0e65)
    #1 0x7f1585dd8926  (/home/chenlai/fbsource/third-party/qualcomm/qnn/qnn-2.26/lib/x86_64-linux-clang/libQnnHtp.so+0x2bd8926) (BuildId: bc3ab8ddc89a0e65)
    #2 0x7f15844d1161  (/home/chenlai/fbsource/third-party/qualcomm/qnn/qnn-2.26/lib/x86_64-linux-clang/libQnnHtp.so+0x12d1161) (BuildId: bc3ab8ddc89a0e65)
    #3 0x7f15844dcac6  (/home/chenlai/fbsource/third-party/qualcomm/qnn/qnn-2.26/lib/x86_64-linux-clang/libQnnHtp.so+0x12dcac6) (BuildId: bc3ab8ddc89a0e65)
    #4 0x7f15844d245b  (/home/chenlai/fbsource/third-party/qualcomm/qnn/qnn-2.26/lib/x86_64-linux-clang/libQnnHtp.so+0x12d245b) (BuildId: bc3ab8ddc89a0e65)
    #5 0x7f15b9bc7b21 in auto torch::executor::qnn::QnnInterface::qnn_backend_validate_op_config<void*, Qnn_OpConfig_t>(void*, Qnn_OpConfig_t) const fbcode/executorch/backends/qualcomm/runtime/backends/QnnFunctionInterface.h:39
    #6 0x7f15b9bc7682 in torch::executor::qnn::QnnBackend::BackendValidateOpConfig(Qnn_OpConfig_t const&) fbcode/executorch/backends/qualcomm/runtime/backends/QnnBackendCommon.h:41
    #7 0x7f15b9bc7115 in torch::executor::qnn::QnnManager::IsNodeSupportedByBackend(std::vector<std::shared_ptr<torch::executor::qnn::OpWrapper>, std::allocator<std::shared_ptr<torch::executor::qnn::OpWrapper>>>&) fbcode/executorch/backends/qualcomm/runtime/QnnManager.cpp:450
    #8 0x7f15b9dd44ee in torch::executor::qnn::PyQnnManager::IsNodeSupportedByBackend(std::vector<std::shared_ptr<torch::executor::qnn::OpWrapper>, std::allocator<std::shared_ptr<torch::executor::qnn::OpWrapper>>>&) fbcode/executorch/backends/qualcomm/aot/python/PyQnnManagerAdaptor.h:57
    #9 0x7f15b9e5b986 in pybind11::cpp_function::cpp_function<bool, torch::executor::qnn::PyQnnManager, std::vector<std::shared_ptr<torch::executor::qnn::OpWrapper>, std::allocator<std::shared_ptr<torch::executor::qnn::OpWrapper>>>&, pybind11::name, pybind11::is_method, pybind11::sibling>(bool (torch::executor::qnn::PyQnnManager::*)(std::vector<std::shared_ptr<torch::executor::qnn::OpWrapper>, std::allocator<std::shared_ptr<torch::executor::qnn::OpWrapper>>>&), pybind11::name const&, pybind11::is_method const&, pybind11::sibling const&)::'lambda'(torch::executor::qnn::PyQnnManager*, std::vector<std::shared_ptr<torch::executor::qnn::OpWrapper>, std::allocator<std::shared_ptr<torch::executor::qnn::OpWrapper>>>&)::operator()(torch::executor::qnn::PyQnnManager*, std::vector<std::shared_ptr<torch::executor::qnn::OpWrapper>, std::allocator<std::shared_ptr<torch::executor::qnn::OpWrapper>>>&) const fbsource/pybind11/pybind11.h:84
    #10 0x7f15b9e5b8b5 in bool pybind11::detail::argument_loader<torch::executor::qnn::PyQnnManager*, std::vector<std::shared_ptr<torch::executor::qnn::OpWrapper>, std::allocator<std::shared_ptr<torch::executor::qnn::OpWrapper>>>&>::call_impl<bool, pybind11::cpp_function::cpp_function<bool, torch::executor::qnn::PyQnnManager, std::vector<std::shared_ptr<torch::executor::qnn::OpWrapper>, std::allocator<std::shared_ptr<torch::executor::qnn::OpWrapper>>>&, pybind11::name, pybind11::is_method, pybind11::sibling>(bool (torch::executor::qnn::PyQnnManager::*)(std::vector<std::shared_ptr<torch::executor::qnn::OpWrapper>, std::allocator<std::shared_ptr<torch::executor::qnn::OpWrapper>>>&), pybind11::name const&, pybind11::is_method const&, pybind11::sibling const&)::'lambda'(torch::executor::qnn::PyQnnManager*, std::vector<std::shared_ptr<torch::executor::qnn::OpWrapper>, std::allocator<std::shared_ptr<torch::executor::qnn::OpWrapper>>>&)&, 0ul, 1ul, pybind11::detail::void_type>(torch::executor::qnn::PyQnnManager&&, std::integer_sequence<unsigned long, 0ul, 1ul>, pybind11::detail::void_type&&) && fbsource/pybind11/cast.h:2042
    #11 0x7f15b9e53831 in std::enable_if<!std::is_void<bool>::value, bool>::type pybind11::detail::argument_loader<torch::executor::qnn::PyQnnManager*, std::vector<std::shared_ptr<torch::executor::qnn::OpWrapper>, std::allocator<std::shared_ptr<torch::executor::qnn::OpWrapper>>>&>::call<bool, pybind11::detail::void_type, pybind11::cpp_function::cpp_function<bool, torch::executor::qnn::PyQnnManager, std::vector<std::shared_ptr<torch::executor::qnn::OpWrapper>, std::allocator<std::shared_ptr<torch::executor::qnn::OpWrapper>>>&, pybind11::name, pybind11::is_method, pybind11::sibling>(bool (torch::executor::qnn::PyQnnManager::*)(std::vector<std::shared_ptr<torch::executor::qnn::OpWrapper>, std::allocator<std::shared_ptr<torch::executor::qnn::OpWrapper>>>&), pybind11::name const&, pybind11::is_method const&, pybind11::sibling const&)::'lambda'(torch::executor::qnn::PyQnnManager*, std::vector<std::shared_ptr<torch::executor::qnn::OpWrapper>, std::allocator<std::shared_ptr<torch::executor::qnn::OpWrapper>>>&)&>(pybind11::cpp_function::cpp_function<bool, torch::executor::qnn::PyQnnManager, std::vector<std::shared_ptr<torch::executor::qnn::OpWrapper>, std::allocator<std::shared_ptr<torch::executor::qnn::OpWrapper>>>&, pybind11::name, pybind11::is_method, pybind11::sibling>(bool (torch::executor::qnn::PyQnnManager::*)(std::vector<std::shared_ptr<torch::executor::qnn::OpWrapper>, std::allocator<std::shared_ptr<torch::executor::qnn::OpWrapper>>>&), pybind11::name const&, pybind11::is_method const&, pybind11::sibling const&)::'lambda'(torch::executor::qnn::PyQnnManager*, std::vector<std::shared_ptr<torch::executor::qnn::OpWrapper>, std::allocator<std::shared_ptr<torch::executor::qnn::OpWrapper>>>&)&) && fbsource/pybind11/cast.h:2014
    #12 0x7f15b9e53454 in void pybind11::cpp_function::initialize<pybind11::cpp_function::cpp_function<bool, torch::executor::qnn::PyQnnManager, std::vector<std::shared_ptr<torch::executor::qnn::OpWrapper>, std::allocator<std::shared_ptr<torch::executor::qnn::OpWrapper>>>&, pybind11::name, pybind11::is_method, pybind11::sibling>(bool (torch::executor::qnn::PyQnnManager::*)(std::vector<std::shared_ptr<torch::executor::qnn::OpWrapper>, std::allocator<std::shared_ptr<torch::executor::qnn::OpWrapper>>>&), pybind11::name const&, pybind11::is_method const&, pybind11::sibling const&)::'lambda'(torch::executor::qnn::PyQnnManager*, std::vector<std::shared_ptr<torch::executor::qnn::OpWrapper>, std::allocator<std::shared_ptr<torch::executor::qnn::OpWrapper>>>&), bool, torch::executor::qnn::PyQnnManager*, std::vector<std::shared_ptr<torch::executor::qnn::OpWrapper>, std::allocator<std::shared_ptr<torch::executor::qnn::OpWrapper>>>&, pybind11::name, pybind11::is_method, pybind11::sibling>(bool&&, torch::executor::qnn::PyQnnManager (*)(std::vector<std::shared_ptr<torch::executor::qnn::OpWrapper>, std::allocator<std::shared_ptr<torch::executor::qnn::OpWrapper>>>&), pybind11::name const&, pybind11::is_method const&, pybind11::sibling const&)::'lambda'(pybind11::detail::function_call&)::operator()(pybind11::detail::function_call&) const fbsource/pybind11/pybind11.h:193
    #13 0x7f15b9e530d3 in void pybind11::cpp_function::initialize<pybind11::cpp_function::cpp_function<bool, torch::executor::qnn::PyQnnManager, std::vector<std::shared_ptr<torch::executor::qnn::OpWrapper>, std::allocator<std::shared_ptr<torch::executor::qnn::OpWrapper>>>&, pybind11::name, pybind11::is_method, pybind11::sibling>(bool (torch::executor::qnn::PyQnnManager::*)(std::vector<std::shared_ptr<torch::executor::qnn::OpWrapper>, std::allocator<std::shared_ptr<torch::executor::qnn::OpWrapper>>>&), pybind11::name const&, pybind11::is_method const&, pybind11::sibling const&)::'lambda'(torch::executor::qnn::PyQnnManager*, std::vector<std::shared_ptr<torch::executor::qnn::OpWrapper>, std::allocator<std::shared_ptr<torch::executor::qnn::OpWrapper>>>&), bool, torch::executor::qnn::PyQnnManager*, std::vector<std::shared_ptr<torch::executor::qnn::OpWrapper>, std::allocator<std::shared_ptr<torch::executor::qnn::OpWrapper>>>&, pybind11::name, pybind11::is_method, pybind11::sibling>(bool&&, torch::executor::qnn::PyQnnManager (*)(std::vector<std::shared_ptr<torch::executor::qnn::OpWrapper>, std::allocator<std::shared_ptr<torch::executor::qnn::OpWrapper>>>&), pybind11::name const&, pybind11::is_method const&, pybind11::sibling const&)::'lambda'(pybind11::detail::function_call&)::__invoke(pybind11::detail::function_call&) fbsource/pybind11/pybind11.h:170
    #14 0x7f15b9d8f707 in pybind11::cpp_function::dispatcher(_object*, _object*, _object*) fbsource/pybind11/pybind11.h:767
    #15 0x327141 in cfunction_call(_object*, _object*, _object*) (.__uniq.281047882695835599676768160755749362799) (/usr/local/fbcode/platform010/bin/python3.10+0x327141) (BuildId: a620038add613fd8585eb50983ca8e455d54738e)
    #16 0x349630 in _PyObject_MakeTpCall (/usr/local/fbcode/platform010/bin/python3.10+0x349630) (BuildId: a620038add613fd8585eb50983ca8e455d54738e)
    #17 0x5897d4 in method_vectorcall(_object*, _object* const*, unsigned long, _object*) (.__uniq.243338978568352371442406765225626566013.llvm.6236606370933165261) (/usr/local/fbcode/platform010/bin/python3.10+0x5897d4) (BuildId: a620038add613fd8585eb50983ca8e455d54738e)
    #18 0x3928df in call_function(_ts*, PyTraceInfo*, _object***, long, _object*) (.__uniq.79849310599369217189729546442812793949) (/usr/local/fbcode/platform010/bin/python3.10+0x3928df) (BuildId: a620038add613fd8585eb50983ca8e455d54738e)
    #19 0x331421 in _PyEval_EvalFrameDefault (/usr/local/fbcode/platform010/bin/python3.10+0x331421) (BuildId: a620038add613fd8585eb50983ca8e455d54738e)
    #20 0x327547 in _PyFunction_Vectorcall (/usr/local/fbcode/platform010/bin/python3.10+0x327547) (BuildId: a620038add613fd8585eb50983ca8e455d54738e)
    #21 0x3928df in call_function(_ts*, PyTraceInfo*, _object***, long, _object*) (.__uniq.79849310599369217189729546442812793949) (/usr/local/fbcode/platform010/bin/python3.10+0x3928df) (BuildId: a620038add613fd8585eb50983ca8e455d54738e)
    #22 0x3313f2 in _PyEval_EvalFrameDefault (/usr/local/fbcode/platform010/bin/python3.10+0x3313f2) (BuildId: a620038add613fd8585eb50983ca8e455d54738e)
    #23 0x327547 in _PyFunction_Vectorcall (/usr/local/fbcode/platform010/bin/python3.10+0x327547) (BuildId: a620038add613fd8585eb50983ca8e455d54738e)
    #24 0x3928df in call_function(_ts*, PyTraceInfo*, _object***, long, _object*) (.__uniq.79849310599369217189729546442812793949) (/usr/local/fbcode/platform010/bin/python3.10+0x3928df) (BuildId: a620038add613fd8585eb50983ca8e455d54738e)
    #25 0x3313f2 in _PyEval_EvalFrameDefault (/usr/local/fbcode/platform010/bin/python3.10+0x3313f2) (BuildId: a620038add613fd8585eb50983ca8e455d54738e)
    #26 0x327547 in _PyFunction_Vectorcall (/usr/local/fbcode/platform010/bin/python3.10+0x327547) (BuildId: a620038add613fd8585eb50983ca8e455d54738e)
    #27 0x3928df in call_function(_ts*, PyTraceInfo*, _object***, long, _object*) (.__uniq.79849310599369217189729546442812793949) (/usr/local/fbcode/platform010/bin/python3.10+0x3928df) (BuildId: a620038add613fd8585eb50983ca8e455d54738e)
    #28 0x3313f2 in _PyEval_EvalFrameDefault (/usr/local/fbcode/platform010/bin/python3.10+0x3313f2) (BuildId: a620038add613fd8585eb50983ca8e455d54738e)
    #29 0x327547 in _PyFunction_Vectorcall (/usr/local/fbcode/platform010/bin/python3.10+0x327547) (BuildId: a620038add613fd8585eb50983ca8e455d54738e)
    #30 0x3928df in call_function(_ts*, PyTraceInfo*, _object***, long, _object*) (.__uniq.79849310599369217189729546442812793949) (/usr/local/fbcode/platform010/bin/python3.10+0x3928df) (BuildId: a620038add613fd8585eb50983ca8e455d54738e)
    #31 0x331577 in _PyEval_EvalFrameDefault (/usr/local/fbcode/platform010/bin/python3.10+0x331577) (BuildId: a620038add613fd8585eb50983ca8e455d54738e)
    #32 0x327547 in _PyFunction_Vectorcall (/usr/local/fbcode/platform010/bin/python3.10+0x327547) (BuildId: a620038add613fd8585eb50983ca8e455d54738e)
    #33 0x3928df in call_function(_ts*, PyTraceInfo*, _object***, long, _object*) (.__uniq.79849310599369217189729546442812793949) (/usr/local/fbcode/platform010/bin/python3.10+0x3928df) (BuildId: a620038add613fd8585eb50983ca8e455d54738e)
    #34 0x3313f2 in _PyEval_EvalFrameDefault (/usr/local/fbcode/platform010/bin/python3.10+0x3313f2) (BuildId: a620038add613fd8585eb50983ca8e455d54738e)
    #35 0x327547 in _PyFunction_Vectorcall (/usr/local/fbcode/platform010/bin/python3.10+0x327547) (BuildId: a620038add613fd8585eb50983ca8e455d54738e)
    #36 0x3928df in call_function(_ts*, PyTraceInfo*, _object***, long, _object*) (.__uniq.79849310599369217189729546442812793949) (/usr/local/fbcode/platform010/bin/python3.10+0x3928df) (BuildId: a620038add613fd8585eb50983ca8e455d54738e)
    #37 0x3313f2 in _PyEval_EvalFrameDefault (/usr/local/fbcode/platform010/bin/python3.10+0x3313f2) (BuildId: a620038add613fd8585eb50983ca8e455d54738e)
    #38 0x39b8ca in _PyEval_Vector (/usr/local/fbcode/platform010/bin/python3.10+0x39b8ca) (BuildId: a620038add613fd8585eb50983ca8e455d54738e)
    #39 0x39ad7d in _PyObject_FastCallDictTstate (/usr/local/fbcode/platform010/bin/python3.10+0x39ad7d) (BuildId: a620038add613fd8585eb50983ca8e455d54738e)
    #40 0x3c8b72 in slot_tp_call(_object*, _object*, _object*) (.__uniq.235726554139783955843240177532338160225) (/usr/local/fbcode/platform010/bin/python3.10+0x3c8b72) (BuildId: a620038add613fd8585eb50983ca8e455d54738e)
    #41 0x392ca8 in call_function(_ts*, PyTraceInfo*, _object***, long, _object*) (.__uniq.79849310599369217189729546442812793949) (/usr/local/fbcode/platform010/bin/python3.10+0x392ca8) (BuildId: a620038add613fd8585eb50983ca8e455d54738e)
    #42 0x3314ca in _PyEval_EvalFrameDefault (/usr/local/fbcode/platform010/bin/python3.10+0x3314ca) (BuildId: a620038add613fd8585eb50983ca8e455d54738e)
    #43 0x39b8ca in _PyEval_Vector (/usr/local/fbcode/platform010/bin/python3.10+0x39b8ca) (BuildId: a620038add613fd8585eb50983ca8e455d54738e)
    #44 0x331b18 in _PyEval_EvalFrameDefault (/usr/local/fbcode/platform010/bin/python3.10+0x331b18) (BuildId: a620038add613fd8585eb50983ca8e455d54738e)
    #45 0x327547 in _PyFunction_Vectorcall (/usr/local/fbcode/platform010/bin/python3.10+0x327547) (BuildId: a620038add613fd8585eb50983ca8e455d54738e)
    #46 0x3928df in call_function(_ts*, PyTraceInfo*, _object***, long, _object*) (.__uniq.79849310599369217189729546442812793949) (/usr/local/fbcode/platform010/bin/python3.10+0x3928df) (BuildId: a620038add613fd8585eb50983ca8e455d54738e)
    #47 0x3314ca in _PyEval_EvalFrameDefault (/usr/local/fbcode/platform010/bin/python3.10+0x3314ca) (BuildId: a620038add613fd8585eb50983ca8e455d54738e)
    #48 0x327547 in _PyFunction_Vectorcall (/usr/local/fbcode/platform010/bin/python3.10+0x327547) (BuildId: a620038add613fd8585eb50983ca8e455d54738e)
    #49 0x3928df in call_function(_ts*, PyTraceInfo*, _object***, long, _object*) (.__uniq.79849310599369217189729546442812793949) (/usr/local/fbcode/platform010/bin/python3.10+0x3928df) (BuildId: a620038add613fd8585eb50983ca8e455d54738e)
    #50 0x3313f2 in _PyEval_EvalFrameDefault (/usr/local/fbcode/platform010/bin/python3.10+0x3313f2) (BuildId: a620038add613fd8585eb50983ca8e455d54738e)
    #51 0x327547 in _PyFunction_Vectorcall (/usr/local/fbcode/platform010/bin/python3.10+0x327547) (BuildId: a620038add613fd8585eb50983ca8e455d54738e)
    #52 0x3928df in call_function(_ts*, PyTraceInfo*, _object***, long, _object*) (.__uniq.79849310599369217189729546442812793949) (/usr/local/fbcode/platform010/bin/python3.10+0x3928df) (BuildId: a620038add613fd8585eb50983ca8e455d54738e)
    #53 0x3313f2 in _PyEval_EvalFrameDefault (/usr/local/fbcode/platform010/bin/python3.10+0x3313f2) (BuildId: a620038add613fd8585eb50983ca8e455d54738e)
    #54 0x327547 in _PyFunction_Vectorcall (/usr/local/fbcode/platform010/bin/python3.10+0x327547) (BuildId: a620038add613fd8585eb50983ca8e455d54738e)
    #55 0x3928df in call_function(_ts*, PyTraceInfo*, _object***, long, _object*) (.__uniq.79849310599369217189729546442812793949) (/usr/local/fbcode/platform010/bin/python3.10+0x3928df) (BuildId: a620038add613fd8585eb50983ca8e455d54738e)
    #56 0x3314ca in _PyEval_EvalFrameDefault (/usr/local/fbcode/platform010/bin/python3.10+0x3314ca) (BuildId: a620038add613fd8585eb50983ca8e455d54738e)
    #57 0x327547 in _PyFunction_Vectorcall (/usr/local/fbcode/platform010/bin/python3.10+0x327547) (BuildId: a620038add613fd8585eb50983ca8e455d54738e)
    #58 0x3928df in call_function(_ts*, PyTraceInfo*, _object***, long, _object*) (.__uniq.79849310599369217189729546442812793949) (/usr/local/fbcode/platform010/bin/python3.10+0x3928df) (BuildId: a620038add613fd8585eb50983ca8e455d54738e)
    #59 0x3314ca in _PyEval_EvalFrameDefault (/usr/local/fbcode/platform010/bin/python3.10+0x3314ca) (BuildId: a620038add613fd8585eb50983ca8e455d54738e)
    #60 0x327547 in _PyFunction_Vectorcall (/usr/local/fbcode/platform010/bin/python3.10+0x327547) (BuildId: a620038add613fd8585eb50983ca8e455d54738e)
    #61 0x3928df in call_function(_ts*, PyTraceInfo*, _object***, long, _object*) (.__uniq.79849310599369217189729546442812793949) (/usr/local/fbcode/platform010/bin/python3.10+0x3928df) (BuildId: a620038add613fd8585eb50983ca8e455d54738e)
    #62 0x3314ca in _PyEval_EvalFrameDefault (/usr/local/fbcode/platform010/bin/python3.10+0x3314ca) (BuildId: a620038add613fd8585eb50983ca8e455d54738e)
    #63 0x327547 in _PyFunction_Vectorcall (/usr/local/fbcode/platform010/bin/python3.10+0x327547) (BuildId: a620038add613fd8585eb50983ca8e455d54738e)
    #64 0x3928df in call_function(_ts*, PyTraceInfo*, _object***, long, _object*) (.__uniq.79849310599369217189729546442812793949) (/usr/local/fbcode/platform010/bin/python3.10+0x3928df) (BuildId: a620038add613fd8585eb50983ca8e455d54738e)
    #65 0x3314ca in _PyEval_EvalFrameDefault (/usr/local/fbcode/platform010/bin/python3.10+0x3314ca) (BuildId: a620038add613fd8585eb50983ca8e455d54738e)
    #66 0x327547 in _PyFunction_Vectorcall (/usr/local/fbcode/platform010/bin/python3.10+0x327547) (BuildId: a620038add613fd8585eb50983ca8e455d54738e)
    #67 0x3928df in call_function(_ts*, PyTraceInfo*, _object***, long, _object*) (.__uniq.79849310599369217189729546442812793949) (/usr/local/fbcode/platform010/bin/python3.10+0x3928df) (BuildId: a620038add613fd8585eb50983ca8e455d54738e)
    #68 0x3314ca in _PyEval_EvalFrameDefault (/usr/local/fbcode/platform010/bin/python3.10+0x3314ca) (BuildId: a620038add613fd8585eb50983ca8e455d54738e)
    #69 0x327547 in _PyFunction_Vectorcall (/usr/local/fbcode/platform010/bin/python3.10+0x327547) (BuildId: a620038add613fd8585eb50983ca8e455d54738e)
    #70 0x3928df in call_function(_ts*, PyTraceInfo*, _object***, long, _object*) (.__uniq.79849310599369217189729546442812793949) (/usr/local/fbcode/platform010/bin/python3.10+0x3928df) (BuildId: a620038add613fd8585eb50983ca8e455d54738e)
    #71 0x3314ca in _PyEval_EvalFrameDefault (/usr/local/fbcode/platform010/bin/python3.10+0x3314ca) (BuildId: a620038add613fd8585eb50983ca8e455d54738e)
    #72 0x39b8ca in _PyEval_Vector (/usr/local/fbcode/platform010/bin/python3.10+0x39b8ca) (BuildId: a620038add613fd8585eb50983ca8e455d54738e)
    #73 0x431565 in PyEval_EvalCode (/usr/local/fbcode/platform010/bin/python3.10+0x431565) (BuildId: a620038add613fd8585eb50983ca8e455d54738e)
    #74 0x431447 in run_mod(_mod*, _object*, _object*, _object*, PyCompilerFlags*, _arena*) (.__uniq.251861886623903963524397139660542440724.llvm.17622910512627074885) (/usr/local/fbcode/platform010/bin/python3.10+0x431447) (BuildId: a620038add613fd8585eb50983ca8e455d54738e)
    #75 0x4e3054 in pyrun_file(_IO_FILE*, _object*, int, _object*, _object*, int, PyCompilerFlags*) (.__uniq.251861886623903963524397139660542440724) (/usr/local/fbcode/platform010/bin/python3.10+0x4e3054) (BuildId: a620038add613fd8585eb50983ca8e455d54738e)
    #76 0x4e2b54 in _PyRun_SimpleFileObject (/usr/local/fbcode/platform010/bin/python3.10+0x4e2b54) (BuildId: a620038add613fd8585eb50983ca8e455d54738e)
    #77 0x4e28f1 in _PyRun_AnyFileObject (/usr/local/fbcode/platform010/bin/python3.10+0x4e28f1) (BuildId: a620038add613fd8585eb50983ca8e455d54738e)
    #78 0x4d4a54 in Py_RunMain (/usr/local/fbcode/platform010/bin/python3.10+0x4d4a54) (BuildId: a620038add613fd8585eb50983ca8e455d54738e)
    #79 0x4d286b in pymain_main(_PyArgv*) (.__uniq.297908980262787110426434251325078884054) (/usr/local/fbcode/platform010/bin/python3.10+0x4d286b) (BuildId: a620038add613fd8585eb50983ca8e455d54738e)
    #80 0x4d2759 in Py_BytesMain (/usr/local/fbcode/platform010/bin/python3.10+0x4d2759) (BuildId: a620038add613fd8585eb50983ca8e455d54738e)
    #81 0x7f19e282c656 in __libc_start_call_main (/usr/local/fbcode/platform010/lib/libc.so.6+0x2c656) (BuildId: 93cdceeb8322234c38e1f2c93ad0ff10c7632fa6)
    #82 0x7f19e282c717 in __libc_start_main@GLIBC_2.2.5 (/usr/local/fbcode/platform010/lib/libc.so.6+0x2c717) (BuildId: 93cdceeb8322234c38e1f2c93ad0ff10c7632fa6)
    #83 0x553d90 in _start (/usr/local/fbcode/platform010/bin/python3.10+0x553d90) (BuildId: a620038add613fd8585eb50983ca8e455d54738e)
AddressSanitizer can not provide additional info.
AddressSanitizer: SEGV (/home/chenlai/fbsource/third-party/qualcomm/qnn/qnn-2.26/lib/x86_64-linux-clang/libQnnHtp.so+0x2ce38e2) (BuildId: bc3ab8ddc89a0e65)
==1523599==ABORTING

Differential Revision: D63736779

cccclai avatar Oct 02 '24 01:10 cccclai

:link: Helpful Links

:test_tube: See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/5807

Note: Links to docs will display an error until the docs builds have been completed.

:x: 13 New Failures

As of commit 35387f6de7c731e8d3f52ce504c2abd912c6f096 with merge base 13408b9848b1a776f03ff0fbac5f18b6347ff64a:

NEW FAILURES - The following jobs have failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

pytorch-bot[bot] avatar Oct 02 '24 01:10 pytorch-bot[bot]

This pull request was exported from Phabricator. Differential Revision: D63736779

facebook-github-bot avatar Oct 02 '24 01:10 facebook-github-bot

@shewu-quic @haowhsu-quic @chunit-quic

chiwwang avatar Oct 02 '24 04:10 chiwwang

We will try to reproduce this on our side.

chiwwang avatar Oct 02 '24 04:10 chiwwang

Hi @cccclai, thanks for this PR. I could also reproduce the error. Regarding the segmentation fault for the per-channel 16a4w linear in QNN 2.26, we also see it in our unit tests. We will investigate this issue further. If possible, could you use the convert_linear_to_conv pass to work around it?
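For reference, here is a minimal sketch of the idea behind such a pass, written as a plain nn.Module transform (illustrative only, not the exact executorch implementation): every nn.Linear is swapped for an equivalent 1x1 Conv2d, so the backend sees conv instead of the problematic per-channel 16a4w linear.

import torch
import torch.nn as nn

class _LinearAsConv2d(nn.Module):
    """Wraps a 1x1 conv so callers can keep passing (..., in_features) tensors."""

    def __init__(self, conv: nn.Conv2d):
        super().__init__()
        self.conv = conv

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        leading = x.shape[:-1]
        x = x.reshape(-1, x.shape[-1], 1, 1)   # (N, C_in, 1, 1)
        x = self.conv(x)                       # (N, C_out, 1, 1)
        return x.reshape(*leading, -1)         # restore leading dims

def convert_linear_to_conv2d(module: nn.Module) -> nn.Module:
    """Replace every nn.Linear with an equivalent 1x1 Conv2d, recursively.

    y = x @ W.T + b is exactly a 1x1 convolution over (N, in_features, 1, 1),
    so the weights and bias can be copied over unchanged.
    """
    for name, child in module.named_children():
        if isinstance(child, nn.Linear):
            conv = nn.Conv2d(child.in_features, child.out_features,
                             kernel_size=1, bias=child.bias is not None)
            # Linear weight is (out, in); Conv2d expects (out, in, 1, 1).
            conv.weight.data.copy_(child.weight.data.view(*child.weight.shape, 1, 1))
            if child.bias is not None:
                conv.bias.data.copy_(child.bias.data)
            setattr(module, name, _LinearAsConv2d(conv))
        else:
            convert_linear_to_conv2d(child)
    return module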

shewu-quic avatar Oct 02 '24 06:10 shewu-quic

We also need to check why the matmul is quantized to an unsupported schema. Maybe something is wrong in our QnnQuantizer?

chiwwang avatar Oct 02 '24 06:10 chiwwang

Sadly, the segmentation fault in linear was detected in the 2.26~2.27 timeframe. The fix has not been released yet; the ETA is QNN 2.28, at the end of October.

chiwwang avatar Oct 02 '24 06:10 chiwwang

We also need to check why the matmul is quantized to an unsupported schema. Maybe something is wrong in our QnnQuantizer?

Hi @cccclai, @chiwwang,

It seems that I could not reproduce the op validation failure for the matmul op on my end when using QNN 2.26 with the convert_linear_to_conv pass added. Below is my call sequence.

./install_requirements.sh
cp schema/*.fbs exir/_serialize/
export PYTHONPATH=/local/mnt/workspace/test_cc/
export ANDROID_NDK=/local/mnt/workspace/shewu/android-ndk-r26c
export QNN_SDK_ROOT=/local/mnt/workspace/shewu/qairt/2.26.0.240828
export LD_LIBRARY_PATH=$QNN_SDK_ROOT/lib/x86_64-linux-clang
./backends/qualcomm/scripts/build.sh
python -m executorch.examples.models.llama2.export_llama --disable_dynamic_shape --qnn --pt2e_quantize qnn_16a4w

shewu-quic avatar Oct 02 '24 07:10 shewu-quic

This pull request was exported from Phabricator. Differential Revision: D63736779

facebook-github-bot avatar Oct 02 '24 19:10 facebook-github-bot

I updated the PR to use the linear-to-conv pass, since the segfault can be reproduced now. Here is the latest log: prefill_qnn.log

I can see that matmul fails to lower:

...
[QNN Partitioner Op Support]: aten.convolution.default | True
[QNN Partitioner Op Support]: aten.permute_copy.default | True
[QNN Partitioner Op Support]: aten.unsqueeze_copy.default | True
[QNN Partitioner Op Support]: aten.view_copy.default | True
[QNN Partitioner Op Support]: aten.permute_copy.default | True
[QNN Partitioner Op Support]: aten.matmul.default | False
[QNN Partitioner Op Support]: aten._softmax.default | True
[QNN Partitioner Op Support]: aten.add.Tensor | True
[QNN Partitioner Op Support]: aten.slice_copy.Tensor | True
[QNN Partitioner Op Support]: aten.slice_copy.Tensor | True
[QNN Partitioner Op Support]: aten.div.Tensor | True
[QNN Partitioner Op Support]: aten.matmul.default | False
...

cccclai avatar Oct 02 '24 20:10 cccclai

I suddenly realized this is in the AOT stage, so the mismatch between the QNN libraries and executorch (maybe QnnPyXXXXX.so) should be caused by a mismatch between QNN_SDK_ROOT and LD_LIBRARY_PATH... it's not on the device yet 😨

chiwwang avatar Oct 03 '24 00:10 chiwwang

I suddenly realized this is in the AOT stage, so the mismatch between the QNN libraries and executorch (maybe QnnPyXXXXX.so) should be caused by a mismatch between QNN_SDK_ROOT and LD_LIBRARY_PATH... it's not on the device yet 😨

Yeah... it is still AOT, not on device yet.

cccclai avatar Oct 03 '24 00:10 cccclai

I double-checked again, and it looks like I can lower matmul in the OSS flow but not the internal buck flow. I guess I can work around it for now...

cccclai avatar Oct 03 '24 02:10 cccclai

I double-checked again, and it looks like I can lower matmul in the OSS flow but not the internal buck flow. I guess I can work around it for now...

I'm also stuck in the buck build flow. Let me submit a comment below. I aim to add the SoC information to the QC backend. I remember we moved these things to Python.

chiwwang avatar Oct 03 '24 02:10 chiwwang

We also need to check why the matmul is quantized to an unsupported schema. Maybe something is wrong in our QnnQuantizer?

Hi @cccclai, @chiwwang,

It seems that I could not reproduce the op validation failure for the matmul op on my end when using QNN 2.26 with the convert_linear_to_conv pass added. Below is my call sequence.

./install_requirements.sh
cp schema/*.fbs exir/_serialize/
export PYTHONPATH=/local/mnt/workspace/test_cc/
export ANDROID_NDK=/local/mnt/workspace/shewu/android-ndk-r26c
export QNN_SDK_ROOT=/local/mnt/workspace/shewu/qairt/2.26.0.240828
export LD_LIBRARY_PATH=$QNN_SDK_ROOT/lib/x86_64-linux-clang
./backends/qualcomm/scripts/build.sh
python -m executorch.examples.models.llama2.export_llama --disable_dynamic_shape --qnn --pt2e_quantize qnn_16a4w

It should be ANDROID_NDK_ROOT instead of ANDROID_NDK.

chiwwang avatar Oct 03 '24 02:10 chiwwang

Hi @cccclai, I added the SoC here: https://github.com/cccclai/executorch-1/pull/1. I ran a trivial model with soc_model=SSG2115P on an SM8550 and it seems OK. I will test the command shared here.

[update]

python -m executorch.examples.models.llama2.export_llama --disable_dynamic_shape --qnn --pt2e_quantize qnn_16a4w --soc_model SSG2115P

seems to work 😮

chiwwang avatar Oct 03 '24 03:10 chiwwang

Hi @cccclai, I added a PR to quantize the embedding op and 16x8 matmul. I ran this model, and it fully delegates. If you have any problems, please let me know.

python -m executorch.examples.models.llama2.export_llama --disable_dynamic_shape --qnn --pt2e_quantize qnn_16a4w

shewu-quic avatar Oct 03 '24 06:10 shewu-quic

Hi @cccclai, I added a PR to quantize the embedding op and 16x8 matmul. I ran this model, and it fully delegates. If you have any problems, please let me know.

python -m executorch.examples.models.llama2.export_llama --disable_dynamic_shape --qnn --pt2e_quantize qnn_16a4w

Hey @shewu-quic, is there a PR link? Also, if you could add more description to help us understand how you achieved this, that would be great 😄

chiwwang avatar Oct 03 '24 06:10 chiwwang

Oh~ sure, let me add more description to this PR. For the 16x8 matmul op, I think it can be divided into two cases according to whether the KV cache is used:
- If the KV cache is used, we can annotate the past KV (input), i.e. the second input of matmul, with 8 bits to reduce the size of the input tensor.
- If the KV cache is not used, I think we can just annotate matmul as 16x8 to improve performance.

By default, we annotate matmul as 16x16 in 16-bit quantization, and we can override that with add_custom_quant_annotations.
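A minimal sketch of what such an override could look like with the stock PT2E quantizer API (the specs here are illustrative; the exact dtypes and observers in QnnQuantizer may differ):

import torch
from torch.ao.quantization.observer import MinMaxObserver
from torch.ao.quantization.quantizer import (
    QuantizationAnnotation,
    QuantizationSpec,
)

# Illustrative activation specs; a 16-bit range carried in an int32 container.
act_16bit = QuantizationSpec(
    dtype=torch.int32, quant_min=0, quant_max=65535,
    qscheme=torch.per_tensor_affine,
    observer_or_fake_quant_ctr=MinMaxObserver,
)
act_8bit = QuantizationSpec(
    dtype=torch.uint8, quant_min=0, quant_max=255,
    qscheme=torch.per_tensor_affine,
    observer_or_fake_quant_ctr=MinMaxObserver,
)

def annotate_matmul_16a8w(graph_module: torch.fx.GraphModule) -> None:
    """Annotate matmul with a 16-bit lhs and an 8-bit rhs (the past-KV input)."""
    for node in graph_module.graph.nodes:
        if node.target == torch.ops.aten.matmul.default:
            lhs, rhs = node.args
            node.meta["quantization_annotation"] = QuantizationAnnotation(
                input_qspec_map={lhs: act_16bit, rhs: act_8bit},
                output_qspec=act_16bit,
                _annotated=True,
            )

Passing a function like this through add_custom_quant_annotations would let it run on top of the default 16x16 annotation.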

shewu-quic avatar Oct 03 '24 07:10 shewu-quic

So it's a "custom annotation", based mostly on the topology of the graph, right? We look into the graph and choose the nodes to annotate, which gives us the 16x8 matmul. Do I understand correctly?

chiwwang avatar Oct 03 '24 07:10 chiwwang

So it's a "custom annotation", based mostly on the topology of the graph, right? We look into the graph and choose the nodes to annotate, which gives us the 16x8 matmul. Do I understand correctly?

Yes, that's right. After applying the custom annotation, you get the graph below.

                                                                  q (16 bits) -> dq (16 bits)--\
                                                                                                 matmul -> q (16 bits) -> dq (16 bits)
q (16 bits) -> dq (16 bits) -> op -> q (16 bits) -> dq (16 bits) -> q (8 bits) -> dq (8 bits)--/ 

For the q (16 bits) -> dq (16 bits) -> q (8 bits) -> dq (8 bits) pattern, we tag a requantize on the op and insert a to_copy (QNN Convert or Cast) after it.
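As a rough sketch of how that pattern can be located in the exported graph (using the quantized_decomposed ops visible in the dump above; the real pass does more bookkeeping):

import torch

Q = torch.ops.quantized_decomposed.quantize_per_tensor.default
DQ = torch.ops.quantized_decomposed.dequantize_per_tensor.default

def find_requantize_chains(gm: torch.fx.GraphModule):
    """Yield (producer, q_hi, dq_hi, q_lo, dq_lo) for q->dq->q->dq chains."""
    for q_hi in gm.graph.nodes:
        if q_hi.target != Q:
            continue
        dq_hi = next(iter(q_hi.users), None)
        if dq_hi is None or dq_hi.target != DQ:
            continue
        q_lo = next(iter(dq_hi.users), None)
        if q_lo is None or q_lo.target != Q:
            continue
        dq_lo = next(iter(q_lo.users), None)
        if dq_lo is None or dq_lo.target != DQ:
            continue
        # Arg 5 of quantize_per_tensor is the target dtype; a requantize
        # chain changes the width (e.g. 16-bit container -> uint8), which
        # is where the to_copy (QNN Convert/Cast) gets inserted.
        if q_hi.args[5] != q_lo.args[5]:
            yield q_hi.args[0], q_hi, dq_hi, q_lo, dq_lo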

shewu-quic avatar Oct 03 '24 07:10 shewu-quic

Got it, thanks. Note that the command should contain --soc_model SSG2115P for the correct VTCM size (this needs PR https://github.com/cccclai/executorch-1/pull/1, though):

python -m executorch.examples.models.llama2.export_llama --disable_dynamic_shape --qnn --pt2e_quantize qnn_16a4w --soc_model SSG2115P

chiwwang avatar Oct 03 '24 08:10 chiwwang

Thanks folks! I was able to get the model running with the embedding/matmul lowered with these changes. Maybe we can extend the SoC table? The change looks reasonable to me.

cccclai avatar Oct 03 '24 18:10 cccclai

layer norm op lowering:

We have a different model that uses layernorm instead of rmsnorm. Because the runtime only recently bumped to 2.25 and the current model still uses layernorm, I'll make changes on this PR, together with the PRs you folks sent, to test both layernorm and rmsnorm.

[edit]: made some progress on it. The quant node on the bias looks suspicious... it's:

    %dequantize_per_tensor_2 : [num_users=1] = call_function[target=torch.ops.quantized_decomposed.dequantize_per_tensor.default](args = (%b__frozen_param2, 9.5367431640625e-07, 0, -2147483648, 2147483647, torch.int32), kwargs = {})
...
    %layer_norm : [num_users=1] = call_function[target=torch.ops.aten.layer_norm.default](args = (%dequantize_per_tensor_11, [64], %dequantize_per_tensor_1, %dequantize_per_tensor_2), kwargs = {})

...

cccclai avatar Oct 03 '24 19:10 cccclai

In the meantime, we're tracking latency (both model loading time and inference time), memory, power, and accuracy for production. Latency and accuracy are easier to measure; how about memory and power?

cccclai avatar Oct 03 '24 19:10 cccclai

This pull request was exported from Phabricator. Differential Revision: D63736779

facebook-github-bot avatar Oct 03 '24 21:10 facebook-github-bot

Hi @cccclai, I added a PR to quantize the embedding op and 16x8 matmul. I ran this model, and it fully delegates. If you have any problems, please let me know.

python -m executorch.examples.models.llama2.export_llama --disable_dynamic_shape --qnn --pt2e_quantize qnn_16a4w

Thanks! I was able to lower the embedding; however, the latency seems very close to the CPU version. Maybe we'd get better memory usage from the QNN embedding? I feel like the alternative solutions include:

  0. use CPU fp embedding
  1. use 16x8 QNN embedding
  2. use CPU 4-bit embedding https://github.com/pytorch/executorch/blob/13408b9848b1a776f03ff0fbac5f18b6347ff64a/kernels/quantized/cpu/op_embedding4b.cpp

and then maybe we'd have a better understanding of the latency/memory for these options (a back-of-envelope memory comparison is sketched below).
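For a rough sense of the memory side, assuming illustrative Llama-style dimensions (vocab 32000 x dim 4096; the actual model dims may differ):

# Back-of-envelope embedding-table sizes for the three options above.
vocab, dim = 32_000, 4_096
params = vocab * dim

for name, bits in [
    ("0. cpu fp32 embedding", 32),
    ("1. 16-bit qnn embedding", 16),
    ("2. cpu 4-bit embedding", 4),
]:
    print(f"{name:>26}: {params * bits / 8 / 2**20:6.1f} MiB")

# -> 500.0 MiB, 250.0 MiB, and 62.5 MiB respectively (plus small
#    scale/zero-point overhead for the quantized variants), so 4-bit
#    is the clear winner on memory even if latency is a wash.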

cccclai avatar Oct 03 '24 21:10 cccclai

Oh~ sure, let me add more description to this PR. For the 16x8 matmul op, I think it can be divided into two cases according to whether the KV cache is used:
- If the KV cache is used, we can annotate the past KV (input), i.e. the second input of matmul, with 8 bits to reduce the size of the input tensor.
- If the KV cache is not used, I think we can just annotate matmul as 16x8 to improve performance.

By default, we annotate matmul as 16x16 in 16-bit quantization, and we can override that with add_custom_quant_annotations.

This is working well, thanks! Also, do you happen to know how the latency/memory compare between 16x8 and 8x8?

cccclai avatar Oct 03 '24 21:10 cccclai

[edit]: made some progress on it. The quant node on the bias looks suspicious... it's:

    %dequantize_per_tensor_2 : [num_users=1] = call_function[target=torch.ops.quantized_decomposed.dequantize_per_tensor.default](args = (%b__frozen_param2, 9.5367431640625e-07, 0, -2147483648, 2147483647, torch.int32), kwargs = {})
...
    %layer_norm : [num_users=1] = call_function[target=torch.ops.aten.layer_norm.default](args = (%dequantize_per_tensor_11, [64], %dequantize_per_tensor_1, %dequantize_per_tensor_2), kwargs = {})

...

We quantize the bias node to int32 by default: https://github.com/cccclai/executorch-1/blob/35387f6de7c731e8d3f52ce504c2abd912c6f096/backends/qualcomm/quantizer/utils.py#L1074
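That lines up with the dq args in the graph dump above (int32 with the full ±2^31 range and a tiny scale). A minimal sketch of the usual convention, assuming PT2E's DerivedQuantizationSpec (the helper name and node arguments are illustrative):

import torch
from torch.ao.quantization.quantizer import DerivedQuantizationSpec

def _derive_bias_qparams(obs_or_fqs):
    # Convention: bias_scale = act_scale * weight_scale, zero_point = 0,
    # so the int32 bias adds directly into the integer accumulator.
    act_scale, _ = obs_or_fqs[0].calculate_qparams()
    weight_scale, _ = obs_or_fqs[1].calculate_qparams()
    scale = (act_scale * weight_scale).to(torch.float32)
    return scale, torch.zeros_like(scale, dtype=torch.int32)

def make_bias_spec(node, act_node, weight_node) -> DerivedQuantizationSpec:
    """Bias qparams derived from the op's activation and weight observers."""
    return DerivedQuantizationSpec(
        derived_from=[(act_node, node), (weight_node, node)],
        derive_qparams_fn=_derive_bias_qparams,
        dtype=torch.int32,
        quant_min=torch.iinfo(torch.int32).min,  # -2147483648, as in the dump
        quant_max=torch.iinfo(torch.int32).max,  #  2147483647
        qscheme=torch.per_tensor_symmetric,
    )

Under this convention the tiny scale in the dump (9.5367431640625e-07, i.e. 2^-20) is plausibly just act_scale x weight_scale rather than a bug.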

shewu-quic avatar Oct 04 '24 02:10 shewu-quic

This is working well, thanks! Also, do you happen to know how the latency/memory compare between 16x8 and 8x8?

Do you mean quantizing the model as 8x8? Maybe we could give it a try, but I doubt we'd get reasonable accuracy with it.

shewu-quic avatar Oct 04 '24 02:10 shewu-quic