Max Ren
@gonnet I didn't know that separating these build targets was an active effort. Honestly, the current level of separation (all MICROKERNEL_CONFIG srcs living in `src/configs`, all OPERATOR_SRCS living in `src/operators`,...
I see, there have been some issues with convert_to_linear, and in its place I've been trying to use to_edge_transform_and_lower instead because that's a bit easier to exercise. What is your...
@digantdesai Yeah, we should probably check that the model size isn't bloated. We don't have any CI here that checks for this, so right now we would have to do so...
You can try doing some operator profiling through https://pytorch.org/executorch/0.6/runtime-profiling.html. On the other hand, if you can share a flame graph of the model run, that might also be helpful...
Great! It looks like the issue is that there are aten.mm.default nodes which aren't getting delegated to XNNPACK. Are you using torch.mm between two dynamic inputs? As in, one of...
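To see which aten ops fell through, you can scan the lowered graph's node targets; here is a toy stdlib-only sketch of the idea (the node names and the helper are illustrative, not an ExecuTorch API; in practice you'd walk edge.exported_program().graph):

```python
# Hypothetical helper: report which aten ops were NOT swallowed by a delegate.
def undelegated_ops(node_targets):
    # Delegated regions show up as call_delegate / lowered-module nodes,
    # so any remaining "aten." target is running outside XNNPACK.
    return [t for t in node_targets if t.startswith("aten.")]

nodes = [
    "executorch_call_delegate",  # XNNPACK-delegated subgraph
    "aten.mm.default",           # fell through: e.g. dynamic x dynamic matmul
    "aten.cat.default",
]
print(undelegated_ops(nodes))  # ['aten.mm.default', 'aten.cat.default']
```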
It looks like concatenate could potentially be lowered? I think the only reason it would fail to be lowered is if it wasn't a float value. Based on the timings, it...
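The dtype constraint in question can be thought of roughly like this (the helper name and the fp32-only assumption are mine, not the actual partitioner code):

```python
# Hypothetical check: assume the XNNPACK concat path only accepts fp32 inputs,
# so a cat over any non-float input would be left undelegated.
def cat_is_lowerable(input_dtypes):
    return all(dt == "float32" for dt in input_dtypes)

print(cat_is_lowerable(["float32", "float32"]))  # True
print(cat_is_lowerable(["float32", "int64"]))    # False
```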
Another suggestion is building with the optimized operator library turned on: https://github.com/pytorch/executorch/blob/main/CMakeLists.txt#L213. There are some ops that fall through XNNPACK and run on ExecuTorch's own kernels (e.g. native_layer_norm), which can be accelerated.
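Roughly, the configure step would look like this; the flag names are taken from ExecuTorch's top-level CMakeLists, so double-check them against the line linked above for your checkout:

```shell
# Build with the optimized kernel library in addition to the XNNPACK backend.
cmake -DEXECUTORCH_BUILD_XNNPACK=ON \
      -DEXECUTORCH_BUILD_KERNELS_OPTIMIZED=ON \
      -Bcmake-out .
cmake --build cmake-out -j
```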
Hi @tdasika17, the profiling you shared suggests that running the model is only taking 105 ms (see the execute row at the bottom). Are you profiling elsewhere to find...
Hi, I saw this line in the profiling:

```
python ssm_single_forward_gen.py -weights ../../models/
Python model setup time: 3.759671 seconds
```

Is this where the 3.7 seconds you're getting is coming...
I see, are you suggesting that the measured 3.5 seconds is the entire ::generate call? That would likely be running inference multiple times for prefill and decode. That would clear...
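As a sanity check on the measurement itself, separating the one-time setup from the per-call loop usually clears this up. A stdlib-only sketch (the sleeps are stand-ins for model costs, and the loop count is arbitrary):

```python
import time

def setup():
    time.sleep(0.05)   # stand-in for weight loading / program init

def forward():
    time.sleep(0.001)  # stand-in for a single model forward

t0 = time.perf_counter()
setup()
setup_s = time.perf_counter() - t0

t0 = time.perf_counter()
forward()              # prefill
for _ in range(20):    # decode steps inside a ::generate-style loop
    forward()
generate_s = time.perf_counter() - t0

# generate_s aggregates 21 forwards, so comparing it to a single-forward
# profile (e.g. 105 ms for one execute) conflates different quantities.
print(f"setup: {setup_s:.3f}s  generate: {generate_s:.3f}s")
```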