tdasika17
Hi @GregoryComer, the timings I reported were with to_edge_transform_and_lower(). While experimenting, I tried to_backend(), and the inference time there is ~27 seconds. So my original observation of ~16...
[pte_model_profiling.txt](https://github.com/user-attachments/files/19846795/pte_model_profiling.txt) Attaching the model profiling information.
Ah, thanks! I fixed that by modifying the graph so that the '_aten_mm_default_' ops use '_aten_mul_tensor_' instead. The time is now 3.5 seconds. Are there any other ops from the list...
Hi @mcr229, I already use this option in my build, _**option(EXECUTORCH_BUILD_KERNELS_OPTIMIZED "" ON)**_, and have linked the library '_**optimized_native_cpu_ops_lib**_' into my app. [pte_model_profiling_3seconds.txt](https://github.com/user-attachments/files/19886551/pte_model_profiling_3seconds.txt) Attaching the profiling of the model at 3.5 seconds....
The model trace (.etdp) is collected during actual model inference; the code snippet is below.
```
Module model(model_path, Module::LoadMode::MmapUseMlockIgnoreErrors, std::move(etdump_gen_));
auto output_ids = generate(model, input_tokens);
ETDumpGen* etdump_gen = static_cast<ETDumpGen*>(model.event_tracer());
ET_LOG(Info,...
```
Hi, sorry for the confusion. That is another Python script, which generates the pt.model and exports it. The 3.5 seconds I'm talking about is purely C++ inference time, which is timed...
Hi @mcr229, yes, the time is computed for the overall inference, which includes the ingestion phase, output token generation, and post-processing of the output tokens. I have converted a .pt model...
@mcr229, here is the profiling of the model during real inference. I generated etdump.bin while exporting and used the dev tools to generate inspector.txt. Attaching the Excel file for the table above....
Hi @mcr229, the profiling was done on the same machine. By an inference time of ~3.5 sec, I mean the time measured immediately before and after the generate method. My...
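For reference, the kind of measurement described here (timestamps taken immediately before and after the generate call) could look roughly like the sketch below. It assumes a monotonic clock is what you want for wall-clock intervals; `time_call_ms` is a hypothetical helper name used only for illustration and is not part of ExecuTorch or of the application in this thread.

```cpp
#include <chrono>
#include <utility>

// Hypothetical helper (not an ExecuTorch API): runs `fn` once and returns
// the elapsed wall-clock time in milliseconds.
// steady_clock is monotonic, so the interval is not affected by
// system clock adjustments during the run.
template <typename Fn>
double time_call_ms(Fn&& fn) {
  const auto start = std::chrono::steady_clock::now();
  std::forward<Fn>(fn)();  // the work being timed
  const auto end = std::chrono::steady_clock::now();
  return std::chrono::duration<double, std::milli>(end - start).count();
}
```

In the application it would wrap the existing call, for example `time_call_ms([&] { output_ids = generate(model, input_tokens); })`. A number obtained this way covers everything inside generate, so it is expected to be larger than the sum of the per-op times in the profile.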
Hi @mcr229, today I tried the ExecuTorch model on a different x86 server. I got a different inference time there for the same application (~7.8 sec); this may be because of...