esp-nn

FC operator performs differently in the esp-nn test_app and in esp-tflite-micro

Open Lt20051495 opened this issue 9 months ago • 2 comments

Checklist

  • [x] Checked the issue tracker for similar issues to ensure this is not a duplicate.
  • [x] Provided a clear description of your suggestion.
  • [x] Included any relevant context or examples.

Issue or Suggestion Description

Hi. When deploying a TensorFlow Lite model with esp-tflite-micro on an ESP32-S3, I observed significantly lower inference speed. Comparing against the fully connected operator implementation in the esp-nn test application, the test application runs 5x faster than the esp-tflite-micro version. What could be causing this performance gap?

The same IDF environment and the same sdkconfig are used for both test_app and esp-tflite-micro.

esp-tflite-micro Image

test_app in esp_nn Image

Lt20051495, Mar 28 '25 01:03

Hi, @Lt20051495

To understand the scenario better, can you confirm that the configuration is the same in both scenarios? In particular:

  • Whether SPIRAM is used (or not) in both scenarios.
  • Whether the cache configuration is at parity.
  • Whether the flash mode is set to the same setting (say, Quad).
  • Whether the ESP-NN optimisations are enabled.

vikramdattu, Mar 28 '25 06:03

OK, here are the answers:

  • Data locations: Based on the addresses of the data pointers, the input/output data and the filter coefficients are all located in SPIRAM.
  • Cache configuration: In idf.py menuconfig, searching for the keyword "cache" shows that both projects are configured identically.
  • Flash mode: The flash mode is set to Octal in all configurations.
  • Optimizations: All code uses the optimized assembly routines, and the logs are consistently printed from the same location.

Hypothesis: I suspect a cache-related issue. In my test_app, I set up the data and then perform the computation right away, so all the data resides in the cache during that period. When the fully connected operation runs in TFLite-Micro, however, the data might not be cached. How can I verify this?
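One way to test the cache hypothesis could be to time the same fully connected call twice: once right after deliberately evicting the data cache (cold), and once right after a warm-up pass (warm). The following is only a minimal, host-portable sketch of that idea, not code from esp-nn or esp-tflite-micro. The names `fc_s8_ref`, `evict_data_cache` and `time_fc`, the buffer sizes, and the 32-byte line stride are all hypothetical. On the ESP32-S3 one would presumably call the optimized kernel instead of `fc_s8_ref`, use `esp_timer_get_time()` instead of `clock()`, and use a scratch buffer only somewhat larger than the configured data cache.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical sizes: EVICT_BYTES is assumed to exceed the data cache. */
enum { IN_LEN = 1024, OUT_LEN = 64, LINE_BYTES = 32,
       EVICT_BYTES = 8 * 1024 * 1024 };

/* Plain int8 fully connected layer (no bias, no requantization).
 * A stand-in for the optimized kernel under test. */
static void fc_s8_ref(const int8_t *in, const int8_t *w, int32_t *out,
                      int in_len, int out_len)
{
    for (int o = 0; o < out_len; o++) {
        int32_t acc = 0;
        for (int i = 0; i < in_len; i++) {
            acc += (int32_t)in[i] * (int32_t)w[o * in_len + i];
        }
        out[o] = acc;
    }
}

/* Touch one byte per assumed cache line of a large scratch buffer so that
 * previously cached input/weight data is likely to be pushed out. */
static void evict_data_cache(void)
{
    static volatile uint8_t *scratch = NULL;
    if (scratch == NULL) {
        scratch = calloc(EVICT_BYTES, 1);
        assert(scratch != NULL);
    }
    for (size_t i = 0; i < (size_t)EVICT_BYTES; i += LINE_BYTES) {
        scratch[i] = (uint8_t)(scratch[i] + 1);
    }
}

/* Time one FC call. evict_first != 0 measures a cold run; otherwise a
 * warm-up pass runs first so the timed pass sees warm data. */
static clock_t time_fc(int evict_first, const int8_t *in, const int8_t *w,
                       int32_t *out)
{
    if (evict_first) {
        evict_data_cache();
    } else {
        fc_s8_ref(in, w, out, IN_LEN, OUT_LEN); /* warm-up, not timed */
    }
    clock_t t0 = clock();
    fc_s8_ref(in, w, out, IN_LEN, OUT_LEN);
    return clock() - t0;
}
```

If the cold timing in test_app comes out close to what is seen inside TFLite-Micro, while the warm timing matches the original test_app figure, that would support the cache explanation. If both timings stay fast, the gap probably comes from somewhere else.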

Lt20051495, Mar 28 '25 06:03