Fully connected (fc) operator performs differently in esp_nn test_app and esp-tflite-micro
Checklist
- [x] Checked the issue tracker for similar issues to ensure this is not a duplicate.
- [x] Provided a clear description of your suggestion.
- [x] Included any relevant context or examples.
Issue or Suggestion Description
Hi, when deploying a TensorFlow Lite model with ESP-TFLite-Micro on an ESP32-S3, I observed significantly lower inference speed. Comparing against the fully connected operator implementation in the esp_nn test application, the test application runs 5x faster than the ESP-TFLite-Micro version. What could be causing this performance gap?
The same IDF environment and the same sdkconfig are used in both test_app and esp-tflite-micro.
esp-tflite-micro
test_app in esp_nn
Hi, @Lt20051495
To understand the scenario better, can you confirm that the configuration is the same in both scenarios? In particular:
- Whether SPIRAM is used (or not) in both scenarios
- Whether the cache configuration is at parity
- Whether the flash mode is set to the same setting (say, Quad)
- Whether the ESP-NN optimisations are enabled
OK, here is what I found:
- Data locations: Based on the addresses of the data pointers, the input/output data and filter coefficients are all located in SPIRAM.
- Cache configuration: Searching for the keyword "cache" in idf.py menuconfig shows that both projects are configured identically.
- Flash mode: The flash mode is set to Octal in all configurations.
- Optimizations: All code uses the optimized assembly routines, and the logs are consistently printed from the same location.

Hypothesis: I suspect a cache-related issue. In my test_app, I set up the data and then run the computation, so all the data resides in the cache during that period. When the fully connected operation runs inside TFLite-Micro, however, the data might not be properly cached. How can I verify this?
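One possible way to test the cache hypothesis is to time the same kernel call once with a cold cache and then several times with a warm cache, and compare the two numbers against the per-call time seen inside TFLite-Micro. The following is only a minimal, portable sketch of that idea, not code from esp-nn or esp-tflite-micro. The names `kernel_fn`, `run_timing_t`, `touch_buffer`, `time_cold_vs_warm`, `fc_ctx_t` and `standin_fc_kernel` are hypothetical, and the stand-in kernel is a plain int8 dot product that would need to be replaced by a wrapper around the real fully connected call. It uses `clock()` so that it builds anywhere; on ESP-IDF, `esp_timer_get_time()` would presumably give better resolution.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical callback type: wraps one invocation of the kernel under test. */
typedef void (*kernel_fn)(void *ctx);

typedef struct {
    int64_t first_us;    /* duration of the first (possibly cold) call */
    int64_t warm_avg_us; /* average duration of the calls that follow */
} run_timing_t;

/* Portable timestamp in microseconds, based on clock().
 * Assumption: on ESP-IDF, esp_timer_get_time() could be used instead. */
static int64_t now_us(void)
{
    return (int64_t)clock() * 1000000 / CLOCKS_PER_SEC;
}

/* Reads one byte per cache line across a scratch buffer. If the buffer lives
 * in cacheable memory and is larger than the data cache, this should push
 * previously cached data out, so the next kernel call starts cold.
 * Returns the sum of the bytes read so the loop is not optimized away. */
static uint32_t touch_buffer(const volatile uint8_t *buf, size_t len,
                             size_t line_size)
{
    assert(line_size > 0);
    uint32_t sum = 0;
    for (size_t i = 0; i < len; i += line_size) {
        sum += buf[i];
    }
    return sum;
}

/* Times the first call separately from the following warm_runs calls. */
static run_timing_t time_cold_vs_warm(kernel_fn fn, void *ctx, int warm_runs)
{
    assert(fn != NULL);
    run_timing_t t = {0, 0};

    int64_t start = now_us();
    fn(ctx);
    t.first_us = now_us() - start;

    if (warm_runs > 0) {
        start = now_us();
        for (int i = 0; i < warm_runs; i++) {
            fn(ctx);
        }
        t.warm_avg_us = (now_us() - start) / warm_runs;
    }
    return t;
}

/* Stand-in for the real operator: an int8 dot product. Replace the body
 * with a call to the actual fully connected kernel when measuring. */
typedef struct {
    const int8_t *input;
    const int8_t *filter;
    size_t len;
    int32_t acc; /* result of the most recent call */
    int calls;   /* number of times the kernel has run */
} fc_ctx_t;

static void standin_fc_kernel(void *p)
{
    fc_ctx_t *c = (fc_ctx_t *)p;
    int32_t acc = 0;
    for (size_t i = 0; i < c->len; i++) {
        acc += (int32_t)c->input[i] * (int32_t)c->filter[i];
    }
    c->acc = acc;
    c->calls++;
}
```

In test_app, calling `touch_buffer` on a SPIRAM scratch buffer larger than the configured data cache right before `time_cold_vs_warm` should make the first call cold. If `first_us` then lands close to the time measured inside TFLite-Micro while `warm_avg_us` stays close to the fast test_app number, that would point toward the cache explanation. If both stay fast, the gap probably has another cause.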