gap_sdk
Reduce high execution time when using the SDK
I need to execute the MNIST example on GAP8 and retrieve the output on the host computer inside a loop, so my application looks like this:

```
while 1:
    mark time before the execution
    execute the command "gapy --target=gapuino_v2 --platform=board --work-dir=<path_sdk>/examples/autotiler/Mnist/BUILD/GAP8_V2/GCC_RISCV_PULPOS run --exec-prepare --exec --binary=<path_sdk>/examples/autotiler/Mnist/BUILD/GAP8_V2/GCC_RISCV_PULPOS/Mnist"
    calculate the execution time
    post-process the output from the MNIST
```
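For reference, the loop above can be sketched as a small host-side Python script. The gapy arguments are taken verbatim from the command in this post; `build_gapy_cmd` and `timed_run` are hypothetical helper names, and `<path_sdk>` still needs to be replaced with the real SDK location:

```python
import subprocess
import time

def build_gapy_cmd(sdk_path):
    """Build the gapy command line from this post, parameterized on the SDK path."""
    build_dir = f"{sdk_path}/examples/autotiler/Mnist/BUILD/GAP8_V2/GCC_RISCV_PULPOS"
    return [
        "gapy",
        "--target=gapuino_v2",
        "--platform=board",
        f"--work-dir={build_dir}",
        "run",
        "--exec-prepare",
        "--exec",
        f"--binary={build_dir}/Mnist",
    ]

def timed_run(cmd):
    """Run the command once, returning (elapsed_seconds, captured_stdout)."""
    start = time.perf_counter()
    result = subprocess.run(cmd, capture_output=True, text=True)
    return time.perf_counter() - start, result.stdout

if __name__ == "__main__":
    cmd = build_gapy_cmd("<path_sdk>")  # replace with your actual SDK path
    while True:
        elapsed, output = timed_run(cmd)
        print(f"execution took {elapsed:.2f}s")
        # post-process `output` from the MNIST run here
```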
However, the average execution time I'm getting for the simple MNIST example is 4.9 s. Is this execution time correct? Is there any way to reduce it, given that I have to post-process the application's output on the host computer for each iteration?
My host computer runs Ubuntu 20.04 and is connected to the GAPUINO by USB cable.
Thanks in advance.
I also hit this problem on SDK 4.8.0+; on SDK 3.8.1 there is no problem.
Hi @fernandoFernandeSantos and @aqqz
FYI, execution time ≠ real time on chip. The simulator doesn't work in real time; it simulates how many CYCLES an application takes on the target.
For example, an application run in GVSOC might report, say, 5 million cycles via the performance counter. On GAP8, the clock frequency can be set anywhere from 0 to 175 MHz, so at 175 MHz those 5 million cycles take 5,000,000 / 175 MHz ≈ 28 ms.
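The cycles-to-wall-clock conversion above is simply time = cycles / frequency; a quick sketch (`cycles_to_ms` is just an illustrative helper name):

```python
def cycles_to_ms(cycles, freq_hz):
    """Convert a cycle count (e.g. from the GVSOC perf counter) to milliseconds."""
    return cycles / freq_hz * 1000.0

# 5 million cycles at GAP8's maximum clock of 175 MHz:
print(f"{cycles_to_ms(5_000_000, 175e6):.1f} ms")  # → 28.6 ms
```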
However, if you measure the GVSOC execution time, it may take several seconds, because GVSOC itself needs time on your host to "simulate" all the signals. Therefore, if you run it on a high-performance PC, it will be faster.
The execution-time difference you see here doesn't mean the timing on the target has changed; it is only because we have added more features in GVSOC, which makes GVSOC heavier.
Thanks for the reply @Yaooooo. But in my case I'm not running on GVSOC. I need to run multiple times on the GAPUINO board and communicate over USB JTAG; in this case, is it still expected to have such high execution times?
Thanks for your advice. On the chip, the first run works normally, but the next and later runs stop at the cluster call.
@Yaooooo
@fernandoFernandeSantos unfortunately yes, it's due to the printf going via JTAG -> USB, which is very slow. One way to optimize it is to use io=uart so the printf goes over UART instead.
@aqqz can you please describe in a bit more detail how you run it? What do you mean by "first time" and "next"? Did you put a loop inside, or are you just rerunning it with "make run"?
Thanks, @Yaooooo. Is there a way to pass the data through USB/UART without using printf? I don't really need printf; I just need to post-process the data from the GAPUINO quickly. It can be an array of bytes.
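One common pattern for this (not something the SDK provides out of the box; this is a sketch under that assumption) is to send a small length-prefixed binary frame from the board over the UART instead of printf text, and parse it on the host. The frame format here (2-byte little-endian length followed by the payload) is purely illustrative. The `read_frame` helper works on any file-like byte stream, so on the host it can be pointed at a pyserial `Serial` object; the demo below uses an in-memory stream standing in for the UART:

```python
import io
import struct

def read_frame(stream):
    """Read one length-prefixed frame: 2-byte little-endian length, then payload."""
    header = stream.read(2)
    if len(header) < 2:
        return None  # stream closed / nothing left to read
    (length,) = struct.unpack("<H", header)
    return stream.read(length)

# Demo: a frame carrying the 4 payload bytes b"\x01\x02\x03\x04".
fake_uart = io.BytesIO(struct.pack("<H", 4) + b"\x01\x02\x03\x04")
print(read_frame(fake_uart))  # → b'\x01\x02\x03\x04'
```

On the board side, the matching step would be writing the 2-byte length and then the raw bytes to the UART, avoiding printf formatting entirely.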
@Yaooooo Hello, how large a model can the AIdeck run? I trained a 3 MB model which runs on GVSOC, but on the AIdeck it fails, getting stuck at the cluster run. Is the model too large? I think the HyperFlash is larger than 3 MB.