dernasherbrezon

Results: 29 comments by dernasherbrezon

Without sudo:

```
VC4CL_DEBUG=system ./perf_fir_filter
output length max: 199
output length max: 204
working_len_total: 10996
clGetDeviceIDs: 0
clCreateContext: 0
clCreateCommandQueue: 0
allocated working buf: 10996
gpuserv: vc_gpuserv_init: starting initialisation
[VC4CL](perf_fir_filter):...
```

I turned on VCHI and now the run with sudo is slow too, so this is consistent with the "no sudo" case. It looks like VCHI is causing the slowness. After some number of executions,...

It gets weirder with more tests.

1. Execute the loop 2 times test
   * without sudo: average time: 0.004408, output: 0.000000000, 0.000000048 0.000284149, 0.000077057
   * with sudo: average time: 0.102403,...

OK. I will go with sudo-enabled access and start optimising the code. This is not related to this issue, but I switched to float8 and got a 5x performance boost: *...

I have been extensively testing the timeout issue for the last several days: 1. It seems the mailbox call 0x00030011 returns before the actual computation completes. Similar to the [issue with...

A couple more observations: 1. If I remove sleep(1), then after an early return I cannot run the application again. Does it crash the GPU (?) or saturate some internal buffer in ThreadX? Only...

Tried running code similar to [add.py](https://gist.githubusercontent.com/nomaddo/6a89bd57e1d30a14518a12f9ffdc9812/raw/68378571321baaad0cd13a91265f6d685f1850a3/add.py):

```python
for x in range(100):
    start = time.time()
    drv.execute(
        n_threads=n_threads,
        program=code,
        uniforms=uniforms
    )
    elapsed_gpu = time.time() - start
    print('GPU: {:.4f} sec'.format(elapsed_gpu))
```

using [py-videocore](https://github.com/nineties/py-videocore)...
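To make slow or timed-out iterations stand out in a loop like this, the per-call timings can be collected and compared against the median. This is only a minimal sketch: `time_calls`, `find_outliers` and `fake_gpu_call` are hypothetical names, the 10x threshold is an arbitrary assumption, and `fake_gpu_call` is a stand-in that would have to be replaced with the real `drv.execute(...)` call.

```python
import statistics
import time


def time_calls(fn, iterations=100):
    """Call fn() repeatedly and return per-call wall-clock durations in seconds."""
    durations = []
    for _ in range(iterations):
        start = time.perf_counter()
        fn()
        durations.append(time.perf_counter() - start)
    return durations


def find_outliers(durations, factor=10.0):
    """Return (index, duration) pairs that exceed `factor` times the median."""
    median = statistics.median(durations)
    return [(i, d) for i, d in enumerate(durations) if d > factor * median]


def fake_gpu_call():
    # Hypothetical stand-in for drv.execute(...); replace with the real call.
    time.sleep(0.001)


if __name__ == "__main__":
    durations = time_calls(fake_gpu_call, iterations=20)
    print('median: {:.4f} sec'.format(statistics.median(durations)))
    for index, duration in find_outliers(durations):
        print('slow call #{}: {:.4f} sec'.format(index, duration))
```

Comparing against the median rather than the mean keeps a single multi-second timeout from hiding the other slow calls.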

Tried bullseye and got a timeout after the very first execution when executing via the mailbox.

```
[VC4CL](VC4CL Queue Han): Mailbox buffer before: 00000020 00000000 00030002 00000008 00000004 00000005 00000000 00000000
[VC4CL](VC4CL Queue...
```
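For reading dumps like the "Mailbox buffer before" line, here is a rough sketch of a decoder. It assumes the layout described in the Raspberry Pi firmware mailbox property interface documentation (total size, request code, then tags made of id, value buffer size, request/response word and value words, ended by a zero tag). The tag names in `TAG_NAMES` are my reading of that documentation, not something taken from VC4CL, and `parse_property_buffer` is a hypothetical helper. If that reading is right, the dumped buffer is a "Get clock rate" request for clock id 5.

```python
# Assumed tag names from the firmware mailbox property interface docs;
# only the tags that appear in this thread are listed.
TAG_NAMES = {
    0x00030002: "Get clock rate",
    0x00030011: "Execute QPU",
}


def parse_property_buffer(words):
    """Split a mailbox property buffer (list of 32-bit words) into its fields."""
    total_size, code = words[0], words[1]
    tags = []
    pos = 2
    while pos < len(words) and words[pos] != 0:  # a zero word is the end tag
        tag_id, value_size, req_resp = words[pos:pos + 3]
        n_value_words = (value_size + 3) // 4  # value buffer padded to 32 bits
        values = words[pos + 3:pos + 3 + n_value_words]
        tags.append({
            "tag": tag_id,
            "name": TAG_NAMES.get(tag_id, "unknown"),
            "value_size": value_size,
            "is_response": bool(req_resp & 0x80000000),  # bit 31 marks a response
            "values": values,
        })
        pos += 3 + n_value_words
    return {"size_bytes": total_size, "code": code, "tags": tags}


if __name__ == "__main__":
    dump = "00000020 00000000 00030002 00000008 00000004 00000005 00000000 00000000"
    parsed = parse_property_buffer([int(w, 16) for w in dump.split()])
    print("buffer size:", parsed["size_bytes"], "bytes")
    for tag in parsed["tags"]:
        print(hex(tag["tag"]), tag["name"], [hex(v) for v in tag["values"]])
```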

> Isn't this also related to raspberrypi/linux#4321?

Unlikely. I'm executing exactly the same code all the time and it takes ~16264us to execute. On "buster" I got a timeout after several...