py-videocore6

Results do not match the `examples` output

Open zhengpeirong opened this issue 8 months ago • 0 comments

Dear author,

I am using a Raspberry Pi 4B, and here are my results:

  1. sgemm.py
pi@node01:~/py-videocore6 $ PYTHONPATH=sandbox/ python3 examples/sgemm.py
==== sgemm example (1024x1024 times 1024x1024) ====
numpy: 0.08994 sec, 23.91 Gflop/s
QPU:   0.5661 sec, 3.799 Gflop/s
Minimum absolute error: 0.0
Maximum absolute error: 0.0003814697265625
Minimum relative error: 0.0
Maximum relative error: 0.13134673237800598
  2. pctr_gpu_clock.py
pi@node01:~/py-videocore6 $ sudo PYTHONPATH=sandbox/ python3 examples/pctr_gpu_clock.py
==== QPU clock measurement with performance counters ====
500.08264399999996 MHz
  3. memset.py
pi@node01:~/py-videocore6 $ PYTHONPATH=sandbox/ python3 examples/memset.py
==== memset example (64.0 MiB) ====
Preparing for buffers...
Traceback (most recent call last):
  File "/home/pi/py-videocore6/examples/memset.py", line 148, in <module>
    main()
  File "/home/pi/py-videocore6/examples/memset.py", line 143, in main
    memset(fill=0x5a5a5a5a, length=16 * 1024 * 1024)
  File "/home/pi/py-videocore6/examples/memset.py", line 119, in memset
    X.fill(~fill)
OverflowError: Python integer -1515870811 out of bounds for uint32
  4. scopy.py
pi@node01:~/py-videocore6 $ PYTHONPATH=sandbox/ python3 examples/scopy.py
==== scopy example (16.0 Mi elements) ====
Preparing for buffers...
Traceback (most recent call last):
  File "/home/pi/py-videocore6/examples/scopy.py", line 201, in <module>
    main()
  File "/home/pi/py-videocore6/examples/scopy.py", line 196, in main
    scopy(length=16 * 1024 * 1024)
  File "/home/pi/py-videocore6/examples/scopy.py", line 181, in scopy
    unif[-1] = 4 * (-len(unif) + 3)
    ~~~~^^^^
OverflowError: Python integer -8 out of bounds for uint32
  5. summation.py
pi@node01:~/py-videocore6 $ PYTHONPATH=sandbox/ python3 examples/summation.py
==== summaton example (32.0 Mi elements) ====
Preparing for buffers...
Traceback (most recent call last):
  File "/home/pi/py-videocore6/examples/summation.py", line 189, in <module>
    main()
  File "/home/pi/py-videocore6/examples/summation.py", line 184, in main
    summation(length=32 * 1024 * 1024)
  File "/home/pi/py-videocore6/examples/summation.py", line 169, in summation
    unif[-1] = 4 * (-len(unif) + 3)
    ~~~~^^^^
OverflowError: Python integer -12 out of bounds for uint32

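In summary, only the QPU clock measurement (about 500 MHz) matches the tutorial. The sgemm example is much slower on the QPU (0.5661 sec) than with numpy (0.08994 sec), and the memset, scopy, and summation examples all fail with an `OverflowError` when a negative Python integer is written into a `uint32` buffer. Could you please share your thoughts on how to resolve this?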
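
One possible explanation for the three tracebacks, offered as an assumption rather than a confirmed diagnosis: the message "Python integer ... out of bounds for uint32" is what NumPy 2.x raises when a negative Python integer is stored into a `uint32` array, whereas older NumPy releases wrapped such values silently. If that is the cause here, masking the value to 32 bits before the assignment should reproduce the old behaviour. The sketch below uses a hypothetical helper, `to_uint32`, which is not part of py-videocore6, and plain Python integers so it runs without NumPy; the `unif` lengths 5 and 6 are inferred from the `-8` and `-12` in the tracebacks.

```python
# Hypothetical helper (not part of py-videocore6): map a Python int to the
# unsigned 32-bit value that has the same two's-complement bit pattern.
UINT32_MASK = 0xFFFFFFFF


def to_uint32(value: int) -> int:
    """Wrap an arbitrary Python integer into the range 0 .. 2**32 - 1."""
    return value & UINT32_MASK


# memset.py: ~0x5a5a5a5a is -1515870811 as a Python int (see the traceback).
fill = 0x5A5A5A5A
print(hex(to_uint32(~fill)))  # 0xa5a5a5a5

# scopy.py / summation.py: 4 * (-len(unif) + 3) is -8 or -12 in the
# tracebacks, which corresponds to len(unif) == 5 and len(unif) == 6.
for unif_len in (5, 6):
    offset = 4 * (-unif_len + 3)
    print(offset, hex(to_uint32(offset)))
# -8 0xfffffff8
# -12 0xfffffff4
```

Under this assumption, the failing lines would become `X.fill(to_uint32(~fill))` and `unif[-1] = to_uint32(4 * (-len(unif) + 3))`. This is untested on the hardware and does not address the sgemm timing.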

zhengpeirong, Jun 18 '24 09:06