pyclesperanto_prototype

Benchmark on M1.

Open Carreau opened this issue 4 years ago • 13 comments

Following up on https://twitter.com/haesleinhuepf/status/1453371569955299331

I have no clue what I'm doing with GPUs, and maybe here is a better place than Twitter?

Carreau avatar Oct 27 '21 15:10 Carreau

Hi Matthias @Carreau ,

thanks for reaching out here! Peter @psobolewskiPhD was trying clesperanto earlier on an M1 (details) and lists this config to make it run:

everything is arm64 native
python 3.9 via conda
pyopencl needs to be 2021.2.6 to work on arm64
pyclesperanto-prototype works fine, programmatically.

Does your config deviate a lot from that?

Thanks for trying!

Robert

haesleinhuepf avatar Oct 27 '21 15:10 haesleinhuepf

I got the notebook figured out: right click on Raw, then download. @Carreau you also need the data file from here: https://github.com/clEsperanto/pyclesperanto_prototype/blob/master/data/Haase_MRT_tfl3d1.tif And then you need to change the path in the 2nd code cell to match where you have the image.

cupy won't work (no CUDA), so you need to add this at the top of the clEsperanto cell: import time. Then everything should run.

Here's the output for the clEsperanto on my M1

clEsperanto affine transform duration: 0.016510009765625
clEsperanto affine transform duration: 0.011471033096313477
clEsperanto affine transform duration: 0.007179737091064453
clEsperanto affine transform duration: 0.008187055587768555
clEsperanto affine transform duration: 0.00746607780456543
clEsperanto affine transform duration: 0.009069204330444336
clEsperanto affine transform duration: 0.0077130794525146484
clEsperanto affine transform duration: 0.00803518295288086
clEsperanto affine transform duration: 0.0078008174896240234
clEsperanto affine transform duration: 0.008910179138183594
/Users/piotrsobolewski/Dev/miniforge3/envs/napari-pyCL-D/lib/python3.9/site-packages/skimage/io/_plugins/matplotlib_plugin.py:150: UserWarning: Float image out of standard range; displaying image with stretched contrast.
  lo, hi, cmap = _get_display_range(image)

Here's scipy

scipy affine transform duration: 6.5906081199646
scipy affine transform duration: 6.637574911117554
scipy affine transform duration: 6.514955043792725
scipy affine transform duration: 6.518755197525024
scipy affine transform duration: 6.5272181034088135
scipy affine transform duration: 6.530372142791748
scipy affine transform duration: 6.527111053466797
scipy affine transform duration: 6.628083944320679
scipy affine transform duration: 6.66226601600647
scipy affine transform duration: 6.558634996414185

Edit: FYI, the scipy cell runs at 100% CPU in a single python3.9 process.
Edit2: for the record, this is with python 3.9.6 from conda-forge, pyopencl version 2021.2.9 from pip, and pyclesperanto-prototype version 0.10.4 from pip.

psobolewskiPhD avatar Oct 27 '21 16:10 psobolewskiPhD
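
For readers who want a single number out of the two listings above, here is a minimal sketch that compares the median durations. The durations are copied from the outputs in this comment; the helper name `speedup` is made up for illustration. Note that the first clEsperanto run is the slowest of the ten, which is plausibly a warm-up effect (for example kernel compilation), so the median is used rather than the mean.

```python
from statistics import median

# Durations in seconds, copied from the outputs reported above.
clesperanto = [
    0.016510009765625, 0.011471033096313477, 0.007179737091064453,
    0.008187055587768555, 0.00746607780456543, 0.009069204330444336,
    0.0077130794525146484, 0.00803518295288086, 0.0078008174896240234,
    0.008910179138183594,
]
scipy_cpu = [
    6.5906081199646, 6.637574911117554, 6.514955043792725,
    6.518755197525024, 6.5272181034088135, 6.530372142791748,
    6.527111053466797, 6.628083944320679, 6.66226601600647,
    6.558634996414185,
]


def speedup(baseline, candidate):
    """Ratio of median durations; values above 1 mean the candidate is faster."""
    return median(baseline) / median(candidate)


ratio = speedup(scipy_cpu, clesperanto)
print(f"median scipy:       {median(scipy_cpu):.3f} s")
print(f"median clEsperanto: {median(clesperanto) * 1000:.2f} ms")
print(f"speedup:            {ratio:.0f}x")
```

Keep in mind that the scipy cell ran in a single CPU process, so this compares one CPU core against the whole GPU.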

@haesleinhuepf I wonder, can we bench the memory transfer? I mean we have a unified memory architecture, shouldn't the penalty for moving to&from GPU be lower?

psobolewskiPhD avatar Oct 27 '21 16:10 psobolewskiPhD

I mean we have a unified memory architecture, shouldn't the penalty for moving to&from GPU be lower?

Not sure. OpenCL was not developed for this architecture. It's possible that it can't exploit this.

haesleinhuepf avatar Oct 27 '21 16:10 haesleinhuepf
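
One way to look at the transfer question would be to time the host-to-device and device-to-host copies separately. The following is only a sketch under assumptions: `time_round_trip`, `fake_push` and `fake_pull` are hypothetical names, and the stand-in functions just copy in host memory so the code runs anywhere. On a machine with pyclesperanto_prototype installed, one would pass `cle.push` and `cle.pull` instead.

```python
import time


def time_round_trip(push, pull, data, repeats=10):
    """Return a list of (push_seconds, pull_seconds), one pair per repeat."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        on_device = push(data)
        middle = time.perf_counter()
        pull(on_device)
        end = time.perf_counter()
        timings.append((middle - start, end - middle))
    return timings


# Stand-ins so the sketch runs without a GPU: they only copy in host memory.
# With pyclesperanto_prototype available, replace them with cle.push / cle.pull
# after `import pyclesperanto_prototype as cle`.
def fake_push(buffer):
    return bytearray(buffer)


def fake_pull(buffer):
    return bytes(buffer)


data = bytes(16 * 1024 * 1024)  # 16 MiB of zeros as a hypothetical payload
results = time_round_trip(fake_push, fake_pull, data, repeats=5)
for push_s, pull_s in results:
    print(f"push: {push_s * 1000:.3f} ms, pull: {pull_s * 1000:.3f} ms")
```

If the push is queued asynchronously by the OpenCL runtime, the push timing alone may be misleading; the sum of push and pull is the safer number to compare across machines.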

Right, that's why I hoped there was a test! FYI, I get this warning when setting up the context:

/Users/piotrsobolewski/Dev/miniforge3/envs/napari-pyCL-D/lib/python3.9/site-packages/pyclesperanto_prototype/_tier0/_pycl.py:45: 
UserWarning: Data type double is not supported by your GPU. Will use float instead.

psobolewskiPhD avatar Oct 27 '21 16:10 psobolewskiPhD
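
To get a feeling for what falling back from double to float means numerically, here is a small standard-library sketch. It is not how pyclesperanto converts data; the helper `to_float32` is a hypothetical name and only rounds a Python float (a 64-bit double) to the nearest 32-bit float.

```python
import struct


def to_float32(value):
    """Round a Python float (64-bit double) to the nearest 32-bit float."""
    return struct.unpack("f", struct.pack("f", value))[0]


for value in (0.5, 0.1, 16777217.0):
    as_float = to_float32(value)
    print(f"{value!r} -> {as_float!r} (abs. error {abs(as_float - value):.3e})")
```

Values such as 0.5 survive unchanged, while 0.1 picks up a small rounding error and integers above 2**24 can no longer all be represented exactly. For typical image intensities this is usually harmless, but it matters for large accumulated sums.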

If you get the benchmark, then we are good. Personally I get "FileNotFound _tier1/../clij-opencl-kernels/kernels/copy_3d_x.cl'" when running affine_transform

Carreau avatar Oct 27 '21 16:10 Carreau

Odd...I have no notes of issues with installing pyopencl or pyclesperanto. Just that pyopencl needs to be 2021.2.6 or newer. Are you using a native arm64 conda env?

Re: the bench, mine's a regular M1 and yours is a Pro, so yours should be ~2X faster according to the specs anyway.

psobolewskiPhD avatar Oct 27 '21 16:10 psobolewskiPhD
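
A quick way to answer both questions (native arm64 interpreter, pyopencl at least 2021.2.6) from inside the environment is sketched below. It uses only the standard library; the helper names are hypothetical, and the version comparison is a simplification that looks at the first three numeric components only.

```python
import platform
import re
from importlib import metadata

MINIMUM_PYOPENCL = "2021.2.6"  # minimum version mentioned in this thread


def version_tuple(version):
    """Turn a version such as '2021.2.9' into (2021, 2, 9), ignoring suffixes."""
    return tuple(int(number) for number in re.findall(r"\d+", version)[:3])


def meets_minimum(installed, minimum=MINIMUM_PYOPENCL):
    """True if the installed version is at least the minimum."""
    return version_tuple(installed) >= version_tuple(minimum)


def installed_version(package):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# A native Apple Silicon interpreter reports 'arm64'; under Rosetta it is 'x86_64'.
print("machine:", platform.machine())
for package in ("pyopencl", "pyclesperanto-prototype"):
    print(package, "->", installed_version(package))

found = installed_version("pyopencl")
if found is not None:
    print("pyopencl new enough:", meets_minimum(found))
```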

yours should be ~2X faster according to specs anyways.

That's what I was curious about. But my Mac should be on the way too. Thanks @Carreau for trying and thanks @psobolewskiPhD for the support!

haesleinhuepf avatar Oct 27 '21 17:10 haesleinhuepf

I'm trying from scratch:
Step 1: mamba create -y --name pycl-test python=3.9
Step 2: pip install pyopencl
Step 3: pip install pyclesperanto-prototype
Derp: scikit-image has a numpy issue! Let's use conda for that:
mamba install scikit-image
Now back to pip:
pip install pyclesperanto-prototype
Success! We now have:

Name: pyclesperanto-prototype
Version: 0.10.4
Name: pyopencl
Version: 2021.2.9

And everything is peachy. So now we just need jupyter. Here I use VS Code, which installs ipykernel etc. for me, all from conda-forge. And bam, it all works:

clEsperanto affine transform duration: 0.017657041549682617
clEsperanto affine transform duration: 0.010656118392944336
clEsperanto affine transform duration: 0.014425992965698242
clEsperanto affine transform duration: 0.00988316535949707
clEsperanto affine transform duration: 0.009964704513549805
clEsperanto affine transform duration: 0.011729955673217773
clEsperanto affine transform duration: 0.008846044540405273
clEsperanto affine transform duration: 0.009747028350830078
clEsperanto affine transform duration: 0.007620096206665039
clEsperanto affine transform duration: 0.008332014083862305

psobolewskiPhD avatar Oct 27 '21 18:10 psobolewskiPhD

Yeah, I did the same, except I did a clone and pip install -e . and installed pyopencl after.

Carreau avatar Oct 27 '21 18:10 Carreau

@haesleinhuepf In case you're curious, I happened to see your tophat benches. Here's that notebook on my 2020 M1:

skimage top-hat disk duration: 3.705864906311035
skimage top-hat disk duration: 3.7178118228912354
skimage top-hat disk duration: 3.6921098232269287
skimage top-hat disk duration: 3.6890509128570557
skimage top-hat disk duration: 3.7015788555145264
skimage top-hat square duration: 0.10191106796264648
skimage top-hat square duration: 0.08366107940673828
skimage top-hat square duration: 0.0834510326385498
skimage top-hat square duration: 0.08431601524353027
skimage top-hat square duration: 0.08443880081176758
pyclesperanto top-hat-shere duration: 0.13574910163879395
pyclesperanto top-hat-shere duration: 0.12321710586547852
pyclesperanto top-hat-shere duration: 0.13194704055786133
pyclesperanto top-hat-shere duration: 0.13232779502868652
pyclesperanto top-hat-shere duration: 0.130479097366333
pyclesperanto top-hat-box duration: 0.02432084083557129
pyclesperanto top-hat-box duration: 0.016895055770874023
pyclesperanto top-hat-box duration: 0.016459941864013672
pyclesperanto top-hat-box duration: 0.016234874725341797
pyclesperanto top-hat-box duration: 0.016022920608520508

Edit: note that in principle @Carreau's M1 Pro should be 2x as capable and the M1 Max 4x.

psobolewskiPhD avatar Oct 28 '21 19:10 psobolewskiPhD

Wow, thanks @psobolewskiPhD, that's promising. Just a note: the top-hat filter internally calls three operations: a min filter, a max filter and a subtraction. Each of these calls (the call itself) takes about 3 ms. Furthermore, temporary memory is allocated, which also takes time. Thus, it's unlikely that top-hat-box (the last one in your list) can get much faster than what you measured on your M1. If you compare it to my RTX 2080 Ti benchmark: it is as fast as your M1. Congrats on your computer! ;-)

haesleinhuepf avatar Oct 28 '21 20:10 haesleinhuepf
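
The overhead argument above can be checked against the reported numbers with a little arithmetic. This sketch takes the "about 3 ms per call" figure at face value and uses the durations copied from the top-hat listing; the variable names are made up for illustration.

```python
from statistics import median

# Durations in seconds, copied from the top-hat outputs above.
skimage_disk = [3.705864906311035, 3.7178118228912354, 3.6921098232269287,
                3.6890509128570557, 3.7015788555145264]
skimage_square = [0.10191106796264648, 0.08366107940673828, 0.0834510326385498,
                  0.08431601524353027, 0.08443880081176758]
cle_sphere = [0.13574910163879395, 0.12321710586547852, 0.13194704055786133,
              0.13232779502868652, 0.130479097366333]
cle_box = [0.02432084083557129, 0.016895055770874023, 0.016459941864013672,
           0.016234874725341797, 0.016022920608520508]

CALLS_PER_TOP_HAT = 3        # min filter, max filter, subtraction
OVERHEAD_PER_CALL_S = 0.003  # "about 3 ms" per call, as stated above

overhead = CALLS_PER_TOP_HAT * OVERHEAD_PER_CALL_S
overhead_share = overhead / median(cle_box)
disk_speedup = median(skimage_disk) / median(cle_sphere)
square_speedup = median(skimage_square) / median(cle_box)

print(f"estimated call overhead: {overhead * 1000:.0f} ms "
      f"({overhead_share:.0%} of the median top-hat-box time)")
print(f"disk vs. sphere speedup: {disk_speedup:.1f}x")
print(f"square vs. box speedup:  {square_speedup:.1f}x")
```

Under that assumption, a bit over half of the top-hat-box time is call overhead rather than computation, which is consistent with the point that a faster GPU would not shrink this particular measurement by much.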

It's so kind of you to temper my covetous feelings towards the new M1 Macs.. 🤣

psobolewskiPhD avatar Oct 28 '21 20:10 psobolewskiPhD