
.all vs .cpuAndNeuralEngine?

Open pj4533 opened this issue 2 years ago • 4 comments

I was assuming that "all" for compute units would choose the best one for your setup, but I tried cpuAndNeuralEngine and got better performance. What's happening here? What does "all" do vs specifying?

pj4533 avatar Feb 05 '23 21:02 pj4533

Hey, in my tests on an M1 Max, .all balanced the load between the GPU and ANE rather than scaling across them, and it was slightly slower than GPU only: generations restricted to the GPU used ~90% GPU, while generations with .all used only ~60% GPU and ran about 0.25 step/s slower.

Just sharing my opinion, nothing official :D

l2gakuen avatar Feb 07 '23 06:02 l2gakuen

I was assuming that "all" for compute units would choose the best one for your setup, but I tried cpuAndNeuralEngine and got better performance. What's happening here? What does "all" do vs specifying?

Your assumption is correct. Specifying .all lets Core ML use any combination of compute units {CPU, GPU, Neural Engine} to optimize prediction latency. The fact that the more restricted .cpuAndNeuralEngine option is yielding better results than .all is a known issue on this set of models and certain systems.
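For context, here is roughly how that choice gets passed in when loading the pipeline with this repo's Swift package. This is just a sketch: `resourceURL` is a placeholder for wherever your compiled models live, and the exact initializer arguments may differ between versions.

```swift
import CoreML
import StableDiffusion

// Restrict Core ML to the CPU and Neural Engine; use .all to let
// Core ML pick any combination of CPU, GPU, and Neural Engine.
let config = MLModelConfiguration()
config.computeUnits = .cpuAndNeuralEngine

// resourceURL (placeholder) points at the directory of compiled
// .mlmodelc resources for the pipeline.
let pipeline = try StableDiffusionPipeline(
    resourcesAt: resourceURL,
    configuration: config
)
```

If I remember right, the bundled CLI exposes the same knob via a `--compute-units` option, so you can compare settings without recompiling.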

msiracusa avatar Feb 07 '23 19:02 msiracusa


great! thanks for the reply.

pj4533 avatar Feb 07 '23 20:02 pj4533


Just wanted to reopen this as a question... let me know if there is a better forum.

I noticed that when using .cpuAndNeuralEngine I occasionally get very bad performance. For example, I normally get about 1.9 step/s on my M1 Mac mini, but sometimes that drops to about 0.08 step/s.

Some notes:

  • Typically, it goes back up once it finishes a given image (in my tests, usually 26 steps at ~0.5 strength, using image2image), but not always
  • It usually happens after I have left my system running and come back and unlock it using Touch ID (maybe a connection to the Neural Engine there?)

Anyway, I'm happy to open a separate issue if you think it's worth it, but figured I'd ask the question first since it's possible I am just not understanding something!

pj4533 avatar Feb 09 '23 12:02 pj4533