ml-stable-diffusion
.all vs .cpuAndNeuralEngine?
I was assuming that "all" for compute units would choose the best one for your setup, but I tried cpuAndNeuralEngine and got better performance. What's happening here? What does "all" do vs specifying?
Hey, in my tests on an M1 Max, .all balanced the load between the GPU and the ANE rather than scaling across both, and it was slightly slower than GPU only. Generations with GPU only used about 90% of the GPU; generations with .all used about 60% of the GPU and ran roughly 0.25 step/s slower.
Just sharing my observations, nothing official :D
Your assumption is correct. Specifying .all lets Core ML use any combination of compute units {CPU, GPU, Neural Engine} to optimize prediction latency. The fact that the more restricted .cpuAndNeuralEngine option is yielding better results than .all is a known issue on this set of models and certain systems.
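For reference, here is a minimal sketch of how the compute units are chosen when loading the pipeline in Swift. The resource path is a placeholder, and the exact StableDiffusionPipeline initializer arguments have changed between releases of this repo, so treat it as illustrative rather than definitive:

```swift
import Foundation
import CoreML
import StableDiffusion  // the Swift package in this repo

// Select compute units explicitly. `.all` lets Core ML dispatch work across
// CPU, GPU, and Neural Engine; `.cpuAndNeuralEngine` keeps work off the GPU.
let config = MLModelConfiguration()
config.computeUnits = .cpuAndNeuralEngine   // or .all, .cpuAndGPU, .cpuOnly

// Placeholder path to the compiled Core ML resources (assumption).
let resourceURL = URL(fileURLWithPath: "/path/to/Resources")

// Initializer labels may differ slightly depending on the release you build against.
let pipeline = try StableDiffusionPipeline(
    resourcesAt: resourceURL,
    configuration: config,
    reduceMemory: false
)

// Optional prewarm so the first generation isn't dominated by model loading.
try pipeline.loadResources()
```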
great! thanks for the reply.
Just wanted to reopen this as a question... lemme know if there is a better forum.
I noticed that when using .cpuAndNeuralEngine I occasionally get very bad performance. Normally I get about 1.9 step/s on my Mac mini M1, but sometimes that drops to about 0.08 step/s.
Some notes:
- Typically, it will go back up once it finishes a given image (in my tests, usually 26 steps at ~0.5 strength, using image2image), but not always
- It usually happens when I have left my system running and come back and unlock it with Touch ID (maybe a connection to the Neural Engine there?)
Anyway, I'm happy to open a separate issue if you think it's worth it, but I figured I'd ask the question first since it's possible I am just not understanding something!