Michael Misiewicz comments

Results: 27 comments of Michael Misiewicz

MPS device appears much slower than CPU on M1 Mac Pro

I'd also add that I saw the same effects when running `timeit` with the toy examples in `docs/source/notes/mps.rst` (e.g., `timeit.timeit(lambda: x * 2, number=100000)` on both mps/GPU).
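For anyone wanting to repeat this kind of comparison, here is a minimal sketch of a timing helper, not the benchmark from the thread. GPU work is typically queued asynchronously, so a bare `timeit` loop may stop the clock before the device has finished; the helper therefore accepts an optional `sync` callback. The names `best_time_per_call` and `compare_devices` are hypothetical, and the MPS part assumes a PyTorch version that provides `torch.mps.synchronize()`.

```python
import timeit


def best_time_per_call(fn, sync=None, number=1000, repeat=5):
    """Return the best observed seconds per call of fn.

    sync, if given, is called inside the timed region after each batch,
    so work queued on an asynchronous device finishes before the clock stops.
    """
    def batch():
        for _ in range(number):
            fn()
        if sync is not None:
            sync()

    batch()  # warm-up run, not timed
    totals = timeit.repeat(batch, number=1, repeat=repeat)
    return min(totals) / number


def compare_devices(size=5000, number=1000):
    """Time x * 2 on CPU and, if available, on MPS. Requires PyTorch."""
    import torch  # imported here so the helper above works without PyTorch

    results = {}
    x_cpu = torch.ones(size, device="cpu")
    results["cpu"] = best_time_per_call(lambda: x_cpu * 2, number=number)

    if torch.backends.mps.is_available():
        x_mps = torch.ones(size, device="mps")
        # Assumption: torch.mps.synchronize exists in the installed version.
        results["mps"] = best_time_per_call(
            lambda: x_mps * 2, sync=torch.mps.synchronize, number=number
        )
    return results
```

Calling `compare_devices()` returns a dict of per-call seconds keyed by device name. For a tensor this small, fixed per-call overhead may dominate either way, so the numbers say little about larger models.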

MPS device appears much slower than CPU on M1 Mac Pro

Example of the toy example run:

```python
In [17]: # toy example mps
    ...: import timeit
    ...: import torch
    ...: import random
    ...:
    ...: x = torch.ones(5000, device="mps")
    ...: timeit.timeit(lambda:...
```

MPS device appears much slower than CPU on M1 Mac Pro

> The neural engine can't be used for training anyway. It only supports Float16, Int8, and UInt8, and is only accessible through CoreML and MLCompute. PyTorch uses neither of these...

MPS device appears much slower than CPU on M1 Mac Pro

Fascinating. That hypothesis might also explain why the delta is so much worse with the toy example than with the full-size BERT.

MPS device appears much slower than CPU on M1 Mac Pro

@kulinseth @albanD I noticed this ticket's been subject to triage, and a few other folks have filed issues regarding similar observations. Do you know how the [figure in the press...

MPS device appears much slower than CPU on M1 Mac Pro

@philipturner Thanks for the great read. @kulinseth thanks for sharing, I'll try running that on my system, and I'm curious to dig in, because when I ran my own benchmarks...

Export to CSV File

Agreed for XLSX export! And CSVs too!

bug: `numericlocale` does not seem to be respected

Interesting, thanks for the context. Updating the output might help reduce confusion.

bug: `numericlocale` does not seem to be respected

Amazing news! Confirmed it looks good with the latest commit!

uint8 as internal data

Seconded!