Add `MPSMatrixRandom`
Adds https://developer.apple.com/documentation/metalperformanceshaders/mpsmatrixrandom?language=objc and the other classes that start with "MPSMatrixRandom".
Code to integrate with Random.jl functions is present but commented out because it seems to be slower than the current rand! functionality from GPUArrays.
Code to integrate with
Random.jlfunctions is present but commented out because it seems to be slower than the currentrand!functionality from GPUArrays.
The quality of random numbers produced by GPUArrays' RNG is probably lower though, so I would consider just using the one from MPS.
A few comments.
- I did some stuff to get
randto work on all supported Integer types. Is this valid or does it affect the quality of random numbers? - Every buffer is now created to a multiple of 16 bytes as the random number generation generates 32-bit values in multiples of 4. Is this acceptable? Is there an better alternative?
- I'm not sure how to approach seeding. Any ideas?
- Float16 numbers are the only values that don't use MPS. Should there be a warning somewhere mentioning this?
- MPSMatrixRandom requires buffer offset to also be a multiple of 4, how to handle
rand!with views?
I did some stuff to get
randto work on all supported Integer types. Is this valid or does it affect the quality of random numbers?
So you're just reinterpreting Float32-generated random numbers as integers? That doesn't seem valid. Naively I would try to map the generated [0, 1) numbers to the integer's range, but that may still result in a biased output. I'm no RNG expert, but you could try it out and comparing a histogram to Random.RNG and see if it looks somewhat acceptable.
I'm not sure how to approach seeding. Any ideas?
What is the problem? Doesn't it have a way to seed the generator (e.g. https://developer.apple.com/documentation/metalperformanceshaders/mpsmatrixrandomphilox/3242872-initwithdevice?language=objc)?
- Float16 numbers are the only values that don't use MPS
- MPSMatrixRandom requires buffer offset to also be a multiple of 4
I would just fall back to the GPUArrays RNG in those cases. CUDA.jl similarly tries to CURAND when possible, and falls back to the native generator in other cases.
So you're just reinterpreting Float32-generated random numbers as integers?
The default rng generator generates random UInt32 values so it's reinterpreting those as whatever Integer value was determined. Just took a look at histograms for all Integer datatypes and they all seem to be equivalent to the CPU versions. They also seem to be much better than the GPUArrays.
Ex:
What is the problem?
I'm not sure how to tie it in with Metal.seed! to access the rng state and such.
I've marked this as a draft because the seeding test is currently marked as broken and I don't want to forget to fix it.
They also seem to be much better than the GPUArrays.
That's not surprising :-) FYI, if you really want to make sure the RNG is good, you can use https://github.com/JuliaRandom/RNGTest.jl. CUDA.jl's native RNG passes those, but it took a while to get there.
This is now ready for final review. ~It should not be squashed when merging because there are a few general improvements that are not directly related to the main focus of this PR. I made sure to have all the commits make sense on their own.~ Edit 2024-07-31: Squash away
Added some API functions that I missed. I won't squash anymore until it's been reviewed to make things less confusing.
I've fixed the easy to fix comments. I'll go over the rest over the weekend when I have more time.
~There seems to always be one random test that times out when running tests on 1.11. I can reliably reproduce on this branch on my device but I can't reproduce at all on main so it has to be from this PR (or this PR unmasked a bug?) but I have no idea where to start debugging this. I've seen it happen to linalg/norm, mps, metal, examples, and maybe others.~
See #329
I don't understand how the 1.9 failure could be related but it's the second time it happens. Any ideas?
That's a GPUCompiler issue, unrelated to this PR.
Interesting failure; seems unrelated to this PR though. I've restarted the job.
EDIT: that's weird; the metallib in there still contains trap + unreachable, even though it's using GPUCompiler.jl v0.27.3 which contains https://github.com/JuliaGPU/GPUCompiler.jl/pull/618...
EDIT 2: I think https://github.com/JuliaGPU/GPUCompiler.jl/pull/623 will fix this.
Once the unrelated failure is resolved, this should be ready to merge. The only thing left is that at the moment the docs mention that this is available starting with Metal 2.0, which I wrote when I thought #392 would cause a major version bump, but this doesn't have any breaking changes so it should probably be 1.4 instead.
Great stuff, thanks for keeping at it! We should include this in a blog post when releasing the next version of Metal.jl; I'll ping you when I get to that.
Thanks for taking the time to review this over and over!