Metal.jl icon indicating copy to clipboard operation
Metal.jl copied to clipboard

Add `MPSMatrixRandom`

Open christiangnrd opened this issue 1 year ago • 11 comments

Adds https://developer.apple.com/documentation/metalperformanceshaders/mpsmatrixrandom?language=objc and the other classes that start with "MPSMatrixRandom".

Code to integrate with Random.jl functions is present but commented out because it seems to be slower than the current rand! functionality from GPUArrays.

christiangnrd avatar Mar 26 '24 00:03 christiangnrd

Code to integrate with Random.jl functions is present but commented out because it seems to be slower than the current rand! functionality from GPUArrays.

The quality of random numbers produced by GPUArrays' RNG is probably lower though, so I would consider just using the one from MPS.

maleadt avatar Mar 27 '24 12:03 maleadt

A few comments.

  • I did some stuff to get rand to work on all supported Integer types. Is this valid or does it affect the quality of random numbers?
  • Every buffer is now created to a multiple of 16 bytes as the random number generation generates 32-bit values in multiples of 4. Is this acceptable? Is there an better alternative?
  • I'm not sure how to approach seeding. Any ideas?
  • Float16 numbers are the only values that don't use MPS. Should there be a warning somewhere mentioning this?
  • MPSMatrixRandom requires buffer offset to also be a multiple of 4, how to handle rand! with views?

christiangnrd avatar Mar 27 '24 20:03 christiangnrd

I did some stuff to get rand to work on all supported Integer types. Is this valid or does it affect the quality of random numbers?

So you're just reinterpreting Float32-generated random numbers as integers? That doesn't seem valid. Naively I would try to map the generated [0, 1) numbers to the integer's range, but that may still result in a biased output. I'm no RNG expert, but you could try it out and comparing a histogram to Random.RNG and see if it looks somewhat acceptable.

I'm not sure how to approach seeding. Any ideas?

What is the problem? Doesn't it have a way to seed the generator (e.g. https://developer.apple.com/documentation/metalperformanceshaders/mpsmatrixrandomphilox/3242872-initwithdevice?language=objc)?

  • Float16 numbers are the only values that don't use MPS
  • MPSMatrixRandom requires buffer offset to also be a multiple of 4

I would just fall back to the GPUArrays RNG in those cases. CUDA.jl similarly tries to CURAND when possible, and falls back to the native generator in other cases.

maleadt avatar Mar 29 '24 10:03 maleadt

So you're just reinterpreting Float32-generated random numbers as integers?

The default rng generator generates random UInt32 values so it's reinterpreting those as whatever Integer value was determined. Just took a look at histograms for all Integer datatypes and they all seem to be equivalent to the CPU versions. They also seem to be much better than the GPUArrays.

Ex: blah

What is the problem?

I'm not sure how to tie it in with Metal.seed! to access the rng state and such.

I've marked this as a draft because the seeding test is currently marked as broken and I don't want to forget to fix it.

christiangnrd avatar Mar 30 '24 04:03 christiangnrd

They also seem to be much better than the GPUArrays.

That's not surprising :-) FYI, if you really want to make sure the RNG is good, you can use https://github.com/JuliaRandom/RNGTest.jl. CUDA.jl's native RNG passes those, but it took a while to get there.

maleadt avatar Mar 30 '24 10:03 maleadt

This is now ready for final review. ~It should not be squashed when merging because there are a few general improvements that are not directly related to the main focus of this PR. I made sure to have all the commits make sense on their own.~ Edit 2024-07-31: Squash away

christiangnrd avatar Apr 01 '24 16:04 christiangnrd

Added some API functions that I missed. I won't squash anymore until it's been reviewed to make things less confusing.

christiangnrd avatar Apr 02 '24 15:04 christiangnrd

I've fixed the easy to fix comments. I'll go over the rest over the weekend when I have more time.

christiangnrd avatar Apr 03 '24 13:04 christiangnrd

~There seems to always be one random test that times out when running tests on 1.11. I can reliably reproduce on this branch on my device but I can't reproduce at all on main so it has to be from this PR (or this PR unmasked a bug?) but I have no idea where to start debugging this. I've seen it happen to linalg/norm, mps, metal, examples, and maybe others.~

See #329

christiangnrd avatar Apr 10 '24 21:04 christiangnrd

I don't understand how the 1.9 failure could be related but it's the second time it happens. Any ideas?

Build 947 Build 949

christiangnrd avatar Apr 16 '24 20:04 christiangnrd

That's a GPUCompiler issue, unrelated to this PR.

maleadt avatar Apr 17 '24 06:04 maleadt

Interesting failure; seems unrelated to this PR though. I've restarted the job.

EDIT: that's weird; the metallib in there still contains trap + unreachable, even though it's using GPUCompiler.jl v0.27.3 which contains https://github.com/JuliaGPU/GPUCompiler.jl/pull/618...

EDIT 2: I think https://github.com/JuliaGPU/GPUCompiler.jl/pull/623 will fix this.

maleadt avatar Aug 27 '24 18:08 maleadt

Once the unrelated failure is resolved, this should be ready to merge. The only thing left is that at the moment the docs mention that this is available starting with Metal 2.0, which I wrote when I thought #392 would cause a major version bump, but this doesn't have any breaking changes so it should probably be 1.4 instead.

christiangnrd avatar Aug 27 '24 18:08 christiangnrd

Great stuff, thanks for keeping at it! We should include this in a blog post when releasing the next version of Metal.jl; I'll ping you when I get to that.

maleadt avatar Aug 27 '24 20:08 maleadt

Thanks for taking the time to review this over and over!

christiangnrd avatar Aug 27 '24 20:08 christiangnrd