julia icon indicating copy to clipboard operation
julia copied to clipboard

Add `jl_getaffinity` and `jl_setaffinity`

Open carstenbauer opened this issue 1 year ago • 13 comments

This PR adds two functions jl_getaffinity and jl_setaffinity to the runtime, which are slim wrappers around uv_thread_getaffinity and uv_thread_setaffinity and can be used to set the affinity of Julia threads.

This will

  • simplify thread pinning (ThreadPinning.jl currently pins threads by spawning tasks that run the necessary ccalls) and
  • enable users to also pin GC threads (or, more generally, all Julia threads).

Example:

bauerc@n2lcn0146 julia git:(cb/affinity)  
➜ ./julia -q --startup-file=no --threads 2,3 --gcthreads 4,1
julia> cpumasksize = @ccall uv_cpumask_size()::Cint
1024

julia> mask = zeros(Cchar, cpumasksize);

julia> jl_getaffinity(tid, mask, cpumasksize) = ccall(:jl_getaffinity, Int32, (Int16, Ptr{Cchar}, Int32), tid, mask, cpumasksize)
jl_getaffinity (generic function with 1 method)

julia> jl_getaffinity(1, mask, cpumasksize)
0

julia> print(mask[1:Sys.CPU_THREADS])
Int8[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]

julia> mask[1] = 0;

julia> jl_setaffinity(tid, mask, cpumasksize) = ccall(:jl_setaffinity, Int32, (Int16, Ptr{Cchar}, Int32), tid, mask, cpumasksize)
jl_setaffinity (generic function with 1 method)

julia> jl_setaffinity(1, mask, cpumasksize)
0

julia> fill!(mask, 0);

julia> jl_getaffinity(1, mask, cpumasksize)
0

julia> print(mask[1:Sys.CPU_THREADS])
Int8[0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]

(cc @vchuravy, @gbaraldi)

Would be great to get this into 1.11 (despite feature freeze) because otherwise we won't be able to pin GC threads until 1.12 (likely not until the end of the year).

Closes #53073

carstenbauer avatar Feb 20 '24 11:02 carstenbauer

Open questions:

  • [x] Currently I use jl_atomic_load_acquire to query jl_n_threads and jl_atomic_load_relaxed to get jl_all_tls_states. Is this ok?
  • [ ] I put the declaration at the end of julia_threads.h. Is this fine?
  • [ ] I added the two new functions to jl_exported_funcs.inc. Is this how I should do it, or is this auto-generated somehow?
  • [x] To what extend do we test/document these functions?

carstenbauer avatar Feb 20 '24 11:02 carstenbauer

Added minimal tests and incorporated the changes suggested by @vchuravy.

carstenbauer avatar Feb 20 '24 18:02 carstenbauer

Adding backport label for 1.11 for the reasons mentioned above (should also be very straightforward to backport). Please chime in if you disagree.

carstenbauer avatar Feb 21 '24 07:02 carstenbauer

Windows failures appear related

Test Failed at C:\buildkite-agent\builds\win2k22-amdci6-1\julialang\julia-master\julia-a50eb050b7\share\julia\test\threads.jl:353
--
  | Expression: jl_setaffinity(1, mask, cpumasksize) == 0
  | Evaluated: -4071 == 0

IanButterworth avatar Mar 10 '24 03:03 IanButterworth

Strange. Looking at the libuv source code and https://docs.libuv.org/en/v1.x/errors.html, this return code seems to refer to "invalid argument". We pass 3 arguments, 1, mask, and cpumasksize. The latter is queried from libuv itself and thus should be safe. I think we can also rule out that 1 is the issue. That leaves us with mask, which is just mask = fill!(zeros(Cchar, cpumasksize), 1). Unfortunately, I don't have a Windows machine to test this live. On Linux, when I try to provide illegal masks, I don't get the error code we're seeing here:

julia> jl_setaffinity(1, fill!(zeros(Cchar, cpumasksize), 0), cpumasksize) # different error code if all mask entries are zero
-22

julia> jl_setaffinity(1, fill!(zeros(Cchar, cpumasksize+3), 1), cpumasksize) # increasing the mask size beyond cpumasksize doesn't seem to be a problem?
0

Of course, we could simply not test this on Windows but, in contrast to macOS, it should work there...

carstenbauer avatar Mar 11 '24 13:03 carstenbauer

Test are passing now (I choose a mask where only the first CPU thread is marked as 1, i.e. pinning to the first CPU thread).

However, depending on how the CI actually runs (does it have exclusive machine access?) I could imagine that the first CPU thread isn't always guaranteed to be available to the process..... Let me test the original mask (all ones) once again to see if the Windows CI failure was just a fluke. If it wasn't I think I would vote to simply disable this test on Windows.

carstenbauer avatar Mar 21 '24 11:03 carstenbauer

Alright same test that was passing above is now suddenly failing...

carstenbauer avatar Mar 22 '24 09:03 carstenbauer

@IanButterworth, given we're fine with not testing on Windows - I don't know why it fails and don't know how to reliably fix the test for it - this can be merged now.

While not ideal, I don't think it's too bad to not test this on Windows. After all, (1) this isn't documented API and (2) pinning threads typically is only really used on Linux (clusters) anyways. But I'm open to suggestions, of course.

carstenbauer avatar Mar 22 '24 12:03 carstenbauer

Seems this needs a rebase now. Also, I just wanted to double-check that merging this wouldn't introduce a flaky test?

IanButterworth avatar Mar 22 '24 13:03 IanButterworth

Bump (given this has merge me)

IanButterworth avatar Apr 15 '24 13:04 IanButterworth

@IanButterworth I've just rebased. Let's see if the test is flaky. (If it turns out to be problematic, maybe we shouldn't test this - internal feature - at all?)

carstenbauer avatar Apr 17 '24 10:04 carstenbauer

There's a test failure in 32-bit windows job in "threads" set, but on the phone I can't easily see the error message and tell if it's relevant.

giordano avatar Apr 20 '24 10:04 giordano

CI is now all green. @vtjnash @vchuravy

DilumAluthge avatar May 24 '24 23:05 DilumAluthge

This errors on FreeBSD, so CI jobs now all have:

Error in testset threads:
Test Failed at /usr/home/julia/.buildkite-agent/builds/freebsd13-amdci6-3/julialang/julia-master/julia-58927d885a/share/julia/test/threads.jl:366
  Expression: jl_setaffinity(1, mask, cpumasksize) == 0
   Evaluated: -22 == 0

ararslan avatar Jun 14 '24 04:06 ararslan