julia icon indicating copy to clipboard operation
julia copied to clipboard

Remove typeinfer lock altogether

Open pchintalapudi opened this issue 3 years ago • 4 comments

This is the last section of code that refers to the old type inference lock, so now it gets its own mutex to itself.

pchintalapudi avatar Sep 18 '22 18:09 pchintalapudi

GPUCompiler was using these. Can you explain if and how we need to do locking from there now?

maleadt avatar Sep 18 '22 20:09 maleadt

If GPUCompiler is using the typeinf lock for type inference alone, it should no longer need to do that. If it was using it as a proxy for the codegen lock, there currently isn't a replacement, so I can just add the definitions back in (perhaps with a rename to make it more obvious?)

pchintalapudi avatar Sep 18 '22 20:09 pchintalapudi

We were also using this to lock codegen. But aren't the thread-safe context/modules enough for that?

maleadt avatar Sep 19 '22 14:09 maleadt

Once the LLVM IR has been generated, there should be no need for locks besides the context lock itself. I can't guarantee that all accesses to codeinst fields are safe outside a lock though.

pchintalapudi avatar Sep 19 '22 16:09 pchintalapudi

@maleadt Do you still want the lock methods present or should we merge this as-is? Hopefully in a couple of PRs the codegen lock will entirely dissipate away, so it would be good to identify any external dependencies on it.

pchintalapudi avatar Sep 21 '22 22:09 pchintalapudi

Do you still want the lock methods present or should we merge this as-is?

You tell me, I really don't know anymore at this point what we should and shouldn't lock in GPUCompiler.jl. FWIW, we used to use the typinf lock to protect everything involving type inference (jl_type_intersection_with_env, jl_specializations_get_linfo) and codegen (jl_create_native). IIUC the former isn't needed anymore, and the latter takes an OrcThreadSafeModule.

Once the LLVM IR has been generated

But that's jl_create_native, which takes an OrcThreadSafeModule, so isn't it using that lock?

maleadt avatar Sep 22 '22 13:09 maleadt

we used to use the typinf lock to protect everything involving type inference (jl_type_intersection_with_env, jl_specializations_get_linfo) and codegen (jl_create_native)

I also don't think this is necessary anymore, but it might be safer to drop the lock synchronously with julia dropping the lock in codegen around these functions (#46836) rather than having to do it as part of this PR. So I've added the typeinf lock functions back to avoid breaking GPUCompiler here, though they're uncalled by base.

But that's jl_create_native, which takes an OrcThreadSafeModule, so isn't it using that lock?

Internally create_native will take the lock, but the lock doesn't need to be held while calling the function. Also, the lock is released before function return, and the context lock is sufficient to protect the LLVM IR from the return onwards.

pchintalapudi avatar Oct 04 '22 05:10 pchintalapudi

@vtjnash Could you review this PR again? I've added some code to deal with type inference recursion handling which should probably be backported to 1.9.

pchintalapudi avatar Nov 18 '22 00:11 pchintalapudi

But apparently this broke CI in many "interesting" ways however?

vtjnash avatar Nov 18 '22 19:11 vtjnash

But apparently this broke CI in many "interesting" ways however?

I think most of them are now fixed, but why is the analyzegc check newly failing? I don't see any particular reason that the changes would cause a value to become unrooted?

pchintalapudi avatar Nov 20 '22 19:11 pchintalapudi

It is a heuristic-driven pass, so unrelated changes can result in it exploring a portion of the execution graph that it previously skipped (or exploring it differently)

vtjnash avatar Nov 20 '22 22:11 vtjnash

It looks like this PR might have broken CI on master due to a "semantic conflict"? CI was all green on this PR, but when I look at https://github.com/JuliaLang/julia/commit/113efb6e0aa27879cb423ab323c0159911e4c5e7 on master, I see failures on i686-w64-mingw32, x86_64-w64-mingw32, and x86_64-apple-darwin. On each platform, the failure is in the misc testset and looks something like this:

Error in testset misc:
--
  | Test Failed at /Users/julia/.julia/scratchspaces/a66863c6-20e8-4ff4-8a62-49f30b1f605e/agent-cache/default-macmini-x64-4.0/build/default-macmini-x64-4-0/julialang/julia-master/julia-113efb6e0a/share/julia/test/misc.jl:355
  | Expression: after_comp >= before_comp
  | Evaluated: 0x6e2cbf1afdb64277 >= 0x9e9cb5ddee20d1af
  | ERROR: LoadError: Test run finished with errors

I have triggered a re-run of each of the failed jobs; let's see if the test failures persist on the re-run.

DilumAluthge avatar Nov 24 '22 04:11 DilumAluthge

Seems like PkgEval should have been run on this before merging... and we probably should not be breaking GPUCompiler willy nilly these days even if this didn't also break CI

brenhinkeller avatar Nov 25 '22 21:11 brenhinkeller

and we probably should not be breaking GPUCompiler willy nilly these days

If type inference is now thread-safe, then we shouldn't need to be accessing this lock, we only need to hold the codegen lock or module lock as-needed.

@pchintalapudi does that sound correct?

jpsamaroo avatar Nov 25 '22 22:11 jpsamaroo

It looks like this PR might have broken CI on master due to a "semantic conflict"?

That failure looks like a real intermittent failure (I probably got the timer reporting logic wrong) which should be a quick fix.

Seems like PkgEval should have been run on this before merging... and we probably should not be breaking GPUCompiler willy nilly these days even if this didn't also break CI

At least for me, the GPUCompiler tests seem to be passing on master? Also, none of the changes that I expect would affect GPUCompiler here were included in the final PR.

If type inference is now thread-safe, then we shouldn't need to be accessing this lock, we only need to hold the codegen lock or module lock as-needed.

As per https://github.com/JuliaLang/julia/pull/46825#issuecomment-1266433034, if you're just accessing the LLVM IR then yes the context lock is sufficient, but if you're accessing the codeinst as well some of those fields might be modified unsafely. Incidentally, the context lock is probably also necessary in addition to being sufficient if the context is being provided by the Julia compiler.

pchintalapudi avatar Nov 25 '22 23:11 pchintalapudi

I've opened an issue for the CI failure: https://github.com/JuliaLang/julia/issues/47710

DilumAluthge avatar Nov 25 '22 23:11 DilumAluthge