After conversion to LLVM we should be able to delete the inferred source of the kernel.
@simonbyrne has shown me a heap snapshot where the inferred source took up well over 1 GB of RAM.
Codecov Report
Patch coverage: 88.88% and project coverage change: -7.74% :warning:
Comparison is base (edfdc1a) 83.18% compared to head (919242d) 75.44%. Report is 1 commit behind head on master.
Additional details and impacted files
@@ Coverage Diff @@
## master #520 +/- ##
==========================================
- Coverage 83.18% 75.44% -7.74%
==========================================
Files 24 24
Lines 3300 3270 -30
==========================================
- Hits 2745 2467 -278
- Misses 555 803 +248
| Files Changed | Coverage Δ |
|---|---|
| src/jlgen.jl | 77.85% <85.71%> (-2.07%) :arrow_down: |
| src/execution.jl | 67.79% <100.00%> (-32.21%) :arrow_down: |
This doesn't seem to fix my issue. I'm not sure exactly where the problem is, but I did notice:
julia> GPUCompiler.GLOBAL_CI_CACHES
Dict{CompilerConfig, GPUCompiler.CodeCache} with 2 entries:
CompilerConfig for PTXCompilerTarget => CodeCache(IdDict{MethodInstance, Vector{CodeInstance}}(MethodInstance for >>(…
CompilerConfig for PTXCompilerTarget => CodeCache(IdDict{MethodInstance, Vector{CodeInstance}}(MethodInstance for >>(…
julia> Base.summarysize(GPUCompiler.GLOBAL_CI_CACHES) / 10^6
1396.946174
julia> Base.summarysize(collect(values(GPUCompiler.GLOBAL_CI_CACHES))[1]) / 10^6
1393.855007
julia> Base.summarysize(collect(values(GPUCompiler.GLOBAL_CI_CACHES))[2]) / 10^6
3.090233
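For what it's worth, a quick way to see which entries dominate the footprint (just a sketch, assuming the cache's dict field is the IdDict{MethodInstance, Vector{CodeInstance}} shown above):

for cache in values(GPUCompiler.GLOBAL_CI_CACHES)
    # rank MethodInstances by the retained size of their CodeInstances
    sizes = [(mi, Base.summarysize(cis)) for (mi, cis) in cache.dict]
    sort!(sizes; by = last, rev = true)
    foreach(println, first(sizes, 5))
end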
I tried manually calling empty! on this dict: it didn't seem to make any difference, so I suspect the data is being retained somewhere else as well.
Also, what's odd is that RES reported by top is 6.3g, but
julia> Sys.maxrss() / 10^9
17.232601088
Removed a call to jl_uncompress_ir, as IIRC it was only needed for the 1.6 overlay hack: https://github.com/JuliaGPU/GPUCompiler.jl/pull/151#issuecomment-779687366 Maybe that also helps?
Unfortunately still no.
You could try taking a heap snapshot.
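On Julia 1.9+ something like this should do it (assuming the no-argument form, which should return the path of the written .heapsnapshot file):

using Profile
# writes a .heapsnapshot file that can be opened in the Memory tab of Chrome's DevTools
path = Profile.take_heap_snapshot()
@info "heap snapshot written" path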
I did that: it looks like most of it is still the inferred objects.
I tried clearing them out manually:
for cache in values(GPUCompiler.GLOBAL_CI_CACHES)
    for insts in values(cache.dict)
        for inst in insts
            # drop the cached inferred source; the field is atomic, hence @atomic
            @atomic :release inst.inferred = nothing
        end
    end
end
That seemed to work.
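A rough way to confirm is to re-run the earlier summarysize check after forcing a GC:

GC.gc()  # make sure the dropped inferred sources are actually collected
Base.summarysize(GPUCompiler.GLOBAL_CI_CACHES) / 10^6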
top is still reporting 4 GB of memory usage though, so I'm not sure what is going on.
So I am only deleting the inferred source of top-level kernel calls, since everything else is reusable.
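As a rough illustration of the idea (a sketch, not the actual PR code; the helper name and signature are made up), dropping the inferred source of just the entry MethodInstance could look like:

# Hypothetical sketch: after the kernel has been lowered to LLVM, strip the
# inferred source of just the entry MethodInstance; cached callee entries stay
# intact so they can be reused by other kernels.
function strip_entry_inferred!(cache, entry_mi)
    for ci in get(cache.dict, entry_mi, Core.CodeInstance[])
        @atomic :release ci.inferred = nothing
    end
    return cache
end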
@maleadt are we tracking anywhere how big the modules we load onto the GPU are?
> @maleadt are we tracking anywhere how big the modules we load onto the GPU are?
No, and I don't know of a way to query the size of a CuModule or CuContext.