add a way to add text to timing zones from julia
Right now, there is no way to add text to a zone from the julia defined zones (text being e.g. the yellow text in)
When we moved the require code into Julia we for example lost the name of the package being loaded due to this.
To add this functionality, we need a way to get the "current active block" for the timing. Here I used a ScopeValue for this (inspired by https://github.com/topolarity/Tracy.jl/pull/36). However, this means that we only get access to this functionality quite late, maybe we want to be able to add text from Compiler as well. So maybe we want some more basic way to do this. I thought about using a sentinel value but we don't have any "hard scopes" that we can define a local variable in. Then I thought about the outer @zone, looking into the expression and "injecting" its own block into any timer_print calls it can find. That may work. Other ideas are welcome.
Do you have a quick measurement for the overhead?
I'm guessing this makes our @zone allocating now, which is probably forgivable
Do you have a quick measurement for the overhead?
Doing some updates now, will measure afterwards. I'm pretty sure the current Tracy.jl implementation does not allocate though. Will check that one too.
Ok, I have three implementations now:
- A single global
RefValuewhich we set the current block and then reset it, useful for bootstrapping, perhaps when we don't have much other functionality - Storing a
RefValueinto TLS and updating / resetting it. Storing a RefValue seems better than a direct pointer because we can keep a reference to it and don't have to go via TLS again to reset it. - Using
ScopedValue.
Using the following benchmark;:
function f(print::Bool)
Base.@zone "1" begin
print && Base.timing_print("Hello")
for i in 1:100
Base.@zone "2" begin
print && Base.timing_print("a")
end
end
print && Base.timing_print("Dude!")
end
end
f(true)
using BenchmarkTools
@btime f(false)
@btime f(true)
I get:
- Single
RefValue
1.008 μs (0 allocations: 0 bytes)
3.146 μs (0 allocations: 0 bytes)
ScopedValue
2.963 μs (606 allocations: 15.78 KiB)
4.778 μs (810 allocations: 18.97 KiB)
- TLS +
RefValue
1.262 μs (0 allocations: 0 bytes)
1.688 μs (0 allocations: 0 bytes)
I don't understand how the TLS + RefValue is currently faster than the single global RefValue... I have to make some more checks but pushing up what I have.
I think ScopedValue is what we semantically want, but we would need to have https://github.com/JuliaLang/julia/issues/53584#issuecomment-1979089217 fixed, potentially by https://github.com/JuliaLang/julia/pull/55045
I think this would be good to have to restore some functionality we mused to have and allow for future customizations of the zones from julia. I'm happy for a review, but since this is not enabled by default, I'll merge in a couple of days.
I like the task_local_storage() implementation overall, and I think it'd be preferable not to have too many variants
I'm assuming that bootstrapping doesn't support using that implementation everywhere?
Thanks, I removed the ScopedValue code to make this a bit shorter and just had a reference to how that would be implemented to the Tracy PR.
Just had an idea.. Can we use jl_current_task->ptls->timing_stack->tracy_ctx to get the timing block?
It's already being hooked into the task context by _jl_timing_block_start
I can try it.
Isn't that wrong under task migration?
Isn't that wrong under task migration?
Can you explain what you think might go wrong?
Push something to the stack in one thread and try to pop it from another thread.
Push something to the stack in one thread and try to pop it from another thread.
Doesn't the ptls follow the migrated task?
No, ptls is per thread
Okay then that needs to be fixed, but it's orthogonal to this PR - Can you open an issue for it?
Yeah
@topolarity I tried to do this but I must be doing something wrong because I seem to only get null from the timing block:
https://github.com/JuliaLang/julia/commit/788b0e365977e5e65368c430829ac74938c1df63