Make profile_tasks variable thread local to prevent race condition.
The profile_tasks variable is currently not thread local. This conflicts with how it is used in drjit-core, where it is set within jitc_set_flags, which can be called from multiple threads. For that reason, all drjit-core flags are thread local. This commit makes the profile_tasks variable thread local as well, consistent with the way flags are handled in drjit-core.
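A minimal sketch of what the change means semantically, assuming a simplified stand-in for the real declaration (the actual nanothread/drjit-core code may differ, and `set_profile_tasks` and `read_in_new_thread` are hypothetical helpers). With `thread_local`, each thread gets its own copy of the flag, so a write in one thread is not visible to others. That removes the data race, and it also means every thread starts from the default value.

```cpp
#include <thread>

// Hypothetical stand-in for the library's flag. Without `thread_local`, all
// threads would share one copy and unsynchronized writes would race; with it,
// each thread owns a private copy initialized to `false`.
thread_local bool profile_tasks = false;

// Hypothetical setter, mirroring how a per-thread flag update (such as the one
// in jitc_set_flags) touches only the calling thread's copy.
void set_profile_tasks(bool value) { profile_tasks = value; }

// Hypothetical helper: reads the flag from a freshly spawned thread.
bool read_in_new_thread() {
    bool seen = true;
    std::thread t([&seen] { seen = profile_tasks; });  // reads the new thread's own copy
    t.join();
    return seen;
}
```

After calling `set_profile_tasks(true)` on the main thread, `profile_tasks` reads `true` there, while `read_in_new_thread()` returns `false` because the new thread sees its own, still-default copy.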
Accessing TLS variables has a shockingly high cost on some OSes. I can look at this after getting back from vacation, but my gut feeling is that we don't want TLS reads in a perf-sensitive threading library.
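One option that avoids TLS reads on the hot path, not proposed in this thread and shown only as a sketch, is to keep a single process-wide flag but store it in a `std::atomic<bool>` accessed with `memory_order_relaxed`. On common targets such as x86-64 and AArch64, relaxed loads and stores of a `bool` typically compile to plain memory accesses, and the concurrent accesses are no longer a data race in the C++ memory model. The trade-off is that the flag stays shared across threads rather than per-thread like the other drjit-core flags. The names `profile_tasks_flag`, `set_profile_tasks`, and `get_profile_tasks` are hypothetical.

```cpp
#include <atomic>

// Hypothetical process-wide flag: one shared copy, but atomic, so concurrent
// reads and writes from different threads are well defined.
static std::atomic<bool> profile_tasks_flag{false};

// Relaxed ordering: only the flag's own value matters, no other memory is
// published through it, so no fences are requested.
void set_profile_tasks(bool value) {
    profile_tasks_flag.store(value, std::memory_order_relaxed);
}

bool get_profile_tasks() {
    return profile_tasks_flag.load(std::memory_order_relaxed);
}
```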
I see; yes, it's good to be cautious. We don't want to pay a performance cost for this.
As you probably guessed, the reason I am fixing this is that our (latest) threading sanitizer flagged a potential race condition when accessing this variable. I am always happy to upstream fixes to the open-source version, but obviously only if they don't make the public code worse :)