Yinghai Lu
Traced a bit more. I think the difference is that when cpu_only=1, we hit this code path https://github.com/pytorch/kineto/blob/0cded49b0cce22f7d777861db5c937da7f3e22db/libkineto/src/init.cpp#L221 which triggers the `libkineto::api().initProfilerIfRegistered();` call. When cpu_only=0, supposedly we should hit this function...
These operations are all successful: https://github.com/pytorch/kineto/blob/0cded49b0cce22f7d777861db5c937da7f3e22db/libkineto/src/init.cpp#L146-L156 so I guess the CUPTI `RESOURCE_CONTEXT_CREATED` hook is somehow not reliable anymore. My CUDA info:
```
Driver Version: 550.144.03
CUDA Version: 12.4
```
So yeah, the KINETO_DAEMON_INIT_DELAY_S env is actually not the root cause. I have patched in the fix https://github.com/pytorch/pytorch/commit/c934ed65673428c35f70cbce6ebd730ad22f058d and got rid of KINETO_DAEMON_INIT_DELAY_S, but the issue remains. I think all it...
`nvidia-smi` shows no processes, but that's possibly because I'm running in a container (https://github.com/NVIDIA/nvidia-docker/issues/179). Since scripts/pytorch/linear_model_example.py is running, I think the context should be there already.
The part on dynamic attach in the CUPTI docs: https://docs.nvidia.com/cupti/main/main.html#dynamic-attach-and-detach

> CUPTI can be attached by calling any CUPTI API as CUPTI supports lazy initialization.

So when we do something like `CuptiCallbackApi::initCallbackApi()`...
Yeah exactly, I actually already got it working last night by doing
```
initProfilersCPU();
libkineto::api().configLoader().initBaseConfig();
```
regardless of cpu_only. I think it is a legit change as
- all the...
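To make the difference between the two init paths concrete, here is a minimal mock of the control flow as I read it from this thread. It is not Kineto's code: `MockProfilerState`, `initDeferred`, `initEager` and `fireContextCreated` are hypothetical names, and the "deferred" branch is only my assumption about how the cpu_only=0 path waits on the `RESOURCE_CONTEXT_CREATED` callback.

```cpp
#include <functional>
#include <vector>

// Hypothetical stand-in for the profiler's init state; not a Kineto type.
struct MockProfilerState {
  bool profilerInitialized = false;
  bool baseConfigLoaded = false;
  // Callbacks that would run on a context-created event (assumption).
  std::vector<std::function<void()>> contextCreatedCallbacks;
};

// Mock equivalents of the two calls quoted above.
void mockInitProfilersCpu(MockProfilerState& s) { s.profilerInitialized = true; }
void mockInitBaseConfig(MockProfilerState& s) { s.baseConfigLoaded = true; }

// Assumed previous behaviour: init eagerly only when cpuOnly is set,
// otherwise wait for a context-created callback to fire.
void initDeferred(MockProfilerState& s, bool cpuOnly) {
  if (cpuOnly) {
    mockInitProfilersCpu(s);
    mockInitBaseConfig(s);
  } else {
    s.contextCreatedCallbacks.push_back([&s] {
      mockInitProfilersCpu(s);
      mockInitBaseConfig(s);
    });
  }
}

// The workaround described above: always init eagerly, whatever cpuOnly is.
void initEager(MockProfilerState& s, bool /*cpuOnly*/) {
  mockInitProfilersCpu(s);
  mockInitBaseConfig(s);
}

// Simulates the context-created event being delivered.
void fireContextCreated(MockProfilerState& s) {
  for (auto& cb : s.contextCreatedCallbacks) {
    cb();
  }
}
```

In this mock, `initDeferred(state, false)` leaves the profiler uninitialized until `fireContextCreated(state)` runs, so if that event never arrives the init never happens. `initEager` does not depend on the event at all.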
Yeah makes sense! Thanks a lot!
I created a separate issue in the kineto repo: https://github.com/pytorch/kineto/issues/1032