dynolog
Dynolog is a telemetry daemon for performance monitoring and tracing. It exports metrics from different system components, such as the Linux kernel, CPU, disks, Intel PT, and GPUs. Dynolog also int...
Summary:
- add `std::vector traceIds` to `GpuProfilerResult`
- update OSS dynolog to generate a unique (per trace) `trace_id`
- the `trace_id` is generated as follows: `hash(hostname + pid +...
This issue was originally reported here: https://github.com/pytorch/pytorch/issues/132151 dynolog is currently distributed under the MIT license; however, a GPL-v3 file is part of the third-party tool 'cpr' (see `third_party/cpr/test/LICENSE`). Can...
Previously dynolog relied on the assumption that `high_resolution_clock` is an alias of `system_clock`, which is not true in general [1]. On systems where `high_resolution_clock` is an alias of `steady_clock` no...
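As a minimal sketch of the distinction described above (not dynolog's actual code), the snippet below requests `system_clock` explicitly for wall-clock timestamps and `steady_clock` for durations, so neither depends on what `high_resolution_clock` happens to alias. The names `kHighResIsSystemClock`, `unixMillisNow`, and `elapsedMicros` are hypothetical, and the sketch assumes C++17.

```cpp
#include <chrono>
#include <cstdint>
#include <type_traits>
#include <utility>

// True only on standard libraries where high_resolution_clock happens to be
// an alias of system_clock; the standard also allows steady_clock or a
// distinct clock type, so portable code should not rely on this.
constexpr bool kHighResIsSystemClock =
    std::is_same_v<std::chrono::high_resolution_clock,
                   std::chrono::system_clock>;

// Hypothetical helper: wall-clock time in milliseconds since the epoch of
// system_clock (the Unix epoch since C++20, and in practice before that).
inline std::int64_t unixMillisNow() {
  const auto sinceEpoch = std::chrono::system_clock::now().time_since_epoch();
  return std::chrono::duration_cast<std::chrono::milliseconds>(sinceEpoch)
      .count();
}

// Hypothetical helper: elapsed time of a callable in microseconds, measured
// with steady_clock, which is monotonic and unaffected by clock adjustments.
template <typename Fn>
std::int64_t elapsedMicros(Fn&& fn) {
  const auto start = std::chrono::steady_clock::now();
  std::forward<Fn>(fn)();
  const auto stop = std::chrono::steady_clock::now();
  return std::chrono::duration_cast<std::chrono::microseconds>(stop - start)
      .count();
}
```

Calling `unixMillisNow()` yields a value suitable for a log timestamp, while `elapsedMicros([] { /* work */ })` is meant only for measuring intervals; mixing the two roles is where an aliasing assumption tends to break.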
The following sequence in dynolog fails to compile with [libc++](https://libcxx.llvm.org/) ("default C++ Standard Library implementation for many major platforms, including Apple’s macOS, iOS, watchOS, and tvOS, Google Search, the Android...
I'm currently trying to run inference profiling on a CUDA kernel that's launched from PyTorch. I am inside a Docker container which has CUDA 12.5. I run: - dynolog...
Summary: The method MetricFrameVector::show() was not tested by the existing test suite. The diff adds tests covering 100.0% of the method's code. Differential Revision: D60142109
Hello dynolog maintainers, I've recently integrated dynolog with Kubernetes (k8s) to create an on-demand profiling tool for GPU training clusters. This tool is designed to help us gain insights...
```
cat linear_model_example.py
import math
import torch
import torch.profiler
import torch.distributed as dist
import os
dist.init_process_group(backend='nccl')
local_rank=int(os.environ['LOCAL_RANK'])
rank=torch.distributed.get_rank()
torch.cuda.set_device(local_rank)
if not dist.is_available() or not dist.is_initialized():
    print("dist init error")
dtype =...
```
Summary: `-Wunused-exception-parameter` has identified an unused exception parameter. This diff removes it. This:
```
try {
  ...
} catch (exception& e) {
  // no use of e
}
```
should...
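The summary above is cut off before it shows the replacement. One common way to satisfy `-Wunused-exception-parameter` is to leave the caught exception unnamed, which is assumed here to be the intended change. The helper `parsesAsInt` is hypothetical and only illustrates the pattern.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical helper: reports whether the text starts with something
// std::stoi can parse as an int that fits in range.
inline bool parsesAsInt(const std::string& text) {
  try {
    (void)std::stoi(text);  // throws std::invalid_argument or std::out_of_range
    return true;
  } catch (const std::exception&) {
    // The parameter is left unnamed because the handler never inspects it,
    // so -Wunused-exception-parameter has nothing to flag.
    return false;
  }
}
```

Catching by `const` reference without a name keeps the handler's behavior unchanged; if the exception message is needed later, the name can be restored and used.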
```bash
$ dyno --hostname ip gputrace --logfile somefile.json --fail-on-no-process
# Kineto config =
# ACTIVITIES_LOG_FILE=somefile.json
# ...
# response = {"activityProfilersBusy":0,"activityProfilersTriggered":[],"eventProfilersBusy":0,"eventProfilersTriggered":[],"processesMatched":[]}
# No processes were matched, please check --job-id or...
```