Brian Coutinho

Results 6 issues of Brian Coutinho

Add a few test cases to verify newly added NCCL metadata in profiler events. The test looks at the following blocks: record_param_comms ``` { "ph": "X", "cat": "cpu_op", "name": "record_param_comms",...

oncall: distributed
fb-exported
with-ssh

## Summary Currently the memory profiler feature in PyTorch is available via the [profiler API](https://pytorch.org/tutorials/recipes/recipes/profiler_recipe.html#using-profiler-to-analyze-memory-consumption) by passing `profile_memory=True` in the interface. It is desirable to also enable memory profiling using...

enhancement

Summary: There is an ODR (One Definition Rule) violation that was causing a crash on sigrid (see https://fb.workplace.com/groups/560979627394613/posts/2909061125919773/?comment_id=2909752389183980&reply_comment_id=2910102119149007 ). Sigrid includes both kineto and ipcfabric via dynolog, and on kineto the class...

CLA Signed

Related to an issue we saw on the FAIR research cluster, where some of the compile-time flags were not set as expected. This change prints them so we can easily debug...

CLA Signed

## TLDR Dynolog provides system telemetry at Meta as well as in open source environments. Metric logging using Prometheus - an industry-standard framework for logging/exporting metrics. This can also...

Fixes #125272 ## About (This is a re-spin of PR #106617) Kineto introduced a new profiler to read performance counters from NVIDIA GPUs (CUPTI Range Profiler API), added in PR [75616](https://github.com/pytorch/pytorch/pull/75616)....

Merged
Reverted
ciflow/binaries
ciflow/trunk
topic: not user facing
ciflow/binaries_conda
ciflow/binaries_wheel
ciflow/binaries_libtorch