Pinle Liu
I recently implemented some of my own ideas on this project. I tried a clone3 on a leaf node switch, started a ping, and captured packets using tcpdump at the...
## Describe the Bug

https://github.com/mlcommons/chakra/wiki/Chakra-Execution-Trace-Collection-%E2%80%90-A-Comprehensive-Guide-on-Merging-PyTorch-and-Kineto-Traces

There is something wrong with the sample code in this tutorial. Following the tutorial will cause...
I encountered the following warning while using chakra link:

```
[2025-06-04 04:00:01,109] trace_linker.py:679 [WARNING]: No CUDA...
```
This is my shell script:

```
#!/bin/bash

# Runs the "345M" parameter model

export CUDA_DEVICE_MAX_CONNECTIONS=1

GPUS_PER_NODE=4
# Change for multinode config
MASTER_ADDR=localhost
MASTER_PORT=6000
NNODES=1
NODE_RANK=0
WORLD_SIZE=$(($GPUS_PER_NODE*$NNODES))

CHECKPOINT_PATH=/mnt/Megatron-DeepSpeed/Models/gpt-2/checkpoint
VOCAB_FILE=/mnt/Megatron-DeepSpeed/Models/gpt-2/data/gpt2-vocab.json
MERGE_FILE=/mnt/Megatron-DeepSpeed/Models/gpt-2/data/gpt2-merges.txt
DATA_PATH=/mnt/Megatron-DeepSpeed/Models/gpt-2/data/meg-gpt2_text_document

PP_SIZE=2...
```