Parth Mannan
PyTorch no longer supports `torch._six`; it has been removed. Refer to https://github.com/pytorch/pytorch/pull/94709. DeepSpeed still depends on it, for example in `runtime/utils.py`:

```
from torch._six import inf...
```
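A minimal compatibility shim for code that still imports from `torch._six` could look like the sketch below. This is an assumption about a reasonable workaround, not DeepSpeed's actual patch: `torch._six.inf` was simply the float infinity, so `math.inf` is a drop-in fallback when the module is gone.

```python
# Sketch of a fallback for the removed torch._six module (assumption,
# not DeepSpeed's actual fix). torch._six.inf was Python's float
# infinity, so math.inf is an equivalent replacement.
try:
    from torch._six import inf  # works only on old PyTorch versions
except ImportError:
    from math import inf

# inf behaves as expected: it compares greater than any finite float.
print(inf > 1e308)
```

On recent PyTorch versions, `torch.inf` is also available as a direct replacement.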
## 🚀 Feature

The feature request is to add decision-making capabilities to the nvFuser executor, allowing it to reject/pass on certain op executions where other backends/executors...
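The reject/pass-on behavior the request describes could be sketched roughly as follows. The `Executor`/`can_execute`/`dispatch` names are hypothetical illustrations, not Thunder's or nvFuser's actual API: the point is only that an executor declines an op and dispatch falls through to the next backend.

```python
# Hypothetical sketch (not Thunder's API): an executor exposes a
# predicate so it can decline ops, letting another backend claim them.
class Executor:
    def __init__(self, name, supported):
        self.name = name
        self.supported = set(supported)

    def can_execute(self, op):
        # Reject/pass on ops this backend does not want to run.
        return op in self.supported

def dispatch(op, executors):
    # The first executor that accepts the op wins; others pass on it.
    for ex in executors:
        if ex.can_execute(op):
            return ex.name
    return "fallback"

nvfuser = Executor("nvfuser", {"add", "mul"})
eager = Executor("torch", {"add", "mul", "embedding"})
print(dispatch("embedding", [nvfuser, eager]))  # → torch
```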
## 🐛 Bug

Running LLaMa2 13B with FSDP ZeRO2 on 8xH100:

```
torchrun --nproc_per_node=8 --nnodes=1 benchmark_litgpt.py --model_name Llama-2-13b-hf --compile thunder_cudnn --distributed_mode fsdp --shard_mode zero2 --bucketing_mode none --micro_batch_size 1 --global_batch_size 8...
```
## 🚀 Feature

An environment variable that dumps the various Thunder-provided debug traces to a log file. This could have variable levels, like `export THUNDER_DEBUG=`:

```
0/'' :...
```
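Reading such a variable could be sketched as below. This is a hypothetical illustration of the requested behavior: the `THUNDER_DEBUG` name comes from the request itself, but the level parsing and defaults are assumptions.

```python
import os

# Hypothetical sketch of parsing the requested THUNDER_DEBUG variable
# (the level semantics and default are assumptions, not Thunder's API).
def debug_level(default=0):
    raw = os.environ.get("THUNDER_DEBUG", "")
    if raw == "":
        return default  # unset or '' means no trace dumping
    try:
        return int(raw)
    except ValueError:
        return default  # ignore malformed values rather than crash

os.environ["THUNDER_DEBUG"] = "2"
print(debug_level())  # → 2
```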
## 🐛 Bug

This is a lengthy issue/post detailing my observations on our distributed and bucketing performance. Some of these are actionable items and some are just observations to be...
## 🚀 Feature Request

Currently we have computation traces with the generated tensor shapes as part of comments next to the computation, like:

```
t908 = torch.nn.functional.linear(t907, t19, t17) #...
```
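The comment style quoted above could be produced by a helper roughly like this. The `annotate` name and comment format are hypothetical illustrations, not the trace printer's actual implementation:

```python
# Hypothetical sketch of attaching a shape comment to a trace line
# (the helper name and comment format are assumptions).
def annotate(line, shape):
    return f"{line}  # shape: {list(shape)}"

print(annotate("t908 = torch.nn.functional.linear(t907, t19, t17)", (2, 3)))
```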
# What does this PR do?

Added a PR for main here: https://github.com/NVIDIA/Megatron-LM/pull/2282. Design document discussed in the MCore sync meeting: https://docs.google.com/document/d/1MnIPQ_VbpDNp-adtvcEv-SYx6A8rtt3-fDdxbcdrmk0/edit?usp=sharing. The first issue this MR is trying...