examples
A set of examples around PyTorch in Vision, Text, Reinforcement Learning, etc.
## 📚 Documentation I believe the optimizer in this example should be declared after the parallelize module call, as in sequence parallelism. Without this, in latest torch, the example seems...
According to the implementation of the source code, I did several experiments to study the script running time and cuda memory occupancy. - exp1: nproc_per_node=4, nnodes=1 => cuda=2161~2411MB, runtime=63.04s -...
## Context While running the `distributed/FSDP/T5_training.py` example, I encountered an error when loading the `wikihow` dataset. I would like to know if this is a bug or if there is...
## Context * Pytorch version: 2.6.0+rocm6.2.4 * Operating System and version: Ubuntu 24.04.2 LTS x86_64 ## Your Environment * Installed using source? [yes/no]: no * Are you planning to deploy...
Updated arguments to match what main.py is looking for. Fixed incorrectly listed defaults, removed duplicate "--save-model", copied "--save_model" description from main.py.
I tried the language translation examples and found several issues: 1. Python compatibility issues: I changed torch to 2.3.0 and torchtext to 0.18.0; otherwise it will not work on Mac. 2....
I can't reproduce the issue of every process allocating memory of GPU 0 (https://github.com/pytorch/examples/issues/969), so maybe the underlying issue has been fixed. Regardless, usage of `torch.cuda.set_device` is [now discouraged](https://pytorch.org/docs/stable/generated/torch.cuda.set_device.html) in...
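As a rough illustration of the alternative to `torch.cuda.set_device` mentioned above, the sketch below derives a per-process device string from environment variables. It assumes the process was launched by `torchrun`, which sets `LOCAL_RANK`; the helper name `resolve_device` and the fall-back to `"cpu"` are hypothetical choices made for this example and are not taken from the pull request.

```python
import os


def resolve_device(env=None):
    """Pick a device string for this process (hypothetical helper).

    Assumes LOCAL_RANK is set by the launcher (torchrun does this) and
    honours CUDA_VISIBLE_DEVICES if it is present.
    """
    env = os.environ if env is None else env
    local_rank = int(env.get("LOCAL_RANK", "0"))

    visible = env.get("CUDA_VISIBLE_DEVICES")
    if visible is not None:
        ids = [d for d in visible.split(",") if d.strip()]
        if not ids:
            # An empty CUDA_VISIBLE_DEVICES hides every GPU from the process.
            return "cpu"
        if local_rank >= len(ids):
            raise ValueError(
                f"LOCAL_RANK={local_rank} but only {len(ids)} device(s) are visible"
            )

    # Indices are relative to the visible devices, so "cuda:0" is the
    # first entry of CUDA_VISIBLE_DEVICES, not necessarily physical GPU 0.
    return f"cuda:{local_rank}"
```

The returned string can then be passed to `torch.device(...)` and used explicitly in `model.to(device)` and `tensor.to(device)` calls, so no process-wide default device has to be changed.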
The regression example had not been updated recently; this updates the script to look similar to the other examples.
I tried the example provided in [multigpu_torchrun.py](https://github.com/pytorch/examples/blob/main/distributed/ddp-tutorial-series/multigpu_torchrun.py), trained it on the MNIST dataset, and replaced the model with a simple CNN model. However, when increasing the number of GPUs in...
The transfer to the device was not consistent between the train and validate functions, so I matched validate with train. It also reduced a couple of inconsistent calls and...