examples
A set of examples around PyTorch in Vision, Text, Reinforcement Learning, etc.
The random seed has to be set in `main_worker`, not in `def main()`. I found that although the seed is set in `def main()`, each process in distributed...
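A minimal sketch of the per-process seeding this describes (the worker signature and argument names are assumptions, not the exact ones in the example):

```python
import random

import numpy as np
import torch


def main_worker(gpu, args):
    # Seed inside each spawned worker so that every distributed process,
    # which has its own interpreter state, is actually seeded.
    if args.seed is not None:
        random.seed(args.seed)
        np.random.seed(args.seed)
        torch.manual_seed(args.seed)
        torch.cuda.manual_seed_all(args.seed)
    # ... set up the process group, model, and training loop here ...
```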
My setup: 2 machines with different IPs and 2 available GPUs on each machine. When I use the multigpu_torchrun.py example and pass these two commands: `torchrun --nproc_per_node=2 --nnodes=2 --node_rank=0...`
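For reference, a two-node torchrun launch typically looks like the following pair of commands, one per machine; the rendezvous address, port, and script arguments are placeholders, not the exact invocation above:

```bash
# On node 0 (the rendezvous host)
torchrun --nproc_per_node=2 --nnodes=2 --node_rank=0 \
    --rdzv_backend=c10d --rdzv_endpoint=<node0-ip>:29500 multigpu_torchrun.py <script args>

# On node 1, same command except for --node_rank
torchrun --nproc_per_node=2 --nnodes=2 --node_rank=1 \
    --rdzv_backend=c10d --rdzv_endpoint=<node0-ip>:29500 multigpu_torchrun.py <script args>
```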
Hi, I'm using the tutorial https://github.com/pytorch/tutorials/blob/master/intermediate_source/ddp_tutorial.rst for DDP training with 4 GPUs in my own code, following the Basic Use Case. But when I finished the modification, it got stuck while running the...
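For context, the Basic Use Case in that tutorial reduces to roughly the sketch below for a single node; the backend, toy model, and spawn-based launch are assumptions:

```python
import os

import torch
import torch.distributed as dist
import torch.multiprocessing as mp
from torch.nn.parallel import DistributedDataParallel as DDP


def demo_basic(rank, world_size):
    os.environ.setdefault("MASTER_ADDR", "localhost")
    os.environ.setdefault("MASTER_PORT", "29500")
    # One process per GPU; NCCL backend for GPU collectives.
    dist.init_process_group("nccl", rank=rank, world_size=world_size)
    torch.cuda.set_device(rank)

    model = torch.nn.Linear(10, 10).to(rank)
    ddp_model = DDP(model, device_ids=[rank])

    loss_fn = torch.nn.MSELoss()
    optimizer = torch.optim.SGD(ddp_model.parameters(), lr=0.001)

    optimizer.zero_grad()
    outputs = ddp_model(torch.randn(20, 10).to(rank))
    loss_fn(outputs, torch.randn(20, 10).to(rank)).backward()
    optimizer.step()

    dist.destroy_process_group()


if __name__ == "__main__":
    world_size = 4  # the 4 GPUs mentioned above
    mp.spawn(demo_basic, args=(world_size,), nprocs=world_size)
```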
Your issue may already be reported! Please search on the [issue tracker](https://github.com/pytorch/examples/issues) before creating one.

## Context

I am running

```
python neural_style/neural_style.py train --dataset "path" --style-image "image-path" --save-model-dir "path"...
```
## Context

* PyTorch version: libtorch-macos-2.0.1.zip
* Operating System and version: macOS 13.4.1 (c)

## Your Environment

* Installed using source? [yes/no]: no
* Are you planning to deploy it...
## 📚 Documentation

@HamidShojanazeri, I'm following your [FSDP example](https://github.com/pytorch/examples/tree/main/distributed/FSDP) and swapped in a bigger model, `google/flan-t5-xxl`, and am a little unclear on what happens when the script starts up. I'm...
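As a rough sketch of the startup path being asked about, assuming the script wraps a Hugging Face T5 model with a transformer auto-wrap policy (the exact flags in the example may differ): each rank first loads the full model on CPU, and the parameters are only sharded once FSDP wraps them.

```python
import functools
import os

import torch
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP
from torch.distributed.fsdp.wrap import transformer_auto_wrap_policy
from transformers import T5ForConditionalGeneration
from transformers.models.t5.modeling_t5 import T5Block

dist.init_process_group("nccl")
local_rank = int(os.environ["LOCAL_RANK"])  # set by torchrun
torch.cuda.set_device(local_rank)

# Every rank materializes the full (unsharded) model on CPU here...
model = T5ForConditionalGeneration.from_pretrained("google/flan-t5-xxl")

# ...and only after wrapping does FSDP shard the parameters across ranks
# and move each shard onto the local GPU.
wrap_policy = functools.partial(
    transformer_auto_wrap_policy, transformer_layer_cls={T5Block}
)
model = FSDP(model, auto_wrap_policy=wrap_policy, device_id=local_rank)
```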
The absolute position embedding used in [examples](https://github.com/pytorch/examples/tree/main)/[vision_transformer](https://github.com/pytorch/examples/tree/main/vision_transformer)/main.py seemed to be **incorrect**:

```python
# Positional embedding
self.pos_embedding = nn.Parameter(torch.randn(self.batch_size, 1, self.latent_size)).to(self.device)
```

which should look like this:

```python
self.pos_embedding = nn.Parameter(torch.randn(1,...
```
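For reference, a learned positional embedding is usually a parameter whose leading dimension is 1, so it broadcasts over the batch instead of baking in the batch size; a minimal sketch (class and argument names are illustrative, not from main.py):

```python
import torch
import torch.nn as nn


class TokensWithPos(nn.Module):
    """Sketch: a learned positional embedding that broadcasts over the batch."""

    def __init__(self, num_tokens, latent_size):
        super().__init__()
        # Shape (1, num_tokens, latent_size): independent of batch size,
        # broadcast along dim 0 when added to the token embeddings.
        self.pos_embedding = nn.Parameter(torch.randn(1, num_tokens, latent_size))

    def forward(self, tokens):  # tokens: (batch, num_tokens, latent_size)
        return tokens + self.pos_embedding
```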
Hi, thanks for the great tutorial on language modeling. A question on [masking the input](https://github.com/pytorch/examples/blob/7f7c222b355abd19ba03a7d4ba90f1092973cdbc/word_language_model/model.py#L128): Why do we mask the input in the encoder layer? I'm aware that the mask...
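For context, the mask referred to there is a square subsequent (causal) mask; a minimal sketch of building one and passing it to a transformer encoder (not the exact code in model.py):

```python
import torch
import torch.nn as nn


def square_subsequent_mask(sz):
    # Upper-triangular -inf mask: position i may only attend to positions <= i,
    # so each token's representation cannot peek at future tokens.
    return torch.triu(torch.full((sz, sz), float("-inf")), diagonal=1)


seq_len, batch, d_model = 35, 4, 64
encoder_layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=8)
encoder = nn.TransformerEncoder(encoder_layer, num_layers=2)

src = torch.randn(seq_len, batch, d_model)  # (S, N, E) layout
out = encoder(src, mask=square_subsequent_mask(seq_len))
```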
Fixes #816. Resume no longer overwrites the learning rate specified on the command line. (My linter also removed 2 extraneous spaces in a `dict`; I can revert this if desired 🙂)
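A minimal sketch of the behaviour this PR describes (the checkpoint keys and helper name are illustrative): restore model and optimizer state from the checkpoint, then re-apply the learning rate given on the command line so the resumed run keeps it.

```python
import torch


def resume(model, optimizer, args):
    # Hypothetical helper: load checkpoint state, but keep the lr that was
    # explicitly passed on the command line instead of the checkpointed one.
    checkpoint = torch.load(args.resume, map_location="cpu")
    model.load_state_dict(checkpoint["state_dict"])
    optimizer.load_state_dict(checkpoint["optimizer"])
    for param_group in optimizer.param_groups:
        param_group["lr"] = args.lr
    return checkpoint.get("epoch", 0)
```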