Masaki Kozuki

Results: 42 issues by Masaki Kozuki

Now that `set_to_none` defaults to `True`, it'd be reasonable to let `fused` optimizers call `_foreach_zero_` instead of calling `.zero_` on every single parameter. The required change would be...
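
A minimal sketch of the idea, assuming every grad lands in a single device/dtype group (a real implementation would group tensors first):

```python
import torch

# Minimal sketch, not the actual patch: zero all grads with one foreach
# kernel instead of a per-parameter loop. Assumes all grads share a device
# and dtype; real code would group them by device/dtype first.
def zero_grads_foreach(params):
    grads = [p.grad for p in params if p.grad is not None]
    if grads:
        torch._foreach_zero_(grads)
```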

add 0d tensor to graph adam/adamw test

Affected:
- `torch.cuda.amp.GradScaler`'s `found_inf`, `_scale`, and `_growth_tracker`
- `step` of Adam & AdamW when `capturable=True`

Fixes #96776 🤞 cc @vincentqb @jbschlosser @albanD @janeyx99...
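
For context, a minimal sketch of the configuration the test exercises; the setup below is assumed for illustration, not taken from the PR:

```python
import torch

# Assumed setup: capturable Adam plus GradScaler, whose internal state
# (`_scale`, `found_inf`, `_growth_tracker`) is kept as 0-d CUDA tensors
# so the optimizer step can be captured in a CUDA graph.
params = [torch.randn(4, device="cuda", requires_grad=True)]
optimizer = torch.optim.Adam(params, capturable=True)
scaler = torch.cuda.amp.GradScaler()
```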

module: optimizer
open source
module: cuda graphs
module: mta
release notes: nn

Fixes #104817

Examples of generated in-place foreach functions -- add and addcmul:

```C++
::std::vector<at::Tensor> _foreach_add__Scalar(c10::DispatchKeySet ks, at::TensorList self, const at::Scalar & scalar) {
  auto self_ = unpack(self, "self", 0);
  [[maybe_unused]]...
```
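
As a usage-level illustration (not part of the PR itself), the in-place foreach ops these generated functions implement can be exercised directly:

```python
import torch

# Each call applies the op in place across the whole tensor list at once.
tensors = [torch.ones(2), torch.ones(3)]
torch._foreach_add_(tensors, 1.0)  # adds the scalar to every tensor in place
t1 = [torch.randn(2), torch.randn(3)]
t2 = [torch.randn(2), torch.randn(3)]
torch._foreach_addcmul_(tensors, t1, t2, value=0.5)  # self += value * t1 * t2
```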

triaged
open source
release notes: foreach_frontend

It seems like we call `git submodule update --init --recursive` even when we've already updated submodules beforehand, as per https://github.com/pytorch/text/blob/60bea668f7bf4359a447487555b9209ae5b1e07b/setup.py#L48-L57. Would you mind sharing your take on the following: what...
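
One possible shape of the fix, sketched under assumptions (the guard condition and environment variable are illustrative, not taken from the linked `setup.py`):

```python
import os
import subprocess

# Illustrative only: skip the redundant recursive update when the checkout
# has already been initialized (e.g. by CI). The env-var name is made up.
def maybe_init_submodules(repo_root):
    if os.environ.get("SUBMODULES_ALREADY_UPDATED"):
        return
    subprocess.check_call(
        ["git", "submodule", "update", "--init", "--recursive"], cwd=repo_root
    )
```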

Likewise for NGC PyTorch containers.

This is merely cosmetic: the current parametrization creates test-case names such as `NcclDistributedFusedLAMB.test_distributed_fused_lamb_no_copy_True_opt_kwargs_{'overlap_reductions': False, 'dwu_num_blocks': 1, 'dwu_num_chunks': 1, 'fused_norm': True, 'fuse_scale': True, 'clip_after_ar': False}`, which is not ideally...
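
To illustrate the cosmetic issue, a hypothetical naming helper (not apex code) that flattens the kwargs into a short, shell-friendly suffix instead of embedding the dict's `repr()`:

```python
# Hypothetical helper, for illustration: build a readable parametrized-test
# suffix from the option kwargs rather than interpolating their repr().
def kwargs_suffix(opt_kwargs):
    return "_".join(f"{k}_{v}" for k, v in sorted(opt_kwargs.items()))

print(kwargs_suffix({"overlap_reductions": False, "dwu_num_blocks": 1}))
# dwu_num_blocks_1_overlap_reductions_False
```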

contrib

https://github.com/NVIDIA/apex/blob/50ac8425403b98147cbb66aea9a2a27dd3fe7673/apex/contrib/group_norm/group_norm.py#L21

Some contrib modules don't have one, though, e.g. https://github.com/NVIDIA/apex/blob/50ac8425403b98147cbb66aea9a2a27dd3fe7673/apex/contrib/layer_norm/layer_norm.py#L5

cc @ptrblck @xwang233
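
For context, the kind of guard presumably being referenced, sketched under that assumption (the extension module and flag names are illustrative, not apex's actual code):

```python
# Sketch of an optional-extension import guard; this is an assumption about
# what the linked line does, not a quote of apex code.
try:
    import group_norm_cuda  # compiled CUDA extension, name illustrative
    HAS_GROUP_NORM_CUDA = True
except ImportError:
    group_norm_cuda = None
    HAS_GROUP_NORM_CUDA = False
```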

https://github.com/NVIDIA/apex/blob/30a7ad3974b32f7ce68cefabc38374fb4520a35e/apex/contrib/cudnn_gbn/batch_norm.py#L90-L91

Patching #1309 for the sake of the nvfuser refactoring done in https://github.com/pytorch/pytorch/pull/89621.

## tldr

Enables `no_sync` for `thunder.jit(thunder.distributed.fsdp(model))`. The accompanying changes are:

- a new argument, `return_none_instead_of_grads`, for `ThunderFunction.forward`
  - this could be eliminated once a `TraceCtx`'s bound symbols are not deleted...
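
A hedged usage sketch of what this enables, assuming the jitted module exposes a DDP/FSDP-style `no_sync()` context manager and a process group is already initialized (neither is confirmed by the snippet above):

```python
import torch
import torch.nn as nn
import thunder
import thunder.distributed

# Assumed API shape, for illustration: `no_sync()` on the jitted module
# suppresses gradient synchronization so grads accumulate locally.
model = nn.Linear(8, 8)
jitted = thunder.jit(thunder.distributed.fsdp(model))
with jitted.no_sync():
    loss = jitted(torch.randn(2, 8)).sum()
    loss.backward()  # no all-reduce / reduce-scatter inside no_sync
```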