Masaki Kozuki

Results: 42 issues by Masaki Kozuki

Now that `set_to_none` defaults to `True`, it'd be reasonable to let `fused` optimizers call `_foreach_zero_` instead of calling `.zero_` on every single parameter. The required change would be...
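
A minimal sketch of the idea, assuming every grad lands in a single device/dtype group (a real implementation would group tensors first):

```python
import torch

# Minimal sketch, not the actual patch: zero all grads with one foreach
# kernel instead of a per-parameter loop. Assumes all grads share a device
# and dtype; real code would group them by device/dtype first.
def zero_grads_foreach(params):
    grads = [p.grad for p in params if p.grad is not None]
    if grads:
        torch._foreach_zero_(grads)
```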

add 0d tensor to graph adam/adamw test

Affected:
- `torch.cuda.amp.GradScaler`'s `found_inf`, `_scale`, and `_growth_tracker`
- `step` of Adam & AdamW when `capturable=True`

Fixes #96776 🤞 cc @vincentqb @jbschlosser @albanD @janeyx99...
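
For context, a minimal sketch of the configuration the test exercises; the setup below is assumed for illustration, not taken from the PR:

```python
import torch

# Assumed setup: capturable Adam plus GradScaler, whose internal state
# (`_scale`, `found_inf`, `_growth_tracker`) is kept as 0-d CUDA tensors
# so the optimizer step can be captured in a CUDA graph.
params = [torch.randn(4, device="cuda", requires_grad=True)]
optimizer = torch.optim.Adam(params, capturable=True)
scaler = torch.cuda.amp.GradScaler()
```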

module: optimizer
open source
module: cuda graphs
module: mta
release notes: nn

Fixes #104817

Examples of generated in-place foreach functions -- add and addcmul:

```C++
::std::vector<at::Tensor> _foreach_add__Scalar(c10::DispatchKeySet ks, at::TensorList self, const at::Scalar & scalar) {
  auto self_ = unpack(self, "self", 0);
  [[maybe_unused]]...
```
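
As a usage-level illustration (not part of the PR itself), the in-place foreach ops these generated functions implement can be exercised directly:

```python
import torch

# Each call applies the op in place across the whole tensor list at once.
tensors = [torch.ones(2), torch.ones(3)]
torch._foreach_add_(tensors, 1.0)  # adds the scalar to every tensor in place
t1 = [torch.randn(2), torch.randn(3)]
t2 = [torch.randn(2), torch.randn(3)]
torch._foreach_addcmul_(tensors, t1, t2, value=0.5)  # self += value * t1 * t2
```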

triaged
open source
release notes: foreach_frontend

It seems like we call `git submodule update --init --recursive` even when we've already updated submodules beforehand, as per https://github.com/pytorch/text/blob/60bea668f7bf4359a447487555b9209ae5b1e07b/setup.py#L48-L57. Would you mind sharing your take on the following: what...
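
One possible shape of the fix, sketched under assumptions (the guard condition and environment variable are illustrative, not taken from the linked `setup.py`):

```python
import os
import subprocess

# Illustrative only: skip the redundant recursive update when the checkout
# has already been initialized (e.g. by CI). The env-var name is made up.
def maybe_init_submodules(repo_root):
    if os.environ.get("SUBMODULES_ALREADY_UPDATED"):
        return
    subprocess.check_call(
        ["git", "submodule", "update", "--init", "--recursive"], cwd=repo_root
    )
```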

Likewise for NGC PyTorch containers.

This is merely cosmetic: the current parametrization creates test-case names such as `NcclDistributedFusedLAMB.test_distributed_fused_lamb_no_copy_True_opt_kwargs_{'overlap_reductions': False, 'dwu_num_blocks': 1, 'dwu_num_chunks': 1, 'fused_norm': True, 'fuse_scale': True, 'clip_after_ar': False}`, which is not ideally...
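
To illustrate the cosmetic issue, a hypothetical naming helper (not apex code) that flattens the kwargs into a short, shell-friendly suffix instead of embedding the dict's `repr()`:

```python
# Hypothetical helper, for illustration: build a readable parametrized-test
# suffix from the option kwargs rather than interpolating their repr().
def kwargs_suffix(opt_kwargs):
    return "_".join(f"{k}_{v}" for k, v in sorted(opt_kwargs.items()))

print(kwargs_suffix({"overlap_reductions": False, "dwu_num_blocks": 1}))
# dwu_num_blocks_1_overlap_reductions_False
```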

contrib

https://github.com/NVIDIA/apex/blob/50ac8425403b98147cbb66aea9a2a27dd3fe7673/apex/contrib/group_norm/group_norm.py#L21

Some contrib modules don't have one, though, e.g. https://github.com/NVIDIA/apex/blob/50ac8425403b98147cbb66aea9a2a27dd3fe7673/apex/contrib/layer_norm/layer_norm.py#L5

cc @ptrblck @xwang233
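
For context, the kind of guard presumably being referenced, sketched under that assumption (the extension module and flag names are illustrative, not apex's actual code):

```python
# Sketch of an optional-extension import guard; this is an assumption about
# what the linked line does, not a quote of apex code.
try:
    import group_norm_cuda  # compiled CUDA extension, name illustrative
    HAS_GROUP_NORM_CUDA = True
except ImportError:
    group_norm_cuda = None
    HAS_GROUP_NORM_CUDA = False
```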

https://github.com/NVIDIA/apex/blob/30a7ad3974b32f7ce68cefabc38374fb4520a35e/apex/contrib/cudnn_gbn/batch_norm.py#L90-L91

Patching #1309 for the sake of the nvfuser refactoring done in https://github.com/pytorch/pytorch/pull/89621.

## tldr

Enables `no_sync` for `thunder.jit(thunder.distributed.fsdp(model))`. The accompanying changes are:

- a new argument, `return_none_instead_of_grads`, for `ThunderFunction.forward`
  - this could be eliminated once a `TraceCtx`'s bound symbols are not deleted...
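
A hedged usage sketch of what this enables, assuming the jitted module exposes a DDP/FSDP-style `no_sync()` context manager and a process group is already initialized (neither is confirmed by the snippet above):

```python
import torch
import torch.nn as nn
import thunder
import thunder.distributed

# Assumed API shape, for illustration: `no_sync()` on the jitted module
# suppresses gradient synchronization so grads accumulate locally.
model = nn.Linear(8, 8)
jitted = thunder.jit(thunder.distributed.fsdp(model))
with jitted.no_sync():
    loss = jitted(torch.randn(2, 8)).sum()
    loss.backward()  # no all-reduce / reduce-scatter inside no_sync
```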