Carlos Mocholí

Results: 90 issues by Carlos Mocholí

## 🚀 Feature

### Motivation

When debugging programs, you can easily get a module's traces with `thunder.last_traces(compiled_model)`. However, working with the result is not simple because traces can be thousands of lines...
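For context, a minimal sketch of the workflow this refers to, assuming the public `thunder.jit` entry point alongside `thunder.last_traces`; dumping the trace to a file is one way to cope with its length:

```python
import torch
import thunder

model = torch.nn.Linear(4, 4)
jmodel = thunder.jit(model)  # compile the module with Thunder
jmodel(torch.randn(2, 4))    # run once so traces are recorded

# last_traces returns the traces from the last execution; writing the
# final one to disk helps when it spans thousands of lines
traces = thunder.last_traces(jmodel)
with open("last_trace.py", "w") as f:
    f.write(str(traces[-1]))
```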

## 🚀 Feature

### Pitch

Port https://github.com/pytorch/pytorch/blob/c4a157086482899f0640d03292e5d2c9a6a3db68/torch/distributed/fsdp/fully_sharded_data_parallel.py#L1069-L1194 to work with Thunder's FSDP. This could be importable through `from thunder.distributed.utils import clip_grad_norm_`. We could also move FSDP into `thunder.distributed.fsdp` and put...
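A sketch of what the proposed call site could look like; `thunder.distributed.utils.clip_grad_norm_` is the import suggested above and does not exist yet, and the signature here simply mirrors `torch.nn.utils.clip_grad_norm_`:

```python
import torch

# Hypothetical import: the location proposed in this issue
from thunder.distributed.utils import clip_grad_norm_

# `model` is assumed to be a module already wrapped with Thunder's FSDP;
# the ported function would need to all-reduce per-shard norms internally
total_norm = clip_grad_norm_(model.parameters(), max_norm=1.0, norm_type=2.0)
```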

## 🚀 Feature

### Motivation

Saving the optimizer state is critical to resume a training run.

### Pitch

```python
from thunder.distributed.checkpoint import get_optimizer_state_dict, load_optimizer_state_dict
```

from https://github.com/Lightning-AI/lightning-thunder/blob/main/thunder/distributed/checkpoint.py Then, integrate it...
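A sketch of the save/resume flow the pitch implies; the helper names come from the issue itself, but the exact signatures used here are assumptions modeled on `torch.distributed.checkpoint.state_dict`:

```python
import torch
from thunder.distributed.checkpoint import (
    get_optimizer_state_dict,
    load_optimizer_state_dict,
)

# Hypothetical save path: collect a (possibly sharded) optimizer state dict
state_dict = get_optimizer_state_dict(model, optimizer)
torch.save(state_dict, "optimizer.pt")

# ...and later, when resuming the run
load_optimizer_state_dict(model, optimizer, torch.load("optimizer.pt"))
```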

## 🚀 Feature

### Motivation

In Lightning Fabric, we use this in one place, to check that the user properly called `backward`: https://github.com/Lightning-AI/pytorch-lightning/blob/096b063d6eeb41567409f4a6b9bac6f5af28ed93/src/lightning/fabric/wrappers.py#L232-L233

cc @awaelchli I don't expect that we run...

## 🚀 Feature

### Motivation

People who are not familiar with the autograd definitions for their models can have a hard time inspecting the `backward` trace generated by Thunder because...
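For reference, a sketch of how the trace in question can be obtained, assuming `thunder.jit` and the `thunder.last_backward_traces` counterpart to `last_traces`:

```python
import torch
import thunder

model = torch.nn.Linear(4, 4)
jmodel = thunder.jit(model)

# run forward and backward so both traces are recorded
jmodel(torch.randn(2, 4)).sum().backward()

# the backward trace this issue wants to make easier to inspect
print(thunder.last_backward_traces(jmodel)[-1])
```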

### Description & Motivation

Both the Fabric and Trainer strategies are designed to have a single plugin enabled from the beginning to the end of the program. This has been...

feature
design
fabric
plugin
pl

Annotating parameters with `foo: tl.pointer_type` results in the following stacktrace:

```python
my_model.py:90: in forward
    my_kernel[lambda meta: (triton.cdiv(numel, meta["BLOCK_SIZE"]),)](
../../../../.local/lib/python3.10/site-packages/triton/runtime/jit.py:345: in
    return lambda *args, **kwargs: self.run(grid=grid, warmup=False, *args, **kwargs)
../../../../.local/lib/python3.10/site-packages/triton/runtime/autotuner.py:156: in...
```
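A minimal repro sketch consistent with the trace above; the kernel body and launch parameters are placeholders, and the only relevant detail is the `tl.pointer_type` annotation on the first parameter:

```python
import torch
import triton
import triton.language as tl


@triton.jit
def my_kernel(x: tl.pointer_type, numel, BLOCK_SIZE: tl.constexpr):
    # trivial copy-in-place body; the failure happens at launch,
    # before the body ever runs
    offsets = tl.program_id(0) * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < numel
    tl.store(x + offsets, tl.load(x + offsets, mask=mask), mask=mask)


x = torch.randn(1024, device="cuda")
numel = x.numel()
# launching a kernel annotated this way raises the stacktrace shown above
my_kernel[lambda meta: (triton.cdiv(numel, meta["BLOCK_SIZE"]),)](x, numel, BLOCK_SIZE=256)
```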

## 🚀 Feature

Add support for:

```python
print(str(MyDataModule()))
```

### Motivation

It currently prints:

```python
```

### Pitch

It could print the DataLoader structure:

```python
MyDataModule( train_dataloader: {"a": DataLoaderClass(batch_size=8, num_batches=16, num_workers=2),...
```
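One way the pitch could be implemented, sketched against `LightningDataModule`; the `_loader_summary` helper and the printed format are made up for illustration, and a real version would also need to handle dicts/lists of loaders and non-overridden hooks:

```python
from lightning.pytorch import LightningDataModule
from torch.utils.data import DataLoader


def _loader_summary(loader: DataLoader) -> str:
    # Hypothetical helper: show only the attributes worth printing
    return f"{type(loader).__name__}(batch_size={loader.batch_size}, num_workers={loader.num_workers})"


class MyDataModule(LightningDataModule):
    def train_dataloader(self) -> DataLoader:
        return DataLoader(range(128), batch_size=8, num_workers=2)

    def __str__(self) -> str:
        return f"{type(self).__name__}(\n  train_dataloader: {_loader_summary(self.train_dataloader())}\n)"


print(MyDataModule())
# MyDataModule(
#   train_dataloader: DataLoader(batch_size=8, num_workers=2)
# )
```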

feature
good first issue
data handling

I find these two arguments very helpful; maybe others do too.

CLA Signed

In this example

```python
from jsonargparse import ArgumentParser, CLI
import torch
from typing import Callable
from dataclasses import dataclass


@dataclass
class Foo:
    opt: Callable[[torch.nn.Parameter], torch.optim.Optimizer]


parser = ArgumentParser()
parser.add_class_arguments(Foo)
args...
```

enhancement