Thomas Viehmann
Disabling for now. ``` FAILED thunder/tests/test_grad.py::test_populate_grads_csa_torch_cuda_thunder.dtypes.float32 - AssertionError: Tensor-likes are not close! Mismatched elements: 218295 / 589824 (37.0%) Greatest absolute difference: 1030.24462890625 at index (473, 722) (up to 0.01 allowed)...
Running (with the latest litgpt to get Llama 3.2, but you can also use a llama2-like model) ```python with torch.device('cuda'): m = litgpt.GPT.from_name('Llama-3.2-1B').bfloat16().requires_grad_(False) m.set_kv_cache(1) inp1 = torch.ones(1, 16, device="cuda", dtype=torch.int32) inp_pos1 = torch.arange(16,...
There currently is a lot of overlap with args and it's a bit annoying to have to care about siginfo whenever constructing an intermediate trace. So maybe it would be...
We currently do not seem to preserve the traceback when we have context manager `__exit__` cleanups; we should. We should probably also remove all purely interpreter-internal stack frames that sit...
...to provide a better error message. Most likely the easiest is to add something like
```
if get_jit_ctx() is not None:
    raise NotImplementedError("Re-entrant jitting is not supported")
```
to the...
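A minimal, self-contained sketch of such a re-entrancy guard, outside of thunder. Everything here is an assumption for illustration: `_jit_ctx`, `toy_jit`, and this `get_jit_ctx` are hypothetical stand-ins built on the standard-library `contextvars` module, not thunder's actual context handling.

```python
import contextvars

# Hypothetical stand-in for "a jit is currently running" state.
_jit_ctx = contextvars.ContextVar("jit_ctx", default=None)


def get_jit_ctx():
    """Return the active (toy) jit context, or None if no jit is running."""
    return _jit_ctx.get()


def toy_jit(fn):
    """Wrap fn so that calling it while another toy_jit call is active fails loudly."""

    def wrapper(*args, **kwargs):
        if get_jit_ctx() is not None:
            raise NotImplementedError("Re-entrant jitting is not supported")
        token = _jit_ctx.set(object())  # mark the context as active
        try:
            return fn(*args, **kwargs)
        finally:
            _jit_ctx.reset(token)  # always restore, even on error

    return wrapper


inner = toy_jit(lambda x: x + 1)
outer = toy_jit(lambda x: inner(x))  # calls a jitted function from inside a jitted one
```

Calling `inner(1)` on its own succeeds, while `outer(1)` raises `NotImplementedError` with the explicit message instead of failing somewhere deeper. The `finally` block ensures the context is cleared so later top-level calls still work.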
Maybe first step:
```
def step(m, inp):
    return m(inp)

jm = jit(model)
jfn = jit(step)
jfn(jm, inp)
```
It would be cool if this worked: ``` def step(m, opt, inps):...
Having written a few "new-style" transforms between a few people, we might compile a guide:
- intervention points for transforms in the thunder.jit flow,
- expected trace properties and how...
Currently, if a symbol does not have an implementation, it is silently dropped in transform_for_execution. This was one factor that made #1166 hard to debug and is quite a...
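To illustrate the difference between dropping and failing loudly, here is a toy sketch. It does not use thunder's real data structures: `claim_symbols`, the plain-string symbols, and the dict-based "executors" are all hypothetical simplifications.

```python
import operator


def claim_symbols(symbols, executors):
    """Pair each symbol name with the first executor implementation found.

    symbols: list of symbol names (toy stand-in for bound symbols in a trace)
    executors: list of dicts mapping symbol name -> callable (toy executors)
    Raises instead of silently skipping a symbol nobody can execute.
    """
    claimed = []
    for sym in symbols:
        for ex in executors:
            impl = ex.get(sym)
            if impl is not None:
                claimed.append((sym, impl))
                break
        else:
            # No executor claimed the symbol: surface it immediately.
            raise NotImplementedError(
                f"No executor provides an implementation for symbol {sym!r}"
            )
    return claimed


toy_executors = [{"add": operator.add}, {"mul": operator.mul}]
claimed = claim_symbols(["add", "mul"], toy_executors)
```

With this shape, a symbol such as `"mystery"` that no executor implements produces an error naming the symbol at transform time, rather than a trace that is quietly missing an operation.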
The old codepath is not composable with other transforms, does not make gathering state dicts as easy, etc. Removal, of course, depends on NVIDIA benchmarking not needing it. I...
The autocast and gradient-related transforms use interpret_trace -> transform -> construct_trace; this drops information, e.g. tags on the proxies. So this issue is about making them work with the traces directly. My idea here is...