Thomas Viehmann
But how does the lack of propagation actually create indeterminism here?
So I think we'd be dropping the caching you link to after #1956, hopefully?
Hint from the expert (thank you @tfogal): This can be avoided by using flash-attention.
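For concreteness, a minimal sketch of pinning scaled_dot_product_attention to the flash-attention backend; the context manager and enum are from recent PyTorch (torch >= 2.3), and the shapes/dtypes are just placeholders for what flash-attention expects:

```python
import torch
import torch.nn.functional as F
from torch.nn.attention import SDPBackend, sdpa_kernel

# Placeholder shapes: (batch, heads, seq_len, head_dim); fp16 on CUDA is what flash-attention expects.
q = torch.randn(1, 8, 1024, 64, device="cuda", dtype=torch.float16)
k = torch.randn(1, 8, 1024, 64, device="cuda", dtype=torch.float16)
v = torch.randn(1, 8, 1024, 64, device="cuda", dtype=torch.float16)

# Restrict scaled_dot_product_attention to the flash-attention backend only.
with sdpa_kernel(SDPBackend.FLASH_ATTENTION):
    out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
```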
Isn't torch.Tensor.copy_ a legit method?
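If it helps, the behaviour I have in mind (an in-place copy that returns the destination tensor) is roughly this:

```python
import torch

src = torch.arange(4, dtype=torch.float32)
dst = torch.empty(4)

# copy_ copies src's values into dst in place (converting dtype/device as needed)
# and returns dst itself.
result = dst.copy_(src)
assert result is dst
assert torch.equal(dst, src)
```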
I for one would love to see a constant folding pass.
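To make the wish a bit more concrete, here is a toy sketch of constant folding over a flat list of trace-like ops; the Op structure is made up for illustration and is not the actual trace representation:

```python
from dataclasses import dataclass

@dataclass
class Op:
    name: str    # e.g. "add", "mul"
    args: tuple  # symbolic names or Python constants
    out: str     # name of the produced value

def fold_constants(ops):
    """Evaluate ops whose inputs are all Python constants and substitute the results."""
    known = {}   # symbol name -> folded constant value
    folded = []
    fns = {"add": lambda a, b: a + b, "mul": lambda a, b: a * b}
    for op in ops:
        args = [known.get(a, a) for a in op.args]
        if op.name in fns and all(isinstance(a, (int, float)) for a in args):
            known[op.out] = fns[op.name](*args)  # fold: no runtime op needed
        else:
            folded.append(Op(op.name, tuple(args), op.out))
    return folded, known

ops = [Op("mul", (2, 3), "t0"), Op("add", ("x", "t0"), "t1")]
print(fold_constants(ops))  # t0 folds to 6; t1 becomes add(x, 6)
```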
What is the trace when this happens? We identified this as unclear behaviour, but I'm wondering whether the .to is from the user code or from a decomposition.
I think this is pretty dubious. It starts with not capturing the side-effects of importing, as caught by the CI, but it probably also impacts typecheckers etc. If you have the...
>> If you have the - arguably somewhat special - need to import litgpt.config without importing litgpt, how about you add the litgpt path to PYTHONPATH and import config that...
> We don't just use litgpt.config, we also use litgpt.args (so we'd have to patch both). And unfortunately this solution would leave us with no longer having the nice typechecking...
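Roughly the in-code equivalent of the quoted PYTHONPATH suggestion would be the following; the path is a placeholder for wherever the litgpt checkout lives, and whether this plays well with typechecking is exactly the concern raised in the reply:

```python
import sys

# Placeholder path to the litgpt package directory inside your checkout.
sys.path.insert(0, "/path/to/litgpt/litgpt")

import config  # resolves to litgpt/config.py without importing the litgpt package itself
import args    # same for litgpt/args.py, per the reply above
```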
Can we get the big picture here, please? How much are we looking to save, and what role does this play for the discussion in #169? How does this...