Thomas Viehmann
But how does the lack of propagation actually create indeterminism here?
So I think we'd be dropping the caching you link to after #1956, hopefully?
Hint from the expert (thank you @tfogal): This can be avoided by using flash-attention.
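For concreteness, a minimal sketch of pinning scaled_dot_product_attention to the flash-attention backend; the context manager and enum are from recent PyTorch (torch >= 2.3), and the shapes/dtypes are just placeholders for what flash-attention expects:

```python
import torch
import torch.nn.functional as F
from torch.nn.attention import SDPBackend, sdpa_kernel

# Placeholder shapes: (batch, heads, seq_len, head_dim); fp16 on CUDA is what flash-attention expects.
q = torch.randn(1, 8, 1024, 64, device="cuda", dtype=torch.float16)
k = torch.randn(1, 8, 1024, 64, device="cuda", dtype=torch.float16)
v = torch.randn(1, 8, 1024, 64, device="cuda", dtype=torch.float16)

# Restrict scaled_dot_product_attention to the flash-attention backend only.
with sdpa_kernel(SDPBackend.FLASH_ATTENTION):
    out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
```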
Isn't torch.Tensor.copy_ a legit method?
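If it helps, the behaviour I have in mind (an in-place copy that returns the destination tensor) is roughly this:

```python
import torch

src = torch.arange(4, dtype=torch.float32)
dst = torch.empty(4)

# copy_ copies src's values into dst in place (converting dtype/device as needed)
# and returns dst itself.
result = dst.copy_(src)
assert result is dst
assert torch.equal(dst, src)
```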
I for one would love to see a constant folding pass.
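To make the wish a bit more concrete, here is a toy sketch of constant folding over a flat list of trace-like ops; the Op structure is made up for illustration and is not the actual trace representation:

```python
from dataclasses import dataclass

@dataclass
class Op:
    name: str    # e.g. "add", "mul"
    args: tuple  # symbolic names or Python constants
    out: str     # name of the produced value

def fold_constants(ops):
    """Evaluate ops whose inputs are all Python constants and substitute the results."""
    known = {}   # symbol name -> folded constant value
    folded = []
    fns = {"add": lambda a, b: a + b, "mul": lambda a, b: a * b}
    for op in ops:
        args = [known.get(a, a) for a in op.args]
        if op.name in fns and all(isinstance(a, (int, float)) for a in args):
            known[op.out] = fns[op.name](*args)  # fold: no runtime op needed
        else:
            folded.append(Op(op.name, tuple(args), op.out))
    return folded, known

ops = [Op("mul", (2, 3), "t0"), Op("add", ("x", "t0"), "t1")]
print(fold_constants(ops))  # t0 folds to 6; t1 becomes add(x, 6)
```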
What is the trace when this happens? We identified this as unclear behaviour, but I'm wondering whether the .to is from the user code or from a decomposition.
I think this is pretty dubious. It starts with not capturing the side-effects of importing, as caught by the CI, but it probably also impacts typecheckers etc. If you have the...
>> If you have the - arguably somewhat special - need to import litgpt.config without importing litgpt, how about you add the litgpt path to PYTHONPATH and import config that...
> We don't just use litgpt.config, we also use litgpt.args (so we'd have to patch both). And unfortunately this solution would leave us with no longer having the nice typechecking...
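Roughly the in-code equivalent of the quoted PYTHONPATH suggestion would be the following; the path is a placeholder for wherever the litgpt checkout lives, and whether this plays well with typechecking is exactly the concern raised in the reply:

```python
import sys

# Placeholder path to the litgpt package directory inside your checkout.
sys.path.insert(0, "/path/to/litgpt/litgpt")

import config  # resolves to litgpt/config.py without importing the litgpt package itself
import args    # same for litgpt/args.py, per the reply above
```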
Can we get the big picture here, please? How much are we looking to save, and what role does this play for the discussion in #169? How does this...