Carlos Mocholí

Results 427 comments of Carlos Mocholí

This is blocked by not being able to run two `optimize` calls together. Maybe we should have tutorials suggest `python -m litgpt.data.prepare_*` in the meantime for people who use this...
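
For reference, a minimal sketch (mine, not from the issue) of the blocked pattern, assuming litdata's `optimize` with placeholder paths and a placeholder `tokenize` function: two `optimize` calls, e.g. one per split, issued from the same script.

```python
# Hedged sketch of the blocked pattern: two litdata `optimize` calls in one script.
# The tokenize function, paths, and settings are placeholders, not litgpt's own.
from litdata import optimize


def tokenize(filename: str):
    # Placeholder: read a text shard and yield whitespace-split "tokens".
    with open(filename) as f:
        yield f.read().split()


if __name__ == "__main__":
    for split in ("train", "val"):
        optimize(
            fn=tokenize,
            inputs=[f"data/{split}/shard0.txt"],   # placeholder input shards
            output_dir=f"data/optimized/{split}",  # placeholder output location
            chunk_bytes="64MB",
            num_workers=1,
        )
```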

I don't see how we can tie this decision to the training dtype. The training and inference dtypes can be entirely different. If it trains with 16-mixed, what would you say it needs...
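
To make the independence concrete, here is a minimal sketch (not from the thread) using bf16 autocast on CPU as a stand-in for "16-mixed" on GPU: the weights stay fp32 during mixed-precision training, and the inference dtype is chosen separately afterwards.

```python
import torch

model = torch.nn.Linear(16, 16)  # fp32 master weights, as in mixed-precision training
opt = torch.optim.SGD(model.parameters(), lr=0.1)

# Training step in the "16-mixed" style: the weights stay fp32, ops run under autocast.
with torch.autocast(device_type="cpu", dtype=torch.bfloat16):
    loss = model(torch.randn(4, 16)).pow(2).mean()
loss.backward()
opt.step()

# Inference is an independent choice: here the trained weights are cast to bf16.
infer = model.to(torch.bfloat16).eval()
with torch.inference_mode():
    out = infer(torch.randn(4, 16, dtype=torch.bfloat16))
```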

I don't see why you would want anything other than "flops" if it fits on a single device. If it doesn't, you are forced to use one of the other...

The `sequentially.py` file could support it too if we want to. However, transformer inference at batch size 1 is already very latency-bound, so this would make it even worse...

You might want to merge OLMo with an interleaving conversion step, because this PR is very risky and a breaking change for all existing checkpoints.
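
For illustration only, one reading of such a conversion step, assuming plain multi-head attention (no grouped queries) and not taken from the PR: fuse a checkpoint's separate q/k/v projection weights into a single matrix whose rows are interleaved per head, so existing checkpoints can be rewritten once instead of breaking.

```python
import torch


def interleave_qkv(q: torch.Tensor, k: torch.Tensor, v: torch.Tensor, n_head: int) -> torch.Tensor:
    """Fuse separate q/k/v projection weights into one matrix, interleaved per head."""
    head_size = q.shape[0] // n_head
    in_features = q.shape[1]
    # reshape each projection into (n_head, head_size, in_features)
    q, k, v = (t.view(n_head, head_size, in_features) for t in (q, k, v))
    # stack so that every head's q, k and v rows end up next to each other
    qkv = torch.stack((q, k, v), dim=1)  # (n_head, 3, head_size, in_features)
    return qkv.reshape(3 * n_head * head_size, in_features)


# Example: three 128x128 projections for 4 heads become one 384x128 interleaved matrix.
q = k = v = torch.randn(128, 128)
assert interleave_qkv(q, k, v, n_head=4).shape == (384, 128)
```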

@Andrei-Aksionov We need to evaluate whether we want to make this change, especially whether there are any performance differences and whether the risk is worth it. But there are two...

The name is directly inherited from https://github.com/karpathy/nanoGPT/blob/master/model.py#L35. We took the liberty of dropping the convolutional "c_" prefix.
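
For illustration (a paraphrase of the naming, not a quote of either file): the "c_" in the GPT-2/nanoGPT names marks what were originally Conv1D layers, and without it the same projections carry plain attribute names.

```python
import torch.nn as nn

n_embd, bias = 768, False

# GPT-2 / nanoGPT naming: the "c_" prefix comes from the original Conv1D layers.
c_attn = nn.Linear(n_embd, 3 * n_embd, bias=bias)  # fused q/k/v projection
c_proj = nn.Linear(n_embd, n_embd, bias=bias)      # output projection

# With the prefix dropped, the same projections get plain names, e.g.:
attn = nn.Linear(n_embd, 3 * n_embd, bias=bias)
proj = nn.Linear(n_embd, n_embd, bias=bias)
```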

Hope he approves the PR then

Overall sounds good to me. This dataset is mainly for debugging. We could replace the "debug" config in https://github.com/Lightning-AI/litgpt/tree/wip/config_hub/pretrain with it. But it might be better to address https://github.com/Lightning-AI/litgpt/issues/1085 first