Carlos Mocholí

Results 427 comments of Carlos Mocholí

This is blocked by not being able to run two `optimize` calls together. Maybe we should have tutorials suggest `python -m litgpt.data.prepare_*` in the meantime for people who use this...
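
For reference, a minimal sketch (mine, not from the issue) of the blocked pattern, assuming litdata's `optimize` with placeholder paths and a placeholder `tokenize` function: two `optimize` calls, e.g. one per split, issued from the same script.

```python
# Hedged sketch of the blocked pattern: two litdata `optimize` calls in one script.
# The tokenize function, paths, and settings are placeholders, not litgpt's own.
from litdata import optimize


def tokenize(filename: str):
    # Placeholder: read a text shard and yield whitespace-split "tokens".
    with open(filename) as f:
        yield f.read().split()


if __name__ == "__main__":
    for split in ("train", "val"):
        optimize(
            fn=tokenize,
            inputs=[f"data/{split}/shard0.txt"],   # placeholder input shards
            output_dir=f"data/optimized/{split}",  # placeholder output location
            chunk_bytes="64MB",
            num_workers=1,
        )
```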

I don't see how we can tie this decision to the training dtype. The training and inference dtypes can be entirely different. If it trains with 16-mixed, what would you say it needs...
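
To make the independence concrete, here is a minimal sketch (not from the thread) using bf16 autocast on CPU as a stand-in for "16-mixed" on GPU: the weights stay fp32 during mixed-precision training, and the inference dtype is chosen separately afterwards.

```python
import torch

model = torch.nn.Linear(16, 16)  # fp32 master weights, as in mixed-precision training
opt = torch.optim.SGD(model.parameters(), lr=0.1)

# Training step in the "16-mixed" style: the weights stay fp32, ops run under autocast.
with torch.autocast(device_type="cpu", dtype=torch.bfloat16):
    loss = model(torch.randn(4, 16)).pow(2).mean()
loss.backward()
opt.step()

# Inference is an independent choice: here the trained weights are cast to bf16.
infer = model.to(torch.bfloat16).eval()
with torch.inference_mode():
    out = infer(torch.randn(4, 16, dtype=torch.bfloat16))
```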

I don't see why you would want anything other than "flops" if it fits on a single device. If it doesn't, you are forced to use one of the other...

The `sequentially.py` file could support it too if we want to. However, transformer inference at batch size 1 is already very latency-bound, so this would make it even worse...

You might want to merge OLMo with an interleaving conversion step, because this PR is very risky and a breaking change for all existing checkpoints.
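
For illustration only, one reading of such a conversion step, assuming plain multi-head attention (no grouped queries) and not taken from the PR: fuse a checkpoint's separate q/k/v projection weights into a single matrix whose rows are interleaved per head, so existing checkpoints can be rewritten once instead of breaking.

```python
import torch


def interleave_qkv(q: torch.Tensor, k: torch.Tensor, v: torch.Tensor, n_head: int) -> torch.Tensor:
    """Fuse separate q/k/v projection weights into one matrix, interleaved per head."""
    head_size = q.shape[0] // n_head
    in_features = q.shape[1]
    # reshape each projection into (n_head, head_size, in_features)
    q, k, v = (t.view(n_head, head_size, in_features) for t in (q, k, v))
    # stack so that every head's q, k and v rows end up next to each other
    qkv = torch.stack((q, k, v), dim=1)  # (n_head, 3, head_size, in_features)
    return qkv.reshape(3 * n_head * head_size, in_features)


# Example: three 128x128 projections for 4 heads become one 384x128 interleaved matrix.
q = k = v = torch.randn(128, 128)
assert interleave_qkv(q, k, v, n_head=4).shape == (384, 128)
```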

@Andrei-Aksionov We need to evaluate whether we want to make this change, especially whether there are any performance differences and whether the risk is worth it. But there are two...

The name is directly inherited from https://github.com/karpathy/nanoGPT/blob/master/model.py#L35. We took the liberty of dropping the convolutional "c_" prefix.
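
For illustration (a paraphrase of the naming, not a quote of either file): the "c_" in the GPT-2/nanoGPT names marks what were originally Conv1D layers, and without it the same projections carry plain attribute names.

```python
import torch.nn as nn

n_embd, bias = 768, False

# GPT-2 / nanoGPT naming: the "c_" prefix comes from the original Conv1D layers.
c_attn = nn.Linear(n_embd, 3 * n_embd, bias=bias)  # fused q/k/v projection
c_proj = nn.Linear(n_embd, n_embd, bias=bias)      # output projection

# With the prefix dropped, the same projections get plain names, e.g.:
attn = nn.Linear(n_embd, 3 * n_embd, bias=bias)
proj = nn.Linear(n_embd, n_embd, bias=bias)
```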

Hope he approves the PR then

Overall sounds good to me. This dataset is mainly for debugging. We could replace the "debug" config in https://github.com/Lightning-AI/litgpt/tree/wip/config_hub/pretrain with it. But it might be better to address https://github.com/Lightning-AI/litgpt/issues/1085 first