Results: 427 comments of Carlos Mocholí

@Andrei-Aksionov Feel free to start this work! We won't have time to work on it for now. You might want to work on the lit-gpt repository instead, which also has...

We are focusing more on that project moving forward. It includes support for GPT-NeoX-derived and LLaMA-derived weights.

> Is `#devices` here the number of GPUs per node or the total number (world_size)?

Per node.

> I also checked the finetuning script and found that it's calculated differently:...
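For reference, a minimal Fabric sketch of those semantics (the 4-GPU, 2-node numbers are just an example):

```python
# Minimal sketch, assuming Lightning's Fabric API: `devices` counts GPUs
# per node, so world_size = devices * num_nodes.
import lightning as L

fabric = L.Fabric(accelerator="cuda", devices=4, num_nodes=2)
fabric.launch()

print(fabric.world_size)  # 8: total processes across both nodes
print(fabric.node_rank)   # which node this process runs on
print(fabric.local_rank)  # GPU index within the node (0-3 here)
```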

Thanks for reporting! I also noticed this in #175, but I think it's safe to ignore. We should remove this flag and code in favor of a test comparing with HF,...
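Roughly the kind of parity test I mean, sketched out (the checkpoint name and tolerances are illustrative, not the actual test):

```python
import torch
from transformers import AutoModelForCausalLM

@torch.no_grad()
def test_matches_hf(lit_model, checkpoint: str = "huggyllama/llama-7b"):
    # Compare lit-llama logits against the Hugging Face implementation
    # on the same (random) input ids.
    hf_model = AutoModelForCausalLM.from_pretrained(checkpoint)
    hf_model.eval()
    lit_model.eval()
    token_ids = torch.randint(0, lit_model.config.vocab_size, (1, 16))
    lit_logits = lit_model(token_ids)
    hf_logits = hf_model(token_ids).logits
    torch.testing.assert_close(lit_logits, hf_logits, rtol=1e-4, atol=1e-4)
```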

Can you provide more details about your hardware setup, Python environment, and precise LLaMA config? Thanks!

I'm a bit confused about your LLaMA config, because it doesn't match the values in https://github.com/Lightning-AI/lit-llama/blob/main/lit_llama/model.py#L18-L25. But if you say another machine works, then there might be some environment or...
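If it helps debugging, a quick way to dump the config actually in use and compare it field by field against those defaults (assuming lit-llama's `LLaMAConfig`; the "7B" name is an example):

```python
from lit_llama.model import LLaMAConfig

# Print the resolved config to compare against the values in model.py.
config = LLaMAConfig.from_name("7B")
print(config)  # block_size, vocab_size, n_layer, n_head, n_embd
```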

Linked issue in the lightning repo: https://github.com/Lightning-AI/lightning/issues/17172

@SinanAkkoyun Do you have access to H100s? If so, would you like to try out the PR https://github.com/Lightning-AI/lightning/pull/17597? It adds support to Fabric via `L.Fabric(precision="8-mixed")`. You can install it...
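A minimal sketch of what trying it out would look like (requires H100-class hardware and that PR branch installed):

```python
import lightning as L

# Request the FP8 mixed precision added by the PR above.
fabric = L.Fabric(accelerator="cuda", devices=1, precision="8-mixed")
fabric.launch()
print(fabric.strategy.precision)  # confirm the FP8 plugin is active
```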

Yes. Automatic replacement of the layers is missing, but it's something we want to add too. Parallelism customization would be left to the user, though.
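For illustration, the manual replacement would look roughly like this, using NVIDIA Transformer Engine's drop-in `Linear` (a sketch only; weight copying is omitted, and automating this traversal is the part that's missing):

```python
import torch.nn as nn
import transformer_engine.pytorch as te

def replace_linears(module: nn.Module) -> None:
    # Recursively swap every nn.Linear for Transformer Engine's FP8-capable
    # Linear. Copying the original weights over is omitted for brevity.
    for name, child in module.named_children():
        if isinstance(child, nn.Linear):
            setattr(module, name, te.Linear(child.in_features,
                                            child.out_features,
                                            bias=child.bias is not None))
        else:
            replace_linears(child)
```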

@SinanAkkoyun Did you run `generate.py`? How many tokens/sec do you get without it? How does the generation look? Can you `print(fabric.strategy.precision)` to make sure it's using FP8 precision? We might...
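Something like this sketch would cover both checks (`generate_fn` stands in for the `generate` function in `generate.py`; the argument names are placeholders):

```python
import time
import torch

def check_speed_and_precision(fabric, model, encoded_prompt: torch.Tensor,
                              generate_fn, max_new_tokens: int = 100) -> None:
    # Should report the FP8/mixed precision plugin if it's active.
    print(fabric.strategy.precision)
    t0 = time.perf_counter()
    output = generate_fn(model, encoded_prompt, max_new_tokens=max_new_tokens)
    elapsed = time.perf_counter() - t0
    # lit-llama's generate returns prompt + new tokens as one 1-D tensor.
    tokens_generated = output.size(0) - encoded_prompt.size(0)
    print(f"{tokens_generated / elapsed:.2f} tokens/sec")
```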