Carlos Mocholí

Results: 90 issues authored by Carlos Mocholí

Our training scripts select mixed precision by default (16-mixed or bf16-mixed). Many of the HF pretrained weights come in 16-bit (float16 or bfloat16). Since the weights are already in this...

bug
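A minimal sketch of the idea, assuming a hypothetical `choose_precision` helper (not part of litgpt): if the pretrained weights are already 16-bit, a "-true" precision avoids keeping a redundant 32-bit copy of them.

```python
import torch

def choose_precision(state_dict: dict) -> str:
    # Hypothetical helper: inspect the dtypes of the pretrained weights
    # and pick a matching "-true" precision instead of mixed precision.
    dtypes = {t.dtype for t in state_dict.values() if torch.is_tensor(t)}
    if dtypes == {torch.bfloat16}:
        return "bf16-true"
    if dtypes == {torch.float16}:
        return "16-true"
    return "bf16-mixed"  # fall back to the current default for 32-bit weights
```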

QLoRA = LoRA with bitsandbytes (bnb) quantization and true precision. I think bf16-true is fine as a default because bitsandbytes doesn't support old cards well (like my laptop)
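As a hedged sketch of that default (the function name is illustrative; `torch.cuda.is_bf16_supported()` is a real PyTorch call), one could gate bf16-true on hardware support:

```python
import torch

def default_qlora_precision() -> str:
    # Assumed policy: prefer bf16-true on GPUs with bfloat16 support,
    # fall back to 16-true on older cards where bitsandbytes works poorly.
    if torch.cuda.is_available() and torch.cuda.is_bf16_supported():
        return "bf16-true"
    return "16-true"
```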

> @awaelchli Semi-related to this PR. I just noticed that we don't have the code to run the longest sample at the beginning of training anymore: https://github.com/Lightning-AI/litgpt/blob/globals/finetune/lora.py#L268-L270
> Should...

enhancement
fine-tuning
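The removed logic amounted to something like the sketch below (names and details are illustrative, not the original code): run the longest sample once up front so an out-of-memory failure surfaces at step zero instead of hours into training.

```python
import torch

def smoke_test_longest_sample(model, train_data, device):
    # Illustrative sketch: find the longest sequence in the dataset and run
    # one forward/backward pass with it so peak memory is hit immediately.
    longest = max(train_data, key=lambda sample: sample["input_ids"].numel())
    input_ids = longest["input_ids"].unsqueeze(0).to(device)
    logits = model(input_ids)
    logits.sum().backward()  # trigger activation and gradient allocation
    model.zero_grad(set_to_none=True)
```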

Our tutorials have suggestions like

```bash
litgpt finetune lora \
  --config https://raw.githubusercontent.com/Lightning-AI/litgpt/main/config_hub/finetune/llama-2-7b/lora.yaml \
  --lora_r 4
```

But this has the drawback that it will stop working if:
- The config...

enhancement

`convert_lit_checkpoint.py` exports to GGUF after the HF conversion. This could be exposed as `litgpt export ...`
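A hedged sketch of what such a `litgpt export` could do under the hood; both helpers below are placeholders for the existing HF conversion step and a GGUF writer (e.g. llama.cpp tooling), not real litgpt functions.

```python
from pathlib import Path

def convert_lit_to_hf(checkpoint_dir: Path) -> Path:
    # Placeholder for the existing Lit -> HF conversion step.
    raise NotImplementedError

def write_gguf(hf_dir: Path, out_file: Path) -> None:
    # Placeholder for a GGUF writer (e.g. llama.cpp's conversion script).
    raise NotImplementedError

def export_gguf(checkpoint_dir: Path, out_file: Path) -> None:
    # Proposed flow: Lit checkpoint -> HF layout -> GGUF file.
    hf_dir = convert_lit_to_hf(checkpoint_dir)
    write_gguf(hf_dir, out_file)
```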

`generate/base.py` and `generate/chat.py` (which uses the former) assume that the model fits in memory. There are `generate/sequentially.py` and `generate/tp.py`, which support using multiple devices. To streamline the experience, we could have the...
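To illustrate the streamlining idea, a hypothetical dispatcher could route to a script based on free device memory (`torch.cuda.mem_get_info` and `torch.cuda.device_count` are real PyTorch calls; the routing policy itself is an assumption, not litgpt's behavior):

```python
import torch

def pick_generation_strategy(model_size_bytes: int) -> str:
    # Assumed routing policy: requires a CUDA device to be available.
    free_bytes, _total = torch.cuda.mem_get_info()
    if model_size_bytes <= free_bytes:
        return "generate/base.py"          # model fits on one device
    if torch.cuda.device_count() > 1:
        return "generate/tp.py"            # shard with tensor parallelism
    return "generate/sequentially.py"      # stream layers through one device
```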

https://txt.cohere.com/command-r/
https://huggingface.co/CohereForAI/c4ai-command-r-v01

I don't think the architecture needs any changes to support this.

checkpoints

Fixes this error:

```python
  File "/teamspace/studios/this_studio/lightning-thunder/thunder/executors/torch_compile.py", line 92, in compiled_func_wrapper
    return compiled_func(*args)
  File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 410, in _fn
    return fn(*args, **kwargs)
  File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 976, in catch_errors
    return callback(frame, cache_entry,...
```

oncall: distributed
module: cpu
triaged
module: mkldnn
open source
module: amp (automated mixed precision)
release notes: quantization
release notes: releng
oncall: pt2
module: inductor
module: dynamo
module: distributed_checkpoint

If I have a CLI implementation (`before.py`) with a `Foo.a` argument

```python
class Foo:
    def __init__(self, a=2):
        ...

def fn(foo: Foo = Foo()):
    ...

from jsonargparse import ArgumentParser, ActionConfigFile

parser...
```

question
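The snippet is cut off at `parser...`; below is a hedged guess at how such a parser is typically wired up with jsonargparse (`ArgumentParser`, `ActionConfigFile`, and `add_function_arguments` exist in jsonargparse; the exact flags in `before.py` are unknown):

```python
from jsonargparse import ArgumentParser, ActionConfigFile

class Foo:
    def __init__(self, a=2):
        self.a = a

def fn(foo: Foo = Foo()):
    ...

parser = ArgumentParser()
parser.add_argument("--config", action=ActionConfigFile)
parser.add_function_arguments(fn)  # exposes fn's parameters, e.g. `foo`
cfg = parser.parse_args()
```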

### Outline & Motivation

We barely use `numpy`. We should replace its uses with torch and reduce the package footprint.

### Pitch

Remove https://github.com/Lightning-AI/lightning/blob/7bbbe22636853e1ee6fa57b557b3408d45233589/requirements/pytorch/base.txt#L4

Update numpy usages with torch.

### Additional...

help wanted
good first issue
refactor
pl
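A typical replacement looks like the sketch below (the call site is illustrative; each actual numpy usage in the codebase would need auditing individually):

```python
import torch

# Before (numpy):
#   import numpy as np
#   mask = np.zeros(10, dtype=bool)
#   indices = np.arange(10)

# After (torch equivalents):
mask = torch.zeros(10, dtype=torch.bool)
indices = torch.arange(10)
```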