
Determine the default precision and quantization in chat and generate

Open · awaelchli opened this issue 1 year ago • 1 comment

If you finetune a model with a particular quantization and precision setting, today you still need to pass those same settings to the chat and generate commands:

litgpt chat \
    --checkpoint_dir out/qlora-codellama-13b/final \
    --precision bf16-true \
    --quantize bnb.nf4-dq

Otherwise you may run out of memory or get different results than you were getting during training. Since we store the hyperparameters in a YAML file, we could pick up the two settings automatically when they are not specified:

# uses precision=bf16-true and quantize=bnb.nf4-dq from checkpoint folder
litgpt chat --checkpoint_dir out/qlora-codellama-13b/final

We already do this in other parts of LitGPT, so we could reuse the existing utility function that reads these two settings from the checkpoint directory.
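
A minimal sketch of what that fallback could look like, assuming the finetuning run saved a hyperparameters.yaml with precision and quantize keys in the checkpoint directory; the actual file name, keys, and utility function in LitGPT may differ:

from pathlib import Path

import yaml


def load_training_defaults(checkpoint_dir: Path) -> dict:
    # Hypothetical helper: read precision/quantize from the saved training
    # hyperparameters, if present, so chat/generate can use them as defaults.
    defaults = {}
    hparams_file = checkpoint_dir / "hyperparameters.yaml"
    if hparams_file.is_file():
        with open(hparams_file) as f:
            hparams = yaml.safe_load(f) or {}
        for key in ("precision", "quantize"):
            if hparams.get(key) is not None:
                defaults[key] = hparams[key]
    return defaults


# In the chat/generate entry point, only fill in values the user did not pass:
# defaults = load_training_defaults(checkpoint_dir)
# precision = precision or defaults.get("precision")
# quantize = quantize or defaults.get("quantize")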

awaelchli · Apr 04 '24 18:04

I don't see how we can tie this decision to the training settings. The training and inference dtypes can be entirely different.

If it was trained with 16-mixed, what would you say it should use during inference? And if it was trained with 16-true, inference already picks that by default.

For quantization it makes sense, although I'm not sure that we should enable it silently.

carmocca · Apr 04 '24 18:04