[Feature][Config] Add quantized Llama 405B inference + job config
Feature request
A possible quantized model to use: https://huggingface.co/neuralmagic/Meta-Llama-3.1-405B-Instruct-quantized.w8a8. It may be able to fit on 8x A100 (40GB or 80GB) GPUs on GCP. A rough serving sketch is included below.
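As a rough illustration (not oumi's own config format), here is a minimal Python sketch of serving this checkpoint with vLLM, which the neuralmagic w8a8 quantization targets. The `tensor_parallel_size=8` value assumes the 8x A100 setup mentioned above, and the prompt is a hypothetical LLM-judge query:

```python
from vllm import LLM, SamplingParams

# Load the quantized 405B checkpoint, sharded across 8 GPUs via tensor
# parallelism. tensor_parallel_size=8 is an assumption matching 8x A100.
llm = LLM(
    model="neuralmagic/Meta-Llama-3.1-405B-Instruct-quantized.w8a8",
    tensor_parallel_size=8,
)

# Deterministic sampling, suitable for an LLM-judge style use case.
params = SamplingParams(temperature=0.0, max_tokens=256)

# Hypothetical judge prompt, purely for illustration.
outputs = llm.generate(
    ["Rate the following answer on a scale of 1-10: ..."], params
)
print(outputs[0].outputs[0].text)
```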
Motivation / references
Llama 3.1 405B is one of the best-performing open-weight models, and it should be possible to host it locally when quantized. Possible use cases include serving as an LLM judge.
Your contribution
N/A
Hi @wizeng23! I would like to work on this. Can you assign this issue to me?
Done! Appreciate the help with this :) Please let me know if you have any questions!