
[Feature][Config] Add quantized Llama 405B inference + job config

Open wizeng23 opened this issue 9 months ago • 2 comments

Feature request

Possible quantized model to use: https://huggingface.co/neuralmagic/Meta-Llama-3.1-405B-Instruct-quantized.w8a8. It may fit on 8x A100 (40 GB or 80 GB) on GCP.
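A rough back-of-the-envelope check of the "may fit" claim, as a sketch only. It assumes every one of the ~405B parameters is stored at 8 bits (the `w8a8` suffix suggests 8-bit weights, but some layers may be kept at higher precision), uses decimal GB, and ignores KV cache, activations, and runtime overhead, so real requirements will be higher. The helper names are hypothetical and not part of oumi.

```python
# Weights-only memory estimate; all numbers below are assumptions, not measurements.

def weight_memory_gb(num_params: float, bits_per_weight: int) -> float:
    """Memory needed for the weights alone, in decimal GB."""
    return num_params * bits_per_weight / 8 / 1e9


def weights_fit(num_params: float, bits_per_weight: int,
                num_gpus: int, gb_per_gpu: int) -> bool:
    """True if the weights alone are smaller than the combined GPU memory."""
    return weight_memory_gb(num_params, bits_per_weight) < num_gpus * gb_per_gpu


PARAMS = 405e9  # assumed parameter count for Llama 3.1 405B

if __name__ == "__main__":
    needed = weight_memory_gb(PARAMS, 8)
    for gb_per_gpu in (40, 80):
        total = 8 * gb_per_gpu
        verdict = "fits" if weights_fit(PARAMS, 8, 8, gb_per_gpu) else "does not fit"
        print(f"8x A100 {gb_per_gpu}GB: {total} GB total, "
              f"~{needed:.0f} GB of weights {verdict}")
```

Under these assumptions, the 8-bit weights alone need more than the combined memory of 8x A100 40GB, so the 80GB variant looks like the safer starting point for the job config.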

Motivation / references

This is one of the best-performing open-weight models, and should be possible to host locally when quantized. Possible use cases include being an LLM judge.

Your contribution

N/A

wizeng23, Feb 06 '25 04:02

Hi @wizeng23! I would like to work on this. Can you assign this issue to me?

devampatel03, Feb 18 '25 20:02

Done! Appreciate the help with this :) Please let me know if you have any questions!

wizeng23, Feb 19 '25 00:02