[Feature][Config] Add quantized Llama 405B inference + job config
Feature request
A possible quantized model to use: https://huggingface.co/neuralmagic/Meta-Llama-3.1-405B-Instruct-quantized.w8a8. It may be able to fit on 8x A100 (40GB or 80GB) GPUs on GCP. A rough serving sketch is included below.
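As a rough illustration (not oumi's own config format), here is a minimal Python sketch of serving this checkpoint with vLLM, which the neuralmagic w8a8 quantization targets. The `tensor_parallel_size=8` value assumes the 8x A100 setup mentioned above, and the prompt is a hypothetical LLM-judge query:

```python
from vllm import LLM, SamplingParams

# Load the quantized 405B checkpoint, sharded across 8 GPUs via tensor
# parallelism. tensor_parallel_size=8 is an assumption matching 8x A100.
llm = LLM(
    model="neuralmagic/Meta-Llama-3.1-405B-Instruct-quantized.w8a8",
    tensor_parallel_size=8,
)

# Deterministic sampling, suitable for an LLM-judge style use case.
params = SamplingParams(temperature=0.0, max_tokens=256)

# Hypothetical judge prompt, purely for illustration.
outputs = llm.generate(
    ["Rate the following answer on a scale of 1-10: ..."], params
)
print(outputs[0].outputs[0].text)
```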
Motivation / references
Llama 3.1 405B is one of the best-performing open-weight models, and it should be possible to host it locally when quantized. Possible use cases include serving as an LLM judge.
Your contribution
N/A
Hi @wizeng23! I would like to work on this. Can you assign this issue to me?
Done! Appreciate the help with this :) Please let me know if you have any questions!