oumi
Add Quantized Llama 405B Inference + Job Config
Description
This pull request addresses the feature request to add support for the quantized Llama 405B model for inference and job configuration. The changes include:
Configuration Updates:
- Added a new function `_create_llama_405b_quantized_config` in `supported_models.py` to configure the Llama 405B model with quantization-specific parameters.
Job and Inference Configuration Files:
- Created a new job configuration file `405b_quantized_gcp_job.yaml` in `evaluation` to define the resources and setup for running the quantized Llama 405B model on GCP.
- Created a new inference configuration file `405b_quantized_infer.yaml` in `inference` to specify the model parameters and generation settings for the quantized Llama 405B model.
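For illustration only, a config function of the kind described above might look roughly like the following sketch. The dataclass, the function name, the field names, and every value here are hypothetical placeholders, not the actual contents of `supported_models.py` or of the YAML files in this PR.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class QuantizedModelConfig:
    """Hypothetical stand-in for the project's real config type."""

    model_name: str
    quantization_bits: int
    model_kwargs: dict[str, Any] = field(default_factory=dict)


def create_llama_405b_quantized_config_sketch() -> QuantizedModelConfig:
    """Return a config carrying quantization-specific parameters.

    All values are assumed for the sake of the example.
    """
    return QuantizedModelConfig(
        model_name="<quantized-llama-405b-checkpoint>",  # placeholder, not a real path
        quantization_bits=4,  # assumed bit width
        model_kwargs={"device_map": "auto"},  # assumed loader option
    )


cfg = create_llama_405b_quantized_config_sketch()
print(cfg.model_name, cfg.quantization_bits)
```

Keeping the quantization settings in one small function makes it straightforward for the job and inference YAML files to refer to a single named model entry.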
Related issues
Fixes #1391
Before submitting
- [ ] This PR only changes documentation. (You can ignore the following checks in that case)
- [x] Did you read the contributor guideline Pull Request guidelines?
- [x] Did you link the issue(s) related to this PR in the section above?
- [ ] Did you add / update tests where needed?
Reviewers
At least one review from a member of oumi-ai/oumi-staff is required.
Hi Devam, thanks for the PR! I've converted it to a draft because it's missing a file. Once you add it back, could you please test-run evaluation and inference to confirm they work, then re-open this PR? Thanks!
Hi @devampatel03, could you confirm whether you're still working on this PR? If there aren't updates soon, I'll close this PR as clean-up, and you can continue to work on it on your fork.
Hi @wizeng23, I want to confirm that I am still working on this PR. I have already implemented the suggested changes from the review, but I am currently figuring out how to test-run evaluation and inference without GCP credentials. Is there a mock test method I could use to test-run it?
Unfortunately a real run is required to verify that this works. It should be possible to create a GCP account with $300 free credit: https://cloud.google.com/free?utm_source=google&utm_medium=cpc&utm_campaign=na-US-all-en-dr-bkws-all-all-trial-e-dr-1710134&utm_content=text-ad-none-any-DEV_c-CRE_665665924741-ADGP_Hybrid+%7C+BKWS+-+MIX+%7C+Txt-Google+Cloud-Google+Cloud+Free-KWID_43700081235769791-kwd-394768718298&utm_term=KW_google+cloud+free+credits-ST_google+cloud+free+credits&gad_source=1&gclid=Cj0KCQjw4cS-BhDGARIsABg4_J3q_HTL8-cX82u1D7BZnx0fEpy_y-yZ_tHedCp8iCBCuVF5uS2Eys8aAru9EALw_wcB&gclsrc=aw.ds&hl=en
Hi Devam, do you have any updates on the testing?
Closing this PR due to inactivity. Feel free to make another one if you're still working on this.