
Add Quantized Llama 405B Inference + Job Config

devampatel03 opened this pull request 9 months ago

Description

This pull request addresses the feature request to add support for the quantized Llama 405B model for inference and job configuration. The changes include:

Configuration Updates:

  • Added a new function _create_llama_405b_quantized_config in supported_models.py to configure the Llama 405B model with quantization-specific parameters (a rough sketch is shown below).
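
For reference, here is a minimal sketch of what such a helper could look like. The return type, field names, and Hugging Face model id below are illustrative assumptions, not the actual oumi API in supported_models.py.

```python
# Illustrative sketch only: the return type and field names are assumptions,
# not the actual helper added to supported_models.py in this PR.
def _create_llama_405b_quantized_config() -> dict:
    """Builds hypothetical model parameters for quantized Llama 405B inference."""
    return {
        # Assumed FP8-quantized checkpoint id; the PR may target a different one.
        "model_name": "meta-llama/Llama-3.1-405B-Instruct-FP8",
        "torch_dtype": "bfloat16",
        "model_max_length": 8192,
        # Quantization-specific settings (hypothetical keys).
        "quantization": "fp8",
        "tensor_parallel_size": 8,
    }
```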

Job and Inference Configuration Files:

  • Created a new job configuration file 405b_quantized_gcp_job.yaml in evaluation to define the resources and setup for running the quantized Llama 405B model on GCP.
  • Created a new inference configuration file 405b_quantized_infer.yaml in inference to specify the model parameters and generation settings for the quantized Llama 405B model (see the illustrative skeleton below).
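
To give reviewers a concrete picture, a hypothetical skeleton of 405b_quantized_infer.yaml follows; the key names and values are guesses at a typical oumi inference config, not the exact schema added in this PR.

```yaml
# Hypothetical sketch of 405b_quantized_infer.yaml; keys and values are
# illustrative guesses, not the exact schema added in this PR.
model:
  model_name: "meta-llama/Llama-3.1-405B-Instruct-FP8"  # assumed quantized checkpoint
  torch_dtype_str: "bfloat16"
  model_max_length: 8192

generation:
  max_new_tokens: 512
  temperature: 0.0

engine: VLLM  # assumed inference backend for a model of this size
```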

Related issues

Fixes #1391

Before submitting

  • [ ] This PR only changes documentation. (You can ignore the following checks in that case)
  • [x] Did you read the contributor guideline's Pull Request guidelines?
  • [x] Did you link the issue(s) related to this PR in the section above?
  • [ ] Did you add / update tests where needed?

Reviewers

At least one review from a member of oumi-ai/oumi-staff is required.

devampatel03 · Feb 24, 2025

Hi Devam, thanks for the PR! I've converted it to a draft because it's missing a file. Once you add it back, could you please test-run evaluation and inference to confirm they work, then re-open this PR? Thanks!

wizeng23 · Feb 25, 2025

Hi @devampatel03, could you confirm whether you're still working on this PR? If there are no updates soon, I'll close it as part of clean-up, and you can continue working on it from your fork.

wizeng23 · Mar 12, 2025

Hi @wizeng23, I can confirm that I am still working on this PR. I have already implemented the suggested changes from the review, but I am currently figuring out how to test-run evaluation and inference, since I don't have GCP credentials. Is there a mock testing method I could use to test-run it?

devampatel03 · Mar 12, 2025

Unfortunately, a real run is required to verify that this works. It should be possible to create a GCP account with $300 of free credit: https://cloud.google.com/free

wizeng23 · Mar 12, 2025

Hi Devam, do you have any updates on the testing?

wizeng23 · Apr 8, 2025

Closing this PR due to inactivity. Feel free to make another one if you're still working on this.

wizeng23 · Apr 14, 2025