Feature Request: Support for large reward-type models (Nemotron-4-340B-Reward)

Open tblattner opened this issue 6 months ago • 1 comments

Prerequisites

  • [X] I am running the latest code. Mention the version if possible as well.
  • [X] I carefully followed the README.md.
  • [X] I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
  • [X] I reviewed the Discussions, and have a new and useful enhancement to share.

Feature Description

llama.cpp's scalability has allowed many groups to experiment with LLMs. NVIDIA released the Nemotron-4-340B-Reward model, which alters the last layer to rate responses to prompts on 5 criteria: Helpfulness, Correctness, Coherence, Complexity, and Verbosity.

An easy way to customize how the output logits are processed would help llama.cpp support this kind of change to an LLM's final layer.
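To make the request concrete, here is a minimal sketch of what I mean by a reward-type output layer. It assumes the altered last layer is a plain linear projection (no bias) from the final token's hidden state to one score per criterion. The names `reward_head` and `ATTRIBUTES` and the toy weights are hypothetical and only for illustration; none of this is llama.cpp code or the actual Nemotron weights layout.

```python
from typing import Dict, Sequence

# The five criteria named above; the ordering here is an assumption.
ATTRIBUTES = ["helpfulness", "correctness", "coherence", "complexity", "verbosity"]


def reward_head(hidden: Sequence[float],
                weights: Sequence[Sequence[float]]) -> Dict[str, float]:
    """Project a final-token hidden state to one score per attribute.

    hidden  : hidden state of the last token, length n_embd
    weights : one row of length n_embd per attribute (hypothetical layout)
    """
    if len(weights) != len(ATTRIBUTES):
        raise ValueError("expected one weight row per attribute")
    scores: Dict[str, float] = {}
    for name, row in zip(ATTRIBUTES, weights):
        if len(row) != len(hidden):
            raise ValueError("weight row length must match hidden size")
        # Dot product replaces the usual vocabulary-sized logit projection.
        scores[name] = sum(w * h for w, h in zip(row, hidden))
    return scores


if __name__ == "__main__":
    # Toy values with n_embd = 3, purely illustrative.
    toy_hidden = [1.0, 2.0, 0.5]
    toy_weights = [
        [1.0, 0.0, 0.0],
        [0.0, 1.0, 0.0],
        [0.0, 0.0, 1.0],
        [1.0, 1.0, 0.0],
        [0.0, 0.0, 0.0],
    ]
    for name, score in reward_head(toy_hidden, toy_weights).items():
        print(f"{name}: {score}")
```

The point is that the output is a small fixed-size vector of scores rather than vocabulary logits to sample from, so the usual sampling path does not apply.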

Motivation

Adding support for new types of models that target specific areas such as reward modeling.

Possible Implementation

I'm interested in pointers on how to implement such a feature. I would be happy to take a closer look to see how easy or hard it would be to implement.

tblattner, Aug 27 '24 15:08