llama.cpp
Feature Request: Support for large reward-type models (Nemotron-4-340B-Reward)
Prerequisites
- [X] I am running the latest code. Mention the version if possible as well.
- [X] I carefully followed the README.md.
- [X] I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
- [X] I reviewed the Discussions, and have a new and useful enhancement to share.
Feature Description
llama.cpp's scalability has made it practical for many groups to experiment with LLMs. NVIDIA released the Nemotron-4-340B-Reward model, which alters the last layer to rate responses on 5 criteria: Helpfulness, Correctness, Coherence, Complexity, and Verbosity.
Making the output logit processing easy to swap out or extend would help llama.cpp support this kind of change to an LLM's final layer.
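To illustrate the kind of output change described here, the sketch below assumes the reward head is a plain linear projection from the last token's final hidden state to five scalars, rather than a projection to vocabulary logits. `reward_head`, `ATTRIBUTES`, and the toy weights are hypothetical names made up for this example. They are not part of llama.cpp or of the released model.

```python
from typing import Dict, Sequence

# Attribute names taken from the feature description above.
ATTRIBUTES = ["helpfulness", "correctness", "coherence", "complexity", "verbosity"]


def reward_head(hidden: Sequence[float],
                weight: Sequence[Sequence[float]]) -> Dict[str, float]:
    """Project one hidden-state vector onto one scalar per attribute.

    hidden: final-layer hidden state of the last token, length d (assumed).
    weight: hypothetical head weights, shape [len(ATTRIBUTES)][d].
    """
    if len(weight) != len(ATTRIBUTES):
        raise ValueError("expected one weight row per attribute")
    scores = {}
    for name, row in zip(ATTRIBUTES, weight):
        if len(row) != len(hidden):
            raise ValueError("weight row length must match hidden size")
        # Dot product of the weight row with the hidden state.
        scores[name] = sum(w * h for w, h in zip(row, hidden))
    return scores


# Toy example with d = 3; real hidden sizes are far larger.
hidden = [1.0, 2.0, 3.0]
weight = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [1.0, 1.0, 1.0],
    [0.0, 0.0, 0.0],
]
scores = reward_head(hidden, weight)
```

If this assumption holds, the main difference from a standard language-model head is the output width (five scores instead of one logit per vocabulary entry) and the absence of sampling afterwards.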
Motivation
Adding support for new types of models that target specific tasks, such as reward modeling.
Possible Implementation
I'm interested in pointers on how to implement such a feature. I would be happy to take a closer look to see how easy or hard it would be to implement.