
Feature Request: change model and lora from server api

stygmate opened this issue 1 year ago • 1 comment

Prerequisites

  • [X] I am running the latest code. Mention the version if possible as well.
  • [X] I carefully followed the README.md.
  • [X] I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
  • [X] I reviewed the Discussions, and have a new and useful enhancement to share.

Feature Description

Being able to drive model loading from the server API (as Ollama does) could be useful. Even more useful would be the ability to change LoRAs on the fly.
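
As a purely illustrative sketch of what the requested API could look like: the `/load-model` and `/lora-adapters` endpoints, their payload shapes, and the file paths below are all assumptions for this example, not part of llama.cpp's server at the time of this request.

```python
import requests

BASE = "http://localhost:8080"

# Hypothetical endpoint: ask the server to swap out the loaded model.
requests.post(f"{BASE}/load-model", json={
    "model": "models/example-7b.Q4_K_M.gguf",  # assumed path, for illustration
}).raise_for_status()

# Hypothetical endpoint: toggle LoRA adapters on the fly by adjusting
# their scales, without restarting the server.
requests.post(f"{BASE}/lora-adapters", json=[
    {"id": 0, "scale": 1.0},  # activate fine-tune 0
    {"id": 1, "scale": 0.0},  # deactivate fine-tune 1
]).raise_for_status()
```

Since LoRA adapters are tiny compared to the base model, a scale-toggling call like the second one could switch fine-tunes nearly instantly.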

Motivation

LoRAs are small, and switching between multiple fine-tunes can be very useful in complex applications running on small computers.

Possible Implementation

No response

stygmate avatar May 30 '24 08:05 stygmate

I can support this use case: I want to run multiple, fairly rarely used models on the same hardware, but not being able to unload them quickly means they consume all available VRAM. If it were possible to specify the model to use in the API, that would be a huge help.

perk11 avatar Jul 10 '24 07:07 perk11
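
One possible shape for this, sketched under the assumption that the server's existing OpenAI-compatible endpoint honored a per-request `model` field (it does not at the time of this comment; the model filename here is invented for illustration):

```python
import requests

# Hypothetical: if the server honored the "model" field, a request could
# trigger loading the named model (and unloading the previous one) on
# demand, instead of keeping every rarely used model resident in VRAM.
resp = requests.post(
    "http://localhost:8080/v1/chat/completions",
    json={
        "model": "rarely-used-finetune.gguf",  # assumed to select/load the model
        "messages": [{"role": "user", "content": "Hello"}],
    },
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```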

This issue was closed because it has been inactive for 14 days since being marked as stale.

github-actions[bot] avatar Aug 26 '24 01:08 github-actions[bot]