Aaron Pham comments

Results: 433 comments of Aaron Pham

How to fine-tune a base model and what does adapter id mean?

`--adapter-id` allows you to serve and merge multiple LoRA weights into the base model. We decided internally that having a simplified fine-tune API is probably not the best UX for...

feat: LoRA loading per request

This is already supported: you can pass it in via `--adapter-id` during startup, and then specify the `adapter_name` per request.

feat: LoRA loading per request

But I haven't announced it yet, since I'm still testing it. I think the more important use case is for users to provide a remote adapter; what would that...

feat: LoRA loading per request

> Hey @aarnphm, are you asking for a way to load in an adapter per query for a given base model? For e.g., say you have facebook/opt-350m running and you're...

feat: LoRA loading per request

> Right, I gave that a try and it looks like that works. > > I guess what I was wondering is if it's possible to do the following: >...

feat: LoRA loading per request

The purpose of `adapter_name` per request here is that after you train a new lora layer, you can pass a remote URL to it and then it will just load...

feat: LoRA loading per request

I don't think a local path would make sense, since there is no way for the server to resolve it or import it into memory.

feat: LoRA loading per request

You can provide the `adapter_name` via the Swagger UI, at the same level as the `prompt` key.
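To make the request shape concrete, here is a minimal sketch of a request body that carries `adapter_name` next to `prompt`. Only those two keys come from the comments above; the base URL, the `/v1/generate` path, the adapter name `my-lora-adapter`, and both helper functions are assumptions for illustration, so check the Swagger UI of your running server for the actual route and schema.

```python
import json
from typing import Optional
from urllib import request


def build_generate_payload(prompt: str, adapter_name: Optional[str] = None) -> dict:
    """Build a request body with `adapter_name` at the same level as `prompt`."""
    payload = {"prompt": prompt}
    if adapter_name is not None:
        # Assumed to name an adapter that was registered via --adapter-id at startup.
        payload["adapter_name"] = adapter_name
    return payload


def build_request(base_url: str, payload: dict, path: str = "/v1/generate") -> request.Request:
    """Prepare (but do not send) a JSON POST request.

    The default path is an assumption, not a confirmed route.
    """
    url = base_url.rstrip("/") + path
    body = json.dumps(payload).encode("utf-8")
    return request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    # Hypothetical adapter name and local server address.
    payload = build_generate_payload("What is LoRA?", adapter_name="my-lora-adapter")
    req = build_request("http://localhost:3000", payload)
    print(req.full_url)
    print(json.dumps(payload, indent=2))
    # To actually send it: request.urlopen(req)
```

Leaving `adapter_name` out of the payload is assumed here to fall back to the base model without any adapter applied.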

feat: LoRA loading per request

In terms of supporting paths, maybe we can add support for S3. But initially we will probably only support loading from the Hugging Face Hub.

feat: LoRA loading per request

Oh, I need to update how to load the LoRA layers. The endpoint `/v1/adapters` doesn't make sense because of a broadcasting issue with multiple runners. Can you try passing `adapter_name`...