[Discussion][Umbrella] ModelAdapter Issues
Tracking some issues I have with ModelAdapter. Some are big concerns and some are small; they are listed here for discussion so I can help with the project. I will update them based on the latest design.
- [ ] 1. `modelAdapterStatus.Instances` will record all the Pod names as a list. Will that lead to an explosion of the list length if we have lots of Pods?
- [ ] 2. we'll load/unload the adapter during reconciliation via HTTP requests, which might be too heavy for the controller, especially when thousands of adapters are reconciling at the same time. Maybe use an agent?
- [x] 3. we'll validate the CRD in the controller, which should be delegated to the webhook; see separate issue: https://github.com/aibrix/aibrix/issues/710
- [ ] 4. we have a scheduler in the controller, which usually should be a separate component, but I think it's OK as a start
- [ ] 5. TODO: when removing a modelAdapter, we only unload it from the instance at index 0 rather than from the whole list
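
For item 5, here is a minimal sketch of unloading from every recorded instance instead of only the first one. It is not the controller's actual code: `unloadFunc`, `unloadFromAll`, and `errPodGone` are hypothetical names, and the real HTTP call to the runtime is hidden behind the injected function so the loop can be shown on its own.

```go
package main

import (
	"errors"
	"fmt"
)

// errPodGone is a hypothetical failure used only in the demo below.
var errPodGone = errors.New("pod not reachable")

// unloadFunc is a hypothetical hook that unloads one adapter from one Pod.
// In a real controller this would wrap the HTTP request to the runtime.
type unloadFunc func(instance, adapter string) error

// unloadFromAll calls unload for every instance in the list and joins the
// failures, so one bad Pod does not stop the others from being cleaned up.
// It returns nil when every call succeeds or the list is empty.
func unloadFromAll(instances []string, adapter string, unload unloadFunc) error {
	var errs []error
	for _, inst := range instances {
		if err := unload(inst, adapter); err != nil {
			errs = append(errs, fmt.Errorf("unload %q from %s: %w", adapter, inst, err))
		}
	}
	return errors.Join(errs...)
}

func main() {
	instances := []string{"pod-a", "pod-b", "pod-c"}
	err := unloadFromAll(instances, "my-lora", func(inst, _ string) error {
		if inst == "pod-b" {
			return errPodGone // simulate one unreachable Pod
		}
		return nil
	})
	fmt.Println(err)
}
```

Joining the errors rather than returning on the first one means the reconciler can requeue and retry while still having attempted every instance.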
- There are some basic assumptions about usage. Basically, LoRA will serve the "high density" use case, and I don't expect a LoRA to be scheduled across multiple instances most of the time. In that case, the `instances` list won't be long. If a LoRA has multiple replicas and they are hot, we should merge the LoRAs. (In the public proposal, there's a field called `dynamic merge` that's designed for this case.)
- Load/unload via HTTP request is not that elegant, but I am not aware of other means. Do you have suggestions? Do you mean handing it over to an agent, and letting the agent reconcile the object and send the requests? That sounds good to me. We were a little hesitant to introduce an agent earlier. At that time, we were considering a host-level agent to manage the model artifacts rather than an AI runtime agent. We can consider refactoring this part. Let's have an offline discussion.
- Totally agree. Validation was done in the controller to simplify the deployment and avoid the webhook.
- We can have more discussion later. In our internal system, it plays multiple roles, including but not limited to scheduling, descheduling, and rebalancing. We can consider maintaining a simplified version for LoRA.
- Multiple replicas are not supported yet. Tracking issue: https://github.com/aibrix/aibrix/issues/129
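
On concern 2 (many adapters reconciling at once), one option that stops short of a separate agent is to cap how many load/unload requests are in flight at a time. The sketch below uses a buffered channel as a counting semaphore. It only illustrates the idea under stated assumptions: `loadFunc`, `loadBounded`, and `errRejected` are hypothetical names, and the HTTP call itself is left behind the injected function.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
	"sync/atomic"
)

// errRejected is a hypothetical failure used only in the demo below.
var errRejected = errors.New("runtime rejected the adapter")

// loadFunc is a hypothetical hook that sends one load request for an adapter.
type loadFunc func(adapter string) error

// loadBounded runs load for every adapter with at most limit calls in flight
// and returns the number of calls that failed.
func loadBounded(adapters []string, limit int, load loadFunc) int64 {
	if limit < 1 {
		limit = 1
	}
	sem := make(chan struct{}, limit) // counting semaphore
	var wg sync.WaitGroup
	var failed int64
	for _, a := range adapters {
		wg.Add(1)
		sem <- struct{}{} // blocks while limit calls are already running
		go func(name string) {
			defer wg.Done()
			defer func() { <-sem }() // release the slot
			if err := load(name); err != nil {
				atomic.AddInt64(&failed, 1)
			}
		}(a)
	}
	wg.Wait()
	return failed
}

func main() {
	adapters := []string{"lora-1", "lora-2", "lora-3", "lora-4"}
	failed := loadBounded(adapters, 2, func(name string) error {
		if name == "lora-3" {
			return errRejected // simulate one rejected request
		}
		return nil
	})
	fmt.Println("failed:", failed)
}
```

This keeps the requests inside the controller process, so it limits the burst but does not remove the per-request cost that an agent-based design would offload.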