Jiaxin Shan comments

Results: 742 comments of Jiaxin Shan

Model orchestration with heterogeneous hardwares

related paper: https://arxiv.org/abs/2404.14527

Model orchestration with heterogeneous hardwares

We do not plan to change the orchestration part in v0.2.0. Let's first resolve the cost-efficient serving issue using multiple deployments with some common labels; that's enough. I will...
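As a rough illustration of "multiple deployments with some common labels", the sketch below builds one Deployment manifest per GPU type and checks that a single shared label selects all of them. The label keys (`example.com/model`, `example.com/gpu`), the deployment names, the GPU type strings, and the image are all hypothetical placeholders, not names taken from the project.

```python
import json

# Hypothetical label shared by every deployment that serves the same model.
COMMON_LABELS = {"example.com/model": "demo-model"}


def make_deployment(name: str, gpu_type: str, replicas: int) -> dict:
    """Build a minimal apps/v1 Deployment manifest pinned to one GPU type."""
    labels = {**COMMON_LABELS, "example.com/gpu": gpu_type}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name, "labels": labels},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    # Assumes nodes carry a matching (hypothetical) GPU label.
                    "nodeSelector": {"example.com/gpu": gpu_type},
                    "containers": [
                        {"name": "server", "image": "example/server:latest"}
                    ],
                },
            },
        },
    }


def matches(selector: dict, labels: dict) -> bool:
    """Equality-based label matching, as a Service selector would apply it."""
    return all(labels.get(key) == value for key, value in selector.items())


deployments = [
    make_deployment("demo-model-gpu-a", "gpu-a", replicas=2),
    make_deployment("demo-model-gpu-b", "gpu-b", replicas=1),
]

# One selector on the common label covers both hardware-specific deployments.
selected = [
    d["metadata"]["name"]
    for d in deployments
    if matches(COMMON_LABELS, d["spec"]["template"]["metadata"]["labels"])
]

if __name__ == "__main__":
    print(json.dumps(selected))
```

With this layout, anything that routes on the common label sees both deployments as one pool, while each deployment can still be scaled separately per hardware type.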

Model orchestration with heterogeneous hardwares

This is a sub-story of #425. We may use a loose approach, such as labels, to orchestrate the workload in v0.2.0. We can better orchestrate such workloads in v0.3.0 with model...

[CI] Generate helm package from kubebuilder manifests

/cc @M00nF1sh are you aware of any tools to convert kustomize to a Helm package? We do not want to maintain the Helm chart separately.

redis is not that stable and quit from SIGTERM

Same here. It only happens on a Lambda instance + nvkind.

[CI] Generate helm package from kubebuilder manifests

@M00nF1sh Can we add the helm repo first? Let's have a short discussion on the kustomize manifests maintenance later.

[CI] Generate helm package from kubebuilder manifests

![image](https://github.com/user-attachments/assets/324afb75-966c-4acb-b303-0c5c270a5b34)

Kustomization and the generated YAML should be good enough for the v0.1.0 release. Helm package support can be postponed to v0.2.0.

redis is not that stable and quit from SIGTERM

![Image](https://github.com/user-attachments/assets/dd99bb99-2982-438c-804b-9daee888060f)

The problem still exists.

redis is not that stable and quit from SIGTERM

Actually, most of the containers crashed.

metadata-service ![Image](https://github.com/user-attachments/assets/48384335-18f4-47df-b468-97a47eba09c1) ![Image](https://github.com/user-attachments/assets/64391a8f-125d-4794-8243-d1f476cede24)

gpu-optimizer ![Image](https://github.com/user-attachments/assets/b085a396-f220-456c-b1b3-380b1d7fc079) ![Image](https://github.com/user-attachments/assets/894e2bf5-4d60-47b3-9204-bf9f4e18c0d8)

gateway-plugin ![Image](https://github.com/user-attachments/assets/625a4680-30a4-484d-9efb-5419cf7890eb) ![Image](https://github.com/user-attachments/assets/9d2c1893-79bb-409a-b610-d199a0b91351)

redis-master ![Image](https://github.com/user-attachments/assets/50f58529-18fb-4c75-ada0-45d2ff708b01) ![Image](https://github.com/user-attachments/assets/919a5a26-50b1-43b6-84f8-d4aa84186f78)

controller-manager ![Image](https://github.com/user-attachments/assets/cc5ef79c-3c0d-4b97-8445-dd9ff66ff0a6) ![Image](https://github.com/user-attachments/assets/eaf47ca9-f08b-4b35-a0d4-062d65d64912)

redis is not that stable and quit from SIGTERM

Three categories:

- Solid software like redis/controller/gateway-plugin: exitCode is 0. They all have error handling.
- Components we wrote ourselves, like gpu-optimizer and metadata-service, show other error codes.
- kuberay...
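To make the exit-code distinction above concrete, here is a minimal sketch of how a container exit code can be read. It assumes the common convention that a process terminated by signal n is reported with exit code 128 + n, and that a process which handles SIGTERM and shuts down cleanly exits with 0. The function name `describe_exit_code` is hypothetical and not part of any of the components mentioned.

```python
import signal


def describe_exit_code(code: int) -> str:
    """Give a rough, human-readable reading of a container exit code."""
    if code == 0:
        # The process exited on its own terms, e.g. after handling SIGTERM.
        return "clean exit"
    if code > 128:
        # Convention: 128 + signal number means the signal was not handled.
        try:
            return f"killed by {signal.Signals(code - 128).name}"
        except ValueError:
            return f"exit code {code} (no known signal {code - 128})"
    return f"application error (exit code {code})"


if __name__ == "__main__":
    for code in (0, 1, 137, 143):
        print(code, "->", describe_exit_code(code))
```

Under that convention, a component that installs a SIGTERM handler would show up in the first category (exit code 0), while one that does not handle the signal would typically report 143.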