GenAIExamples
Need documentation for air-gapped (offline) on-prem deployment
OPEA should provide documentation and a reference architecture covering the mechanisms for storing and deploying applications along with all their dependencies (e.g., container images, Helm charts), or for hosting model repositories locally.
Enterprises operating in secure environments need a fully offline solution.
I don't think documentation is enough.
Currently, model downloading is done by each container separately when it starts, and each of those services has write access to the model volume. This means the user/admin may not even know whether the node will run out of disk space before all those services are ready...
I think there should be a separate model downloader that is used to pre-fetch all relevant models to the model volume, and that volume should be set read-only afterwards. IMHO this is how it should be done (and documented to be done) by default; downloading models at run-time should be the exception.
Helm charts are already using HF downloader in initContainers: https://github.com/opea-project/GenAIInfra/blob/main/helm-charts/common/vllm/templates/deployment.yaml#L53
There could be a separate script / container using that, which downloads all specified models to the locations expected by the services. Models could be specified either directly, or the script could e.g. pick their names from the listed service specs / Helm charts.
PS. One more advantage of this would be not needing to provide the secret HF token to all the inferencing services.
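A minimal sketch of such a pre-downloader (this is an illustration, not an existing OPEA tool; the model IDs, the volume path, and the directory-naming convention are assumptions). It fetches every listed model into the shared model volume in one pass, after which the volume can be remounted read-only, and only this one tool ever needs the HF token:

```python
from pathlib import Path

# Model IDs and the volume path below are illustrative placeholders.
MODELS = ["Intel/neural-chat-7b-v3-3", "BAAI/bge-base-en-v1.5"]
MODEL_VOLUME = Path("/data/models")

def target_dir(volume, model_id):
    """Map a HF model ID to a per-model directory under the shared volume."""
    return volume / model_id.replace("/", "--")

def prefetch(models, volume, token=None):
    """Download each model once, before any inference service starts."""
    # Lazy import: huggingface_hub is the same library the Helm
    # initContainers already use for downloading.
    from huggingface_hub import snapshot_download
    downloaded = []
    for model_id in models:
        dest = target_dir(volume, model_id)
        snapshot_download(repo_id=model_id, local_dir=dest, token=token)
        downloaded.append(dest)
    return downloaded
```

Run once during cluster setup, e.g. `prefetch(MODELS, MODEL_VOLUME, token=os.environ.get("HF_TOKEN"))`; afterwards the inference services only need a read-only mount of `MODEL_VOLUME` and no token at all.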
With OPEA we do not prescribe the use of a particular model, vector database, or even OPEA release version. For an air-gapped deployment, all desired models and container images need to be specified a priori, so that a volume or S3 bucket can be populated during the setup phase of the cluster and its workloads. So this really calls for a tool that takes the models and images as input and creates the volume or S3 bucket for the user. @Yu-amd would you please share some thoughts here.
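One possible shape for such a tool, sketched below under stated assumptions: the manifest format is invented for illustration, and `skopeo copy` / `huggingface-cli download` are simply well-known commands for mirroring images and models into a local directory (which could then back a volume or be synced to an S3 bucket):

```python
# Hypothetical manifest: the user lists images and models once, up front.
MANIFEST = {
    "images": ["opea/chatqna:1.2"],
    "models": ["Intel/neural-chat-7b-v3-3"],
}

def fetch_commands(manifest, dest="/srv/airgap"):
    """Emit the shell commands that would populate the offline store."""
    cmds = []
    for image in manifest.get("images", []):
        # skopeo can copy a registry image into a local directory layout.
        safe = image.replace("/", "_").replace(":", "_")
        cmds.append(f"skopeo copy docker://{image} dir:{dest}/images/{safe}")
    for model in manifest.get("models", []):
        cmds.append(
            f"huggingface-cli download {model} --local-dir {dest}/models/{model}"
        )
    return cmds
```

For example, `fetch_commands(MANIFEST)` yields one `skopeo copy` command per image and one `huggingface-cli download` command per model, which the setup phase can run on a connected host before transferring `dest` into the air-gapped environment.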
@Yu-amd @eero-t @mkbhanda thanks a lot for raising the problem. Just a reminder: is there any update?
OIM could help in pre-downloading models, but its RFC does not specifically mention that use-case: https://github.com/opea-project/docs/pull/327
Besides the models, many other OPEA microservices download data from the internet at runtime. We need to fix them one by one: https://github.com/opea-project/GenAIComps/issues/1480
@lianhao Thanks for investigating the issue and filing the individual tickets!
Hi @Yu-amd, @lianhao has an RFC for an air-gapped mode. Could you please check whether the design meets your requirements? Thanks.
> So, this really calls for a tool to specify the models and images and create a volume or S3 bucket for the user.
IMHO, such a tool should extract that information from the application deployment/config specs. If the user needs to specify them manually, there is always a chance for misunderstandings and errors.
Indeed, we need a tool that uses the application YAML or Helm chart and values to pull down all dependencies: the correct vector database, models, etc. May we assume that setting up the cluster, device drivers, and operators is out of scope?
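To illustrate the "extract from the specs" idea, here is a small sketch that walks a parsed Helm values tree and collects model identifiers. The matched key suffix (`MODEL_ID`) is an assumption about how OPEA charts name these settings, not a documented contract:

```python
def extract_model_ids(values):
    """Recursively collect string values of keys that look like model IDs
    (e.g. LLM_MODEL_ID, EMBEDDING_MODEL_ID) from parsed YAML data."""
    found = set()
    if isinstance(values, dict):
        for key, val in values.items():
            if str(key).upper().endswith("MODEL_ID") and isinstance(val, str):
                found.add(val)
            else:
                found.update(extract_model_ids(val))
    elif isinstance(values, list):
        for item in values:
            found.update(extract_model_ids(item))
    return found
```

A wrapper could load each chart's `values.yaml` with `yaml.safe_load` and feed the result to this function, then hand the collected IDs to the pre-download step, so the user never types a model name twice.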
The component guides are in the component READMEs; the example guide is in the one-click deployment guide.