GenAIExamples

Need documentation for air-gapped (offline) on-prem deployment

Open Yu-amd opened this issue 10 months ago • 10 comments
OPEA should provide documentation and a reference architecture covering the mechanisms for storing and deploying applications together with all their dependencies (e.g., container images, Helm charts), and for hosting model repositories locally.

Enterprises operating in secure environments need a fully offline solution.

Yu-amd avatar Jan 16 '25 23:01 Yu-amd

I don't think documentation is enough.

Currently, model downloading is done by each container separately when it starts, and all those services have write access to the model volume. This means the user/admin may not even know whether the node will run out of disk space before all those services are ready...

I think there should be a separate model downloader that pre-fetches all relevant models to the model volume, with that volume set read-only afterwards. IMHO this should be how it's done (and documented to be done) by default. Downloading models at run-time should be the exception.
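
A minimal sketch of the pre-fetch idea, not an existing OPEA tool: check free disk space up front, download every model once, then drop write permissions on the model directory. The function names, the directory naming scheme, and the caller-supplied `download` callable and model sizes are all assumptions made for illustration.

```python
import os
import shutil
import stat
from pathlib import Path
from typing import Callable, Dict

WRITE_BITS = stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH


def has_room_for(models: Dict[str, int], target_dir: Path, margin: float = 0.1) -> bool:
    """True if target_dir's filesystem can hold all models plus a safety margin.

    `models` maps a model name to its expected size in bytes (caller-supplied).
    """
    needed = sum(models.values()) * (1 + margin)
    return shutil.disk_usage(target_dir).free >= needed


def make_read_only(root: Path) -> None:
    """Clear the write permission bits on every file and directory under root."""
    for dirpath, _dirnames, filenames in os.walk(root, topdown=False):
        for name in filenames:
            path = Path(dirpath) / name
            path.chmod(path.stat().st_mode & ~WRITE_BITS)
        directory = Path(dirpath)
        directory.chmod(directory.stat().st_mode & ~WRITE_BITS)


def prefetch(
    models: Dict[str, int],
    target_dir: Path,
    download: Callable[[str, Path], None],
) -> None:
    """Fail early if disk space is short, fetch each model, then lock the directory."""
    if not has_room_for(models, target_dir):
        raise RuntimeError("not enough free disk space for the requested models")
    for name in models:
        # Hypothetical layout: one subdirectory per model, "/" replaced by "--".
        download(name, target_dir / name.replace("/", "--"))
    make_read_only(target_dir)
```

The `download` callable could, for example, wrap `huggingface_hub.snapshot_download`, so that only the downloader ever needs the HF token. In a Kubernetes deployment the read-only step would more likely be a `readOnly: true` volume mount on the serving pods; the `chmod` pass here is only a local stand-in for that.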

eero-t avatar Jan 20 '25 13:01 eero-t

The Helm charts already use an HF downloader in initContainers: https://github.com/opea-project/GenAIInfra/blob/main/helm-charts/common/vllm/templates/deployment.yaml#L53

There could be a separate script / container using that, which would download all specified models to the location expected by the services. Models could be specified either directly, or the script could e.g. pick their names from the listed service specs / Helm charts.

PS. One more advantage of this would be not needing to provide the secret HF token to all the inferencing services.
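
As a rough illustration of picking model names out of chart values, the sketch below scans the text of Helm values files for keys ending in `MODEL_ID`. That key naming is an assumption about how the charts name their settings, and `collect_model_ids` is a hypothetical helper; a real tool would parse the YAML properly instead of using a regular expression.

```python
import re
from typing import Iterable, List

# Assumption: model IDs are the values of keys whose names end in "MODEL_ID".
# Commented-out lines and empty values are skipped.
MODEL_KEY = re.compile(
    r'^[ \t]*(?:-[ \t]*)?(\w*MODEL_ID)[ \t]*:[ \t]*["\']?([^"\'#\s]+)',
    re.MULTILINE,
)


def collect_model_ids(values_texts: Iterable[str]) -> List[str]:
    """Return unique model IDs, in first-seen order, from Helm values file contents."""
    seen = {}
    for text in values_texts:
        for _key, model in MODEL_KEY.findall(text):
            seen.setdefault(model, None)
    return list(seen)
```

The resulting list can be handed to a single downloader job, so the per-service charts no longer need the token or write access to the model volume.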

eero-t avatar Jan 20 '25 13:01 eero-t

With OPEA we do not prescribe a particular model, vector database, or even OPEA release version. For an air-gapped deployment, all desired models and container images need to be specified a priori, to create a volume or S3 bucket that is pulled down as part of the setup phase of the cluster and its workloads. So this really calls for a tool to specify the models and images and create a volume or S3 bucket for the user. @Yu-amd, would you please share some thoughts here?

mkbhanda avatar Feb 26 '25 02:02 mkbhanda

@Yu-amd @eero-t @mkbhanda thanks a lot for raising the problem. Just a reminder: is there any update?

yinghu5 avatar Mar 25 '25 01:03 yinghu5

> @Yu-amd @eero-t @mkbhanda thanks a lot for raising the problem. Just a reminder: is there any update?

OIM could help in pre-downloading models, but its RFC does not specifically mention that use-case: https://github.com/opea-project/docs/pull/327

eero-t avatar Mar 25 '25 08:03 eero-t

Besides the models, many other OPEA microservices download data from the internet at runtime. We need to fix them one by one: https://github.com/opea-project/GenAIComps/issues/1480

lianhao avatar Mar 31 '25 10:03 lianhao

> Besides the models, many other OPEA microservices download data from the internet at runtime. We need to fix them one by one: opea-project/GenAIComps#1480

@lianhao Thanks for investigating the issue and filing the individual tickets!

eero-t avatar Mar 31 '25 13:03 eero-t

Hi @Yu-amd, @lianhao has an RFC for air-gapped mode. Could you please check whether the design meets your requirements? Thanks.

joshuayao avatar Apr 23 '25 05:04 joshuayao

> With OPEA we do not prescribe a particular model, vector database, or even OPEA release version. For an air-gapped deployment, all desired models and container images need to be specified a priori, to create a volume or S3 bucket that is pulled down as part of the setup phase of the cluster and its workloads. So this really calls for a tool to specify the models and images and create a volume or S3 bucket for the user. @Yu-amd, would you please share some thoughts here?

IMHO such a tool should extract that information from the application deployment/config specs. If the user needs to specify them manually, there's always a chance for misunderstandings & errors.
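
One way such extraction might look for container images, sketched under the assumption that the input is rendered Kubernetes manifest text (for example the output of `helm template`) in which each image appears as a plain `image:` string field. `collect_images` is a hypothetical helper, and a real tool would use a YAML parser rather than a regular expression.

```python
import re
from typing import List

# Matches "image: <ref>" and "- image: <ref>" lines, with or without quotes.
# "imagePullPolicy:" and similar keys do not match because of the literal colon.
IMAGE_LINE = re.compile(
    r'^[ \t]*(?:-[ \t]*)?image:[ \t]*["\']?([^"\'#\s]+)',
    re.MULTILINE,
)


def collect_images(manifest_text: str) -> List[str]:
    """Return the sorted, de-duplicated image references found in manifest text."""
    return sorted(set(IMAGE_LINE.findall(manifest_text)))
```

The resulting list is what would need to be mirrored into a registry reachable from inside the air-gapped cluster.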

eero-t avatar Apr 23 '25 08:04 eero-t

Indeed, we need a tool that uses the application YAML or Helm chart and values to pull down all dependencies: the correct vector database, models, etc. May we assume that setting up the cluster, device drivers, and operators is out of scope?

mkbhanda avatar Apr 23 '25 09:04 mkbhanda

The guidance for individual components is in the component READMEs. An example guide is in the one-click deployment guide.

joshuayao avatar Aug 13 '25 01:08 joshuayao