kubeflow pipeline add support of postgresql

Open yiyuanyu17 opened this issue 2 years ago • 21 comments

Feature Area

What feature would you like to see?

kubeflow pipeline add support of postgresql

What is the use case or pain point?

In some cases we cannot use MySQL for Kubeflow Pipelines. We hope Kubeflow Pipelines can add support for PostgreSQL.

Love this idea? Give it a 👍. We prioritize fulfilling features with the most 👍.

yiyuanyu17 avatar Apr 02 '22 03:04 yiyuanyu17

Hello @yiyuanyu17, can you help us understand why you are not able to use MySQL? And do you want to use PostgreSQL within the cluster or outside the cluster? Or are you looking for a way to configure PostgreSQL as an alternative to Cloud SQL?

zijianjoy avatar Apr 07 '22 22:04 zijianjoy

> Hello @yiyuanyu17, can you help us understand why you are not able to use MySQL? And do you want to use PostgreSQL within the cluster or outside the cluster? Or are you looking for a way to configure PostgreSQL as an alternative to Cloud SQL?

Hello, we use Kubeflow Pipelines for AI model training on our platform. In private (on-premises) deliveries, some customers explicitly forbid a self-hosted MySQL and require us to use the PostgreSQL instance that they provide. We have therefore moved our own applications to an ORM framework so that they can adapt to different database types. However, the Kubeflow Pipelines server does not support PostgreSQL yet, so we opened this issue in the hope of getting help from the community.

yiyuanyu17 avatar Apr 08 '22 12:04 yiyuanyu17

Thank you for the info, @yiyuanyu17. I will keep this issue open so people can upvote if they are also interested in PostgreSQL support. People can create an overlay that connects to PostgreSQL, but such support is not available in this repo yet.

zijianjoy avatar Apr 08 '22 18:04 zijianjoy

We would also be interested in this feature. We do a lot of on prem and disconnected/airgapped deployments. As such, Cloud Vendor hosted databases are not an option. In most scenarios it is easiest to run our own database clusters colocated on the same k8s environment as we run Kubeflow. The Crunchy Postgres experience on k8s is the best experience we've found to operate RDBMS clusters on k8s and we leverage it for other tooling. Would be nice to leverage it from Kubeflow as well, as operating MySql clusters on k8s is not as seamless an experience.

imiller445 avatar May 19 '22 01:05 imiller445

In our case we only have Postgres as an option for a managed on-prem DB, so we are looking for out-of-the-box Postgres support. @zijianjoy Can you please elaborate on what creating an overlay means? If that helps connect Kubeflow to Postgres, I am interested in giving it a try. Thanks!

vasireddyvkl avatar May 20 '22 15:05 vasireddyvkl

Hi @zijianjoy,

We also strictly use PostgreSQL internally, since it's better suited for data warehousing purposes.

RoyerRamirez avatar Jun 01 '22 22:06 RoyerRamirez

An overlay is a kustomize concept, as described in https://kubernetes.io/docs/tasks/manage-kubernetes-objects/kustomization/. It is a Kubernetes resource package that acts as a variant of the base KFP package.

Here is a list of KFP overlays: https://github.com/kubeflow/pipelines/tree/master/manifests/kustomize/env

If you look at the platform-agnostic folder, you will find that it depends on MySQL: https://github.com/kubeflow/pipelines/tree/master/manifests/kustomize/third-party/mysql

So if you want to introduce postgresql, what you need to do is:

  1. Create a postgresql folder under https://github.com/kubeflow/pipelines/tree/master/manifests/kustomize/third-party and define the postgresql resources in this folder.
  2. Define an overlay in https://github.com/kubeflow/pipelines/tree/master/manifests/kustomize/env which allows people to use postgresql.

I would recommend testing this PostgreSQL integration in your own environment before committing it to the KFP repo, because there is currently no testing that verifies KFP works with PostgreSQL.
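As a rough illustration of step 2, an overlay is mostly a kustomization file that lists the packages it pulls in. The Python sketch below only assembles such a file. Both relative paths are hypothetical placeholders (this thread does not confirm the final folder names), and the output is JSON, which kustomize should accept in a `kustomization.yaml` because YAML is designed as a superset of JSON.

```python
import json


def overlay_kustomization(shared_resources, database_resource):
    """Build a kustomization document that adds a database package to shared resources."""
    return {
        "apiVersion": "kustomize.config.k8s.io/v1beta1",
        "kind": "Kustomization",
        # Each entry is a path to another kustomize package, relative to the overlay folder.
        "resources": list(shared_resources) + [database_resource],
    }


# Hypothetical paths, written as if the overlay lived in a new folder under
# manifests/kustomize/env/. Check the actual repo layout before using them.
SHARED = ["../../base/installs/generic"]
POSTGRES = "../../third-party/postgresql/base"

doc = overlay_kustomization(SHARED, POSTGRES)
print(json.dumps(doc, indent=2))
```

The only difference from a MySQL-backed overlay in this sketch is the last entry in `resources`; a real overlay would likely also need patches for the API server's database settings.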

zijianjoy avatar Jun 04 '22 06:06 zijianjoy

It would be great if Kubeflow Pipelines supported Postgres! For various reasons, our company also cannot use MySQL. We strongly recommend that the community make the database optional. @zijianjoy

javen218 avatar Jul 29 '22 10:07 javen218

It's a time-consuming job for every user to make PostgreSQL work with Pipelines on their own, so we're eagerly waiting for someone to contribute it.

There is already a pull request implementing Postgres support for Kubeflow Katib (https://github.com/kubeflow/katib/pull/1921). I wonder if there is any plan for KFP to support PostgreSQL?

javen218 avatar Aug 01 '22 06:08 javen218

MySQL is without doubt an excellent database, but Oracle's acquisition brought uncertainty to its future. Like the others above, I sincerely hope kubeflow/pipelines can support PostgreSQL soon; it is license-friendly and has lots of advanced features.

Edward-liang avatar Aug 01 '22 07:08 Edward-liang

Also note that google/mlmd doesn't support Postgres yet: https://github.com/google/ml-metadata/issues/26

chensun avatar Aug 11 '22 23:08 chensun

As others have hinted towards here, PostgreSQL, especially with Operator Lifecycle Manager and, if wanted, being a Red-Hat-certified operator, is the way to go in an Enterprise environment that is Kubernetes-based. I wholeheartedly agree with all people who posted here. Database should not come pre-packaged with Kubeflow, as it is not a core component. Let people who really know their stuff handle things like database-ops and deployment, like e.g. Crunchy with PostgreSQL. And then use Postgres as a database for Kubeflow. Seriously, replication factor of 1, no pgbouncer proxy to improve load handling, no backup strategy .... https://github.com/kubeflow/katib/blob/9fce9dd03bc476b4e1f3d385e9692ac5cef681f4/manifests/v1beta1/components/postgres/postgres.yaml That cannot seriously be an approach by a project that has its origins with one of the big tech firms. Same goes for air-gapped functionality support with custom docker registries, HTTP_PROXY support via env variables and custom CA configmap for PKI trust.

shalberd avatar Aug 15 '22 05:08 shalberd

Currently we would like help from the community to support PostgreSQL integration. For anyone who wants to contribute to making Kubeflow Pipelines runnable with PostgreSQL:

  1. ~Create a postgresql folder under https://github.com/kubeflow/pipelines/tree/master/manifests/kustomize/third-party and define the postgresql resources in this folder.~ (Done)
  2. Implement PostgreSQL integration in KFP API server and cache server.
  3. Define an overlay in https://github.com/kubeflow/pipelines/tree/master/manifests/kustomize/env which integrates KFP with postgresql
  4. ~Enable MLMD to use PostgreSQL as its database: https://github.com/google/ml-metadata/issues/26~ (Done)
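For item 2, one small piece of the work is choosing a connection string format per database driver. Here is a minimal sketch in Python, assuming hypothetical driver names (`"mysql"`, `"postgres"`) and a hypothetical `build_dsn` helper. It is not the KFP API server's actual code (which is written in Go); it only contrasts the two DSN shapes: the go-sql-driver/mysql form and the PostgreSQL connection URI.

```python
from urllib.parse import quote


def build_dsn(driver, user, password, host, port, dbname):
    """Return a connection string in the format expected for the given driver.

    The driver names are illustrative assumptions, not KFP configuration values.
    """
    if driver == "mysql":
        # go-sql-driver/mysql format: user:password@tcp(host:port)/dbname
        return f"{user}:{password}@tcp({host}:{port})/{dbname}"
    if driver == "postgres":
        # PostgreSQL connection URI; user and password are percent-encoded
        # so that characters such as '@' or ':' do not break the URI.
        return (
            f"postgresql://{quote(user, safe='')}:{quote(password, safe='')}"
            f"@{host}:{port}/{dbname}"
        )
    raise ValueError(f"unsupported database driver: {driver!r}")


print(build_dsn("mysql", "root", "pw", "mysql", 3306, "mlpipeline"))
print(build_dsn("postgres", "kfp", "p@ss", "postgres", 5432, "mlpipeline"))
```

Beyond the DSN, a real integration would also have to deal with SQL dialect differences (quoting, upserts, column types), which this sketch does not attempt.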

zijianjoy avatar Aug 23 '22 17:08 zijianjoy

Since we have #9813 to track this work, I'll close this issue. Please follow updates in that tracker issue.

/close

rimolive avatar Mar 07 '24 16:03 rimolive

@rimolive: Closing this issue.

In response to this:

> Since we have #9813 to track this work, I'll close this issue. Please follow updates in that tracker issue
>
> /close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

google-oss-prow[bot] avatar Mar 07 '24 16:03 google-oss-prow[bot]

@rimolive Sorry, since this work is not finished yet, the feature request is still valid. (Note: we use the upvote count of the original issue to track the community's interest across the org, so I am reopening this issue.)

zijianjoy avatar Mar 07 '24 19:03 zijianjoy

May I suggest keeping track of this MLMD issue comment as part of the KFP-with-PostgreSQL scope: https://github.com/google/ml-metadata/issues/194#issuecomment-1975207465

The reason is that, when MLMD is backed by PostgreSQL, there is allegedly a practical limit of only ~2K characters on MLMD string properties.

Potential solutions are mentioned (and one is presented) in: https://github.com/google/ml-metadata/pull/195
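To get a feel for whether this would affect a given workload, one could scan the string properties before they are written. This is a minimal sketch in Python, assuming a limit of 2000 characters (the "~2K" above is approximate, and whether it counts characters or bytes is not stated here) and a hypothetical plain-dict representation of properties rather than MLMD's real API.

```python
# Assumed value taken from the "~2K chars" estimate; not an exact MLMD or PostgreSQL figure.
ASSUMED_LIMIT = 2000


def oversized_string_properties(properties, limit=ASSUMED_LIMIT):
    """Return {name: length} for every string property longer than the limit."""
    return {
        name: len(value)
        for name, value in properties.items()
        if isinstance(value, str) and len(value) > limit
    }


# Hypothetical property names, for illustration only.
props = {"name": "run-1", "pipeline_spec": "x" * 2500, "retries": 3}
print(oversized_string_properties(props))
```

Non-string values are ignored, since the reported limit concerns string properties only.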

Hope this helps!

tarilabs avatar Mar 08 '24 07:03 tarilabs

Thanks @tarilabs for letting us know!

@zijianjoy Can you add this issue as a work item for MLMD integration in #9813? I think it's a good first issue and a good fit for GSoC.

rimolive avatar Mar 08 '24 10:03 rimolive

> Thanks @tarilabs for letting us know!
>
> @zijianjoy Can you add this issue as a work item for MLMD integration in #9813? I think it's a good first issue and a good fit for GSoC.

Added. However, please note that it is going to be an optional task in terms of PostgreSQL integration with KFP, though a good item to contribute to.

zijianjoy avatar Mar 08 '24 21:03 zijianjoy

Agreed, the idea to add this issue is for tracking purposes.

rimolive avatar Mar 11 '24 18:03 rimolive

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

github-actions[bot] avatar May 11 '24 07:05 github-actions[bot]