Tracking: PostgreSQL support in KFP

Open zijianjoy opened this issue 2 years ago • 19 comments

The request for PostgreSQL support has become the top-upvoted issue in the KFP repo: https://github.com/kubeflow/pipelines/issues/7512. This issue tracks the work on that integration.

  • [ ] KFP Backend integration

    • [x] Define the DB config for PostgreSQL DB info
      • [x] Basic connection config
      • [ ] Secure connection config: passfile, hostaddr, ssl mode, certificate, etc.
      • [x] Flag to switch between PostgreSQL driver and MySQL driver
    • [x] ~~#9859~~
    • [ ] Syntax support of PostgreSQL DB in backend
      • [ ] Adopt a different syntax during initialization. (schema difference)
      • [ ] Adopt a different syntax during execution. (Modification of data)
      • [ ] Explore the dialect difference and develop a control mechanism to enable easy testing on the storage layer.
    • [ ] Testing
      • [ ] Unit testing for PostgreSQL behavior.
      • [ ] Functional testing for PostgreSQL behavior.
      • [ ] E2E testing for PostgreSQL behavior.
    • [ ] Cache server integration
    • [ ] KFP API server integration
    • [ ] Manifest support for PostgreSQL
      • [x] #9860
      • [x] #9861
      • [ ] Marketplace KFP (managed CloudSQL instance)
  • [ ] MLMD integration

    • [ ] Develop PostgreSQL integration on MLMD
      • [ ] Investigate https://github.com/google/ml-metadata/issues/194#issuecomment-1975207465
    • [ ] Release new version of MLMD
    • [x] #9848
    • [ ] Manifest support for PostgreSQL in MLMD
      • [ ] Standalone KFP
      • [ ] Full Kubeflow
      • [ ] Marketplace KFP (managed CloudSQL instance)
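
To make the "syntax support" items above concrete, here is a minimal Go sketch of two well-known dialect differences: MySQL quotes identifiers with backticks and uses `?` placeholders, while PostgreSQL quotes identifiers with double quotes and uses numbered placeholders (`$1`, `$2`, ...). The `Dialect` type, its methods, and the `runs`/`uuid`/`state` names are hypothetical illustrations, not code from the KFP backend.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// Dialect captures two syntax differences between the drivers.
// This type is a hypothetical illustration, not a KFP API.
type Dialect struct {
	Name        string
	QuoteChar   string // character used to quote identifiers
	NumberedArg bool   // true when placeholders are $1, $2, ...
}

var (
	MySQL    = Dialect{Name: "mysql", QuoteChar: "`", NumberedArg: false}
	Postgres = Dialect{Name: "postgres", QuoteChar: `"`, NumberedArg: true}
)

// Quote wraps an identifier in the dialect's quote character.
func (d Dialect) Quote(ident string) string {
	return d.QuoteChar + ident + d.QuoteChar
}

// Rebind converts "?" placeholders into the dialect's placeholder style.
// Sketch only: it does not skip "?" characters inside string literals.
func (d Dialect) Rebind(query string) string {
	if !d.NumberedArg {
		return query
	}
	var b strings.Builder
	n := 0
	for _, r := range query {
		if r == '?' {
			n++
			b.WriteString("$" + strconv.Itoa(n))
			continue
		}
		b.WriteRune(r)
	}
	return b.String()
}

func main() {
	// Table and column names are made up for the example.
	for _, d := range []Dialect{MySQL, Postgres} {
		q := "SELECT * FROM " + d.Quote("runs") +
			" WHERE " + d.Quote("uuid") + " = ? AND " + d.Quote("state") + " = ?"
		fmt.Println(d.Name+":", d.Rebind(q))
	}
}
```

Keeping such differences behind one small type is one possible form of the "control mechanism" mentioned in the checklist, since the storage layer can then be tested against either dialect by swapping a single value.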

zijianjoy avatar Aug 03 '23 20:08 zijianjoy

cc @chensun

zijianjoy avatar Aug 03 '23 20:08 zijianjoy

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

github-actions[bot] avatar Nov 27 '23 07:11 github-actions[bot]

Hi! I am Eshaan Aggarwal, an avid Open Source enthusiast from India. I am a web developer proficient in GoLang and PostgreSQL and have recently started learning about Kubernetes. I would love to contribute to this issue and hopefully join Kubeflow as a GSoC '24 mentee. Are there any pre-tests or other beginner-friendly contributions I can make to get acquainted with this project and as a proof of skill?

EshaanAgg avatar Feb 22 '24 14:02 EshaanAgg

Hello, I'm Udit. Proficient in Python, Golang, and PostgreSQL, I've recently completed a comprehensive course on Kubernetes. The skills acquired perfectly align with the requirements of this project, making it an ideal platform for me to apply and further enhance my knowledge. I'm enthusiastic about contributing to this GSoC project, especially on the specified issue. Eagerly anticipating the chance to contribute to its development!

UditNayak avatar Feb 26 '24 07:02 UditNayak

Hello @EshaanAgg and @UditNayak, thank you for your interest. I assume @rimolive will be your mentor.

As a start, I would recommend learning:

  • PostgreSQL syntax
  • GORM, a database abstraction (ORM) library for Go: https://gorm.io/index.html
  • How to set up your own Kubernetes environment for development.

Then I would advise approaching the development in the following order:

  1. Make sure you can bring up KFP in the Kubernetes environment.
  2. Make sure you can bring up a PostgreSQL instance and access it manually in the Kubernetes environment.
  3. Change the KFP API server so that it can read the PostgreSQL connection config from a parameter or environment variable. The KFP API server should then use that config to establish a connection with the PostgreSQL instance in the same cluster.
  4. Make the corresponding GORM changes so that KFP's CRUD (create/read/update/delete) operations execute correctly on PostgreSQL. (This is a good time to write some unit tests or E2E tests.)
  5. Perform similar steps for the cache server.
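
As a rough illustration of step 3, the sketch below reads connection settings from environment variables and builds a driver-specific connection string. The environment variable names (`DB_DRIVER`, `DB_HOST`, and so on), the defaults, and the `DBConfig` type are assumptions made for this example and are not KFP's actual configuration keys. The two output formats are the `user:password@tcp(host:port)/dbname` DSN used by the go-sql-driver MySQL driver and the PostgreSQL keyword/value connection string.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// DBConfig holds connection settings. Field names, env var names, and
// defaults are hypothetical; they are not KFP's real config keys.
type DBConfig struct {
	Driver, Host, Port, User, Password, DBName, SSLMode string
}

// getenv returns lookup(key), or fallback when the value is empty.
func getenv(lookup func(string) string, key, fallback string) string {
	if v := lookup(key); v != "" {
		return v
	}
	return fallback
}

// LoadConfig reads settings through lookup (os.Getenv in production,
// a map-backed function in tests).
func LoadConfig(lookup func(string) string) DBConfig {
	driver := getenv(lookup, "DB_DRIVER", "mysql")
	defaultPort := "3306"
	if driver == "postgres" {
		defaultPort = "5432"
	}
	return DBConfig{
		Driver:   driver,
		Host:     getenv(lookup, "DB_HOST", "localhost"),
		Port:     getenv(lookup, "DB_PORT", defaultPort),
		User:     getenv(lookup, "DB_USER", "root"),
		Password: lookup("DB_PASSWORD"),
		DBName:   getenv(lookup, "DB_NAME", "kfp"),
		SSLMode:  getenv(lookup, "DB_SSLMODE", "require"),
	}
}

// pgQuote single-quotes a value for a PostgreSQL keyword/value string,
// escaping backslashes and single quotes with a backslash.
func pgQuote(v string) string {
	r := strings.NewReplacer(`\`, `\\`, `'`, `\'`)
	return "'" + r.Replace(v) + "'"
}

// DSN returns the connection string for the configured driver.
func (c DBConfig) DSN() (string, error) {
	switch c.Driver {
	case "mysql":
		return fmt.Sprintf("%s:%s@tcp(%s:%s)/%s",
			c.User, c.Password, c.Host, c.Port, c.DBName), nil
	case "postgres":
		return fmt.Sprintf("host=%s port=%s user=%s password=%s dbname=%s sslmode=%s",
			pgQuote(c.Host), pgQuote(c.Port), pgQuote(c.User),
			pgQuote(c.Password), pgQuote(c.DBName), pgQuote(c.SSLMode)), nil
	default:
		return "", fmt.Errorf("unsupported DB driver %q", c.Driver)
	}
}

func main() {
	cfg := LoadConfig(os.Getenv)
	if _, err := cfg.DSN(); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	// Avoid printing the DSN itself, since it contains the password.
	fmt.Printf("driver=%s host=%s port=%s\n", cfg.Driver, cfg.Host, cfg.Port)
}
```

The resulting string would then be handed to whichever database driver or GORM dialector the server uses. Passing the lookup function in, rather than calling `os.Getenv` directly, keeps the config logic easy to unit test.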

I believe @rimolive can facilitate more once you dive deep into the project. But feel free to take any task you want to work on and ask questions along the way. Have fun!

zijianjoy avatar Mar 07 '24 21:03 zijianjoy

In addition to what @zijianjoy said, please join us on our Slack. We have the #gsoc-participants channel to welcome everyone interested in the GSoC projects.

rimolive avatar Mar 11 '24 18:03 rimolive

@zijianjoy I am a web developer from India, and I would like to work on this issue. I know Python, Golang, and a bit of Kubernetes. How do I get started on this?

VDliveson avatar Mar 14 '24 07:03 VDliveson

@zijianjoy I'm currently doing ops work on Kubeflow and love the idea of changing the DB. I will be working on these:

  1. Make sure you can bring up KFP in the Kubernetes environment.
  2. Make sure you can bring up a PostgreSQL instance and access it manually in the Kubernetes environment.
  3. Change the KFP API server so that it can read the PostgreSQL connection config from a parameter or environment variable. The KFP API server should then use that config to establish a connection with the PostgreSQL instance in the same cluster.
  4. Make the corresponding GORM changes so that KFP's CRUD (create/read/update/delete) operations execute correctly on PostgreSQL. (This is a good time to write some unit tests or E2E tests.)
  5. Perform similar steps for the cache server.

Irshu786 avatar Mar 19 '24 01:03 Irshu786

@zijianjoy I am a backend developer from India and am comfortable with Python, Golang, Java, Kubernetes, and SQL databases. I would like to contribute to this issue. What are the steps?

SnehaAgg0212 avatar Mar 20 '24 14:03 SnehaAgg0212

Hello @zijianjoy, I'm a junior student from China majoring in computer science and technology. I'm very interested in open source projects and would like to contribute to this project while improving my skills. I have joined a lab at my school that works on databases and cloud computing, so I work with PostgreSQL and Kubernetes frequently. I'm following the steps you described above, but I can't join the #gsoc-participants channel. Will this affect my participation?

jiduyuting avatar Mar 25 '24 03:03 jiduyuting

Hey guys,

Has everyone set up a local Kubeflow environment?

Irshu786 avatar Mar 26 '24 12:03 Irshu786

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

github-actions[bot] avatar May 26 '24 07:05 github-actions[bot]

/lifecycle frozen

rimolive avatar May 27 '24 19:05 rimolive

@zijianjoy Is anyone working on this issue?

sagnik3788 avatar Jun 26 '24 15:06 sagnik3788

@sagnik3788 This is part of the Google Summer of Code. You can find details in https://www.kubeflow.org/events/gsoc-2024/#project-9-postgresql-integration-in-kubeflow-pipelines.

rimolive avatar Jun 26 '24 16:06 rimolive