[7X] Support building standalone management utils

Open xuebinsu opened this issue 3 years ago • 3 comments
Previously, PYTHONPATH had to be set before using the GP management utils. Since PYTHONPATH affects where Python looks up packages, this dependency makes it inconvenient for the user to do other things in Python, such as data science, on the same machine.

This patch adds support for building the GP management utils into standalone executables with cx_freeze so that dependency on PYTHONPATH can be removed.

Note that this patch does not affect what and how we release. The feature is completely optional and can be enabled with --enable-standalone-utils. This patch is not expected to change the behavior of any util.

Known limitations and workarounds:

  • cx_freeze does not create links for the included files. Thus, we need to copy them to $GPHOME/bin or $GPHOME/sbin in the Makefile.
  • After the utils are built into standalone binaries, __file__ is no longer valid. Thus, we need to replace its uses with the appropriate script names or paths.
  • gppylib uses python -c to call Python functions directly during remote execution. Since PYTHONPATH is not set, we need to add the path manually before calling the functions.
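
The last two workarounds can be sketched in Python roughly as follows. This is a minimal illustration under stated assumptions, not the code in the patch: the helper names `util_dir` and `remote_python_command` and the example paths are hypothetical. The only cx_Freeze behavior it relies on is that frozen executables have `sys.frozen` set.

```python
import os
import shlex
import sys


def util_dir(fallback_file):
    """Return the directory holding the running util (hypothetical helper).

    In a frozen executable __file__ is not reliable, so the location of the
    binary itself (sys.executable) is used instead.
    """
    if getattr(sys, "frozen", False):
        return os.path.dirname(os.path.abspath(sys.executable))
    # Not frozen: fall back to the path the caller passes in (e.g. __file__).
    return os.path.dirname(os.path.abspath(fallback_file))


def remote_python_command(lib_dir, statement):
    """Build a `python3 -c` command that extends sys.path by itself.

    The remote side then does not need PYTHONPATH to be set.
    lib_dir and statement are supplied by the caller (hypothetical helper).
    """
    code = "import sys; sys.path.insert(0, {!r}); {}".format(lib_dir, statement)
    return "python3 -c " + shlex.quote(code)
```

For example, `remote_python_command("/some/lib/dir", "print(1)")` yields a single shell-safe command line whose Python code first prepends `/some/lib/dir` to `sys.path` and then runs the statement.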

Here are some reminders before you submit the pull request

  • [ ] Add tests for the change
  • [ ] Document changes
  • [ ] Communicate in the mailing list if needed
  • [x] Pass make installcheck
  • [ ] Review a PR in return to support the community

xuebinsu avatar Sep 13 '22 09:09 xuebinsu

Status: flying a full pipeline to make sure that no behavior of any GP management util is changed.

xuebinsu avatar Sep 13 '22 09:09 xuebinsu

Thanks @xuebinsu , I will take a look later.

interma avatar Sep 15 '22 01:09 interma

Hi @xuebinsu, could you give a simple tutorial on how to use the gp utils with your PR?

interma avatar Sep 19 '22 03:09 interma

I do think adding --enable-standalone-utils is not needed if the solution is acceptable :)

beeender avatar Oct 11 '22 09:10 beeender

@adam8157 just a reminder, pls take a look when you have time.

interma avatar Oct 12 '22 07:10 interma

@adam8157 just a reminder, pls take a look when you have time.

Sorry, I was busy and missed this.

This PR is exciting and helpful, at-ing the maintainers to make the decision.

adam8157 avatar Oct 17 '22 03:10 adam8157

Can you create a CLI pipeline both with and without --enable-standalone-utils for the centos, rhel8 and ubuntu platforms to make sure it's not breaking anything?

piyushc01 avatar Oct 17 '22 08:10 piyushc01

Status: CI pipelines have been set up, but they failed. Looking into the reasons.

  • without --enable-standalone-utils: https://dev.ci.gpdb.pivotal.io/teams/main/pipelines/gpdb-dev2-xuebinsu
  • with --enable-standalone-utils: https://dev.ci.gpdb.pivotal.io/teams/main/pipelines/gpdb-dev2-xuebinsu-enable-standalone

xuebinsu avatar Oct 19 '22 09:10 xuebinsu

Thanks very much for your feedback!

Since it is good to keep the doc as close to the code as possible, I will first write a doc in setup.py. Will add a dedicated and more formal doc if we think that is necessary.

xuebinsu avatar Oct 19 '22 09:10 xuebinsu

A minimal guide has been added to setup.py:

Steps to build and install GPDB with standalone utils:

  1. cd to the root directory of the GPDB source code.
  2. pip3 install --user -r python-developer-dependencies.txt
  3. ./configure --enable-standalone-utils <other_options_you_want>
  4. make install

After installation completes, you can use the standalone utils in exactly the same way as before.
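
One way to check that an installed util really no longer depends on PYTHONPATH is to run it with that variable removed from the environment. The sketch below is only an illustration: the helper names are hypothetical, and it assumes the util accepts a `--version` flag and exits with status 0 on success.

```python
import os
import subprocess


def env_without_pythonpath(env=None):
    """Return a copy of the environment with PYTHONPATH removed."""
    clean = dict(os.environ if env is None else env)
    clean.pop("PYTHONPATH", None)
    return clean


def util_runs_without_pythonpath(util):
    """Run `<util> --version` with PYTHONPATH unset.

    Returns True if the util starts and exits with status 0, and False if it
    fails or cannot be found. The --version flag is an assumption.
    """
    try:
        result = subprocess.run(
            [util, "--version"],
            env=env_without_pythonpath(),
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except OSError:
        return False
    return result.returncode == 0
```

After `make install`, calling `util_runs_without_pythonpath("gpstart")` from a shell where $GPHOME/bin is on PATH would exercise the standalone binary.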

xuebinsu avatar Oct 19 '22 11:10 xuebinsu

Hi @shrakesh, thanks for your feedback. A description of the thought process has been added.

The thought process:

  1. Find a tool that can package all the dependencies of a Python script (cx_Freeze in this case),
  2. Help it find all the dependencies of each of our management utils by setting sys.path, and
  3. Modify config and Makefiles accordingly to use the packaging tool.
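
The first two steps might look roughly like the sketch below. It is not the setup.py from this PR: the function names, the project name and the version are placeholders, and instead of modifying `sys.path` directly it passes the search path through cx_Freeze's `build_exe` option `path`, which defaults to `sys.path`.

```python
import sys


def build_exe_options(extra_paths, packages):
    """Build the options dict for cx_Freeze's build_exe command.

    "path" is the module search path cx_Freeze uses when collecting
    dependencies; our own directories go first so the utils' packages
    are found. "packages" lists packages to include in full.
    """
    return {
        "build_exe": {
            "path": list(extra_paths) + sys.path,
            "packages": list(packages),
        }
    }


def freeze(scripts, extra_paths, packages):
    """Freeze each script into an executable (hypothetical helper).

    Intended to be called from a setup.py that is run as
    `python3 setup.py build_exe`. cx_Freeze is imported lazily because it
    is only needed when the standalone build is enabled.
    """
    from cx_Freeze import Executable, setup

    setup(
        name="gp-standalone-utils",  # placeholder name
        version="0.0.0",             # placeholder version
        options=build_exe_options(extra_paths, packages),
        executables=[Executable(script) for script in scripts],
    )
```

Keeping the option construction separate from the `setup()` call makes the search-path logic easy to inspect without having cx_Freeze installed.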

xuebinsu avatar Oct 19 '22 11:10 xuebinsu

Hi @piyushc01, thanks for your feedback! A description of setup.py has been added:

setup.py: This file specifies how a Python package is built and installed. It is required by setuptools and its descendants, including cx_Freeze.

xuebinsu avatar Oct 20 '22 01:10 xuebinsu

@xuebinsu @beeender FYI

For Python packaging, there are four main packaging tools:

  1. py2exe
  2. pyInstaller
  3. cx_Freeze
  4. Nuitka

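Differences:

| Solution    | Windows | Linux | OS X | License    | One-file | Stars | Maintainer |
|-------------|---------|-------|------|------------|----------|-------|------------|
| py2exe      | yes     | no    | no   | MIT        | yes      | 488   | Org        |
| pyInstaller | yes     | yes   | yes  | GPL        | yes      | 9.7k  | Org        |
| cx_Freeze   | yes     | yes   | yes  | PSF        | no       | 985   | Personal   |
| Nuitka      | yes     | yes   | yes  | Apache-2.0 | yes      | 7.7k  | Business   |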

For our gpdb repo:

cx_Freeze

good:

  • friendly license
  • easy to build
  • no one-file build step needed
  • little existing code needs to change

bad:

  • we have hit some bugs

pyInstaller

good:

  • most widely used, so known bugs are easy to search for
  • active community

bad:

  • less friendly license
  • some code needs to change
  • the built package is bigger than with cx_Freeze

Nuitka

good:

  • fast

bad:

  • may require payment

Overall, cx_Freeze is the best solution for now if we want to build the package so that commands like gpstart and gpssh can be used without changing PYTHONPATH.

The cx_Freeze problem we met: on centos7 with python3.6, there is a bug when installing cx_Freeze from the whl. We fixed it by installing with setup.py install instead.

yihong0618 avatar Nov 03 '22 09:11 yihong0618