gpdb
[7X] Support building standalone management utils
Previously, PYTHONPATH had to be set before using the GP management utils. Since PYTHONPATH affects where Python looks up packages, this dependency makes it inconvenient for users to do other work in Python, such as data science, on the same machine.
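To make the PYTHONPATH problem concrete, here is a small sketch (not part of the patch) that starts fresh interpreters with and without PYTHONPATH and compares their `sys.path`. The directory `/opt/hypothetical_gp_lib` is a made-up stand-in for the GPDB Python library directory. PYTHONPATH entries are placed ahead of the standard library and site-packages, so a module there can shadow a same-named package the user installed for other work.

```python
import os
import subprocess
import sys


def sys_path_with(pythonpath):
    """Return sys.path as seen by a fresh interpreter with the given PYTHONPATH.

    Pass None to run the interpreter with PYTHONPATH removed.
    """
    env = dict(os.environ)
    if pythonpath is None:
        env.pop("PYTHONPATH", None)
    else:
        env["PYTHONPATH"] = pythonpath
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print('\\n'.join(sys.path))"],
        env=env,
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    )
    return result.stdout.splitlines()


if __name__ == "__main__":
    # Hypothetical library directory; it does not need to exist for this demo.
    gp_lib = "/opt/hypothetical_gp_lib"
    with_var = sys_path_with(gp_lib)
    without_var = sys_path_with(None)
    print(gp_lib in with_var, gp_lib in without_var)
    # Entries after gp_lib (stdlib, site-packages) are searched later,
    # so a same-named module inside gp_lib wins.
    print(with_var[: with_var.index(gp_lib) + 1])
```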
This patch adds support for building the GP management utils into standalone executables with cx_Freeze so that the dependency on PYTHONPATH can be removed.
Note that this patch does not affect what we release or how we release it. The feature is completely optional and is enabled with --enable-standalone-utils. This patch is not expected to change the behavior of any util.
Known limitations and workarounds:
- cx_Freeze does not create links for the included files. Thus, we need to copy them to $GPHOME/bin or $GPHOME/sbin in the Makefile.
- After the utils are built into standalone binaries, `__file__` will no longer be valid. Thus, we need to replace it with the appropriate script names or paths.
- gppylib uses `python -c` to call Python functions directly during remote execution. Since PYTHONPATH is not set, we need to add the path manually before calling the functions.
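A minimal sketch of the last two workarounds, showing one possible approach rather than the patch's actual code. `util_dir()` falls back to the executable's location when the program is frozen (cx_Freeze sets `sys.frozen`), and `remote_python_cmd()` builds a `python -c` command whose program inserts the library directory into `sys.path` before importing anything. Both function names, the library path, and `gppylib.some_module` are hypothetical.

```python
import os
import shlex
import sys


def util_dir():
    """Return the directory of the running util, frozen by cx_Freeze or not."""
    if getattr(sys, "frozen", False):
        # In a frozen build __file__ no longer points at the original script,
        # so locate the util through the executable that is actually running.
        return os.path.dirname(os.path.abspath(sys.executable))
    return os.path.dirname(os.path.abspath(__file__))


def remote_python_cmd(lib_dir, statements, python="python3"):
    """Build a shell command that runs `statements` with `python -c`.

    PYTHONPATH is not set on the remote host, so the -c program itself
    puts lib_dir on sys.path before any gppylib import runs.
    """
    code = "import sys; sys.path.insert(0, {!r}); {}".format(lib_dir, statements)
    return "{} -c {}".format(shlex.quote(python), shlex.quote(code))


if __name__ == "__main__":
    # Hypothetical library path and module name, for illustration only.
    print(remote_python_cmd("/usr/local/gpdb/lib/python",
                            "from gppylib import some_module; some_module.main()"))
```

Quoting the whole `-c` program with `shlex.quote` keeps it intact when the command string is passed through a remote shell such as ssh.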
Here are some reminders before you submit the pull request
- [ ] Add tests for the change
- [ ] Document changes
- [ ] Communicate in the mailing list if needed
- [x] Pass `make installcheck`
- [ ] Review a PR in return to support the community
Status: flying a full pipeline to make sure that no behavior of any GP management util is changed.
Thanks @xuebinsu , I will take a look later.
Hi @xuebinsu, could you give a simple tutorial on how to use the GP utils with your PR?
I don't think adding --enable-standalone-utils is needed if the solution is acceptable :)
@adam8157 just a reminder, pls take a look when you have time.
Sorry, I was busy and missed this.
This PR is exciting and helpful; @-mentioning the maintainers so they can make the decision.
Can you create CI pipelines with --enable-standalone-utils both set and unset for the CentOS, RHEL 8, and Ubuntu platforms to make sure it's not breaking anything?
Status: CI pipelines have been set up, but they failed. Looking into the reasons.
- without `--enable-standalone-utils`: https://dev.ci.gpdb.pivotal.io/teams/main/pipelines/gpdb-dev2-xuebinsu
- with `--enable-standalone-utils`: https://dev.ci.gpdb.pivotal.io/teams/main/pipelines/gpdb-dev2-xuebinsu-enable-standalone
Thanks very much for your feedback!
Since it is good to keep the docs as close to the code as possible, I will first write the doc in setup.py. I will add a dedicated, more formal doc if we think it is necessary.
A minimal guide has been added to setup.py:
Steps to build and install GPDB with standalone utils:
1. `cd` to the root directory of the GPDB source code.
2. `pip3 install --user -r python-developer-dependencies.txt`
3. `./configure --enable-standalone-utils <other_options_you_want>`
4. `make install`

After installation completes, you can use the standalone utils in exactly the same way as before.
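After installing, one way to check that a util no longer depends on PYTHONPATH is to run it with the variable removed from its environment. This sketch assumes `$GPHOME` points at the installation and that the util accepts `--version` (the Greenplum docs list this option for gpstart); `run_without_pythonpath` is a hypothetical helper name.

```python
import os
import subprocess


def run_without_pythonpath(argv):
    """Run argv with PYTHONPATH stripped from the environment and capture its output."""
    env = {k: v for k, v in os.environ.items() if k != "PYTHONPATH"}
    return subprocess.run(
        argv,
        env=env,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
    )


if __name__ == "__main__":
    gphome = os.environ.get("GPHOME", "")
    util = os.path.join(gphome, "bin", "gpstart")
    if gphome and os.path.exists(util):
        result = run_without_pythonpath([util, "--version"])
        # A non-zero exit code or an ImportError in the output suggests
        # the util still relies on modules found via PYTHONPATH.
        print(result.returncode, result.stdout.strip())
    else:
        print("GPHOME is not set or gpstart was not found; nothing to check.")
```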
Hi @shrakesh, thanks for your feedback. A description of the thought process has been added.
The thought process:
- Find a tool that can package all the dependencies of a Python script (cx_Freeze in this case),
- help it find all the dependencies of each of our management utils by setting `sys.path`, and
- modify the config and Makefiles accordingly to use the packaging tool.
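To illustrate the second step: cx_Freeze decides what to bundle by scanning each script's imports along a module search path (its `build_exe` `path` option, which by default is `sys.path`), so the utils' library directory has to be on that path. The sketch uses the standard-library `modulefinder` as a stand-in for cx_Freeze's own finder; the `lib` directory, the `hypo_gputil` module, and the `gpfoo` script are hypothetical.

```python
import modulefinder
import os
import sys
import tempfile


def find_deps(script, extra_paths):
    """Scan script's imports, searching extra_paths before the default sys.path.

    Returns (names of modules found, names of imports that could not be resolved).
    """
    finder = modulefinder.ModuleFinder(path=list(extra_paths) + sys.path)
    finder.run_script(script)
    return set(finder.modules), set(finder.badmodules)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        lib = os.path.join(root, "lib")  # stands in for the utils' library directory
        os.makedirs(lib)
        with open(os.path.join(lib, "hypo_gputil.py"), "w") as f:
            f.write("VALUE = 1\n")
        script = os.path.join(root, "gpfoo")  # a util script with no .py suffix
        with open(script, "w") as f:
            f.write("import hypo_gputil\n")

        _, missing = find_deps(script, [])
        print("unresolved without lib:", "hypo_gputil" in missing)
        found, _ = find_deps(script, [lib])
        print("found with lib on the path:", "hypo_gputil" in found)
```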
Hi @piyushc01, thanks for your feedback! A description of setup.py has been added:
setup.py: This file specifies how a Python package is built and installed. It is required by setuptools and the tools built on it, including cx_Freeze.
@xuebinsu @beeender FYI
For Python packaging, there are four main packaging tools:
- py2exe
- pyInstaller
- cx_Freeze
- Nuitka
Differences:
| Solution | Windows | Linux | OS X | License | One-file | Stars | Maintainer |
|---|---|---|---|---|---|---|---|
| py2exe | yes | no | no | MIT | yes | 488 | Org |
| pyInstaller | yes | yes | yes | GPL | yes | 9.7k | Org |
| cx_Freeze | yes | yes | yes | PSF | no | 985 | Personal |
| Nuitka | yes | yes | yes | Apache-2.0 | yes | 7.7k | Business |
For our gpdb repo:
cx_Freeze
good:
- License
- easy to build
- we don't need one-file packaging
- does not require many code changes
bad:
- we have run into some bugs
pyInstaller
good:
- most widely used, so bugs are easy to search for
- active community
bad:
- less friendly license
- requires some code changes
- the built package is bigger than with cx_Freeze
Nuitka
good:
- fast
bad:
- need to pay for it
Overall, cx_Freeze is the best solution for now if we want to build packages so that commands like gpstart and gpssh can be used without changing PYTHONPATH.
A cx_Freeze problem we met: on CentOS 7 with Python 3.6, there is a bug when installing from a wheel (whl). We fixed it by using `setup.py install` instead.