Support Python UDF
Feature request
Support Python UDFs in StarRocks.
Grammar: extend CREATE FUNCTION to support an inline function body:
CREATE FUNCTION echo(int)
RETURNS int
properties(
    "symbol" = "echo",
    "type" = "Python",
    "file" = "inline",
    "input" = "arrow"
)
AS
$$
def echo(a):
    return a
$$;
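The relationship between "symbol" and the inline body can be illustrated in plain Python. This is only a sketch, assuming "symbol" names a function defined between the `$$` markers; it is not how the BE actually loads UDFs, and `load_inline_udf` is a hypothetical helper that does not exist in StarRocks.

```python
def load_inline_udf(source: str, symbol: str):
    """Run the inline source and return the function named by `symbol`.

    Hypothetical helper for local experiments only.
    """
    namespace = {}
    exec(source, namespace)  # execute the text that would sit between $$ ... $$
    if symbol not in namespace or not callable(namespace[symbol]):
        raise KeyError(f"symbol {symbol!r} is not defined in the inline body")
    return namespace[symbol]


INLINE_BODY = """
def echo(a):
    return a
"""

echo = load_inline_udf(INLINE_BODY, "echo")
print(echo(42))
```

With this model, a "symbol" that does not match any function in the body (for example "add" with a body that only defines `echo`) fails at lookup time.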
UDF call: use gRPC to call the Python UDF BE process.
Python environment support: Python 3.8+, Arrow 16.0.0+, and gRPC.
pip install pyarrow
pip install grpcio
Make sure all packages are installed in the PYTHONHOME configured on the BE.
FE config: enable_udf=true
BE config: python_envs=/root/python38
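The requirements above can be checked from inside the interpreter that the BE will use. The following is a rough sketch, not part of StarRocks: `check_environment` and `parse_version` are hypothetical names, and the thresholds are simply the ones listed in this issue (Python 3.8+, pyarrow 16.0.0+, any grpcio).

```python
import re
import sys
from importlib import metadata

MIN_PYTHON = (3, 8)
# Minimum versions taken from this issue; None means "any version".
REQUIRED = {"pyarrow": (16, 0, 0), "grpcio": None}


def parse_version(text):
    """Turn '16.1.0' or '16.0.0rc1' into a 3-tuple of ints."""
    parts = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    parts += [0] * (3 - len(parts))  # pad '16.0' to (16, 0, 0)
    return tuple(parts[:3])


def check_environment():
    """Return a list of human-readable problems; empty means all checks passed."""
    problems = []
    if sys.version_info[:2] < MIN_PYTHON:
        problems.append(f"Python {sys.version_info[0]}.{sys.version_info[1]} is older than 3.8")
    for name, minimum in REQUIRED.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            problems.append(f"{name} is not installed")
            continue
        if minimum is not None and parse_version(installed) < minimum:
            problems.append(f"{name} {installed} is older than required")
    return problems


for problem in check_environment():
    print(problem)
```

It only gives a meaningful answer when run with the interpreter under the configured `python_envs` directory, for example `/root/python38/bin/python3`.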
examples:
see test/sql/test_udf/test_python_udf
Additional context
PR Links
- [x] https://github.com/StarRocks/starrocks/pull/46247
- [x] https://github.com/StarRocks/starrocks/pull/46299
- [x] https://github.com/StarRocks/starrocks/pull/46449
linked to this issue
Python UDF support · Issue #45843 · StarRocks/starrocks https://github.com/StarRocks/starrocks/issues/45843
@stdpain Hi, recently, I have been researching how to implement Python UDAF and UDTF. Can we add a WeChat conversation? Also, do you have any specific design documentation for your Python UDF? My WeChat is Liuwenclever.😆
We could open a public discussion in Slack: https://starrocks.slack.com/archives/C02FAD0JSSD
@stdpain Hi, I have heard that Python UDF will be supported in version 3.4. When is the expected release date? Thanks a lot.
@zhangm365 Python UDF is released in v3.4, so you can give it a try. Note that it only covers UDF and does not include UDAF or UDTF.
@jaogoy Do you have more precise Python UDF installation instructions?
As I understand it, besides these instructions I should also check the following:
- Should Python be in a virtual env? SR is searching for bin/python3
- Is it obligatory to install pyarrow and grpcio?
- I only set one variable in the BE config: python_envs=/opt/python310/
How we installed Python UDF
- Enabled it in the FE configuration file: enable_udf=true
- Installed Python on the BEs
- Installed pyarrow, grpcio, and pandas, but I think the first two are the crucial ones for making UDFs work without errors
- Set in the BE configuration file: python_envs = /usr/ (the actual interpreter is located at /usr/bin/python3)
We tried setting python_envs = /opt/python310/, the directory that holds our virtual env (/opt/python310/bin/python3), but got this error:
[42000][1064] worker start failed:Python path configuration:
PYTHONHOME = '/opt/python310/'
PYTHONPATH = (not set)
program name = 'python3'
isolated = 0
environment = 1
user site = 1
import site = 1
sys._base_executable = : BE:10005
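One possible reading of the error above, not confirmed in this thread: the log shows `python_envs` being used as PYTHONHOME, and a virtual env directory does not contain the Python standard library, so an interpreter started with that PYTHONHOME cannot initialize. A minimal sketch, assuming a standard `venv` layout, that detects a virtual env by its `pyvenv.cfg` and reports the `home` entry recorded there. `describe_python_home` is a hypothetical helper, not a StarRocks tool.

```python
from pathlib import Path


def describe_python_home(path):
    """Report whether `path` looks like a venv and where its base interpreter lives."""
    cfg = Path(path) / "pyvenv.cfg"
    if not cfg.is_file():
        # No pyvenv.cfg: treat the directory as a regular installation prefix.
        return {"is_venv": False, "base_bin_dir": None}
    base_bin_dir = None
    for line in cfg.read_text().splitlines():
        key, sep, value = line.partition("=")
        if sep and key.strip() == "home":
            # `home` is the directory holding the interpreter that created the venv.
            base_bin_dir = value.strip()
            break
    return {"is_venv": True, "base_bin_dir": base_bin_dir}


print(describe_python_home("/opt/python310/"))
```

If `is_venv` is true, the parent of `base_bin_dir` (for example `/usr` for `/usr/bin`) is usually the prefix that holds the standard library, which is consistent with the working `python_envs = /usr/` setup reported above.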
I have not tried venv yet; I will test that later. I have tested the system default path on Ubuntu 22.04 (python_envs="/") and a compiled-and-installed Python on CentOS 7 (python_envs="/opt/python-3.x"). Below are the steps to compile and install Python.
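# -O fixes the local file name so that the tar and cd steps below match it
wget -O openssl-OpenSSL_1_1_1m.tar.gz 'https://github.com/openssl/openssl/archive/OpenSSL_1_1_1m.tar.gz'
tar -zxf openssl-OpenSSL_1_1_1m.tar.gz
cd openssl-OpenSSL_1_1_1m
export OPENSSL_DIR=`pwd`/install
# ./config auto-detects the platform; ./Configure would need an explicit target
./config --prefix=`pwd`/install
make -j 16 && make install
export LD_LIBRARY_PATH=$OPENSSL_DIR/lib:$LD_LIBRARY_PATH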
wget 'https://www.python.org/ftp/python/3.12.9/Python-3.12.9.tgz'
tar -zxf ./Python-3.12.9.tgz
cd Python-3.12.9
mkdir build && cd build
../configure --prefix=`pwd`/install --with-openssl=$OPENSSL_DIR
make -j 16 && make install
./install/bin/pip3 install pyarrow grpcio
tar -zcf ./Python-3.12.9.tar.gz install
Then copy ./Python-3.12.9.tar.gz to your target machine and unpack it:
tar -zxf ./Python-3.12.9.tar.gz
Finally, edit be/conf/be.conf and set:
python_envs=/home/disk1/sr/install
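Before restarting the BE, the unpacked directory can be sanity-checked from any Python. A minimal sketch, assuming the layout discussed in this thread (`bin/python3` under the `python_envs` directory, with `pyarrow` and `grpcio` installed); `check_python_env` is a hypothetical helper. Note that `grpcio` is imported as `grpc`.

```python
import subprocess
from pathlib import Path


def check_python_env(python_home, modules=("pyarrow", "grpc")):
    """Return a list of problems found under `python_home`; empty means it looks usable."""
    interpreter = Path(python_home) / "bin" / "python3"
    if not interpreter.is_file():
        return [f"{interpreter} does not exist"]
    problems = []
    for module in modules:
        # Run the target interpreter itself so its own site-packages is used.
        result = subprocess.run(
            [str(interpreter), "-c", f"import {module}"],
            capture_output=True,
            text=True,
        )
        if result.returncode != 0:
            lines = result.stderr.strip().splitlines()
            detail = lines[-1] if lines else "unknown error"
            problems.append(f"cannot import {module}: {detail}")
    return problems


# The path below is the example value from this thread; adjust it to your machine.
for problem in check_python_env("/home/disk1/sr/install"):
    print(problem)
```

An empty result only shows that the interpreter starts and the modules import; it does not prove that the BE worker will start with this directory.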