
Support Python UDF

Open stdpain opened this issue 1 year ago • 4 comments

Feature request

Support Python UDFs in StarRocks.

Grammar: extend CREATE FUNCTION to support an inline function body

CREATE FUNCTION echo(int)
RETURNS int
properties(
"symbol" = "echo",
"type" = "Python",
"file" = "inline",
"input" = "arrow"
)
AS
$$
def echo(a):
    return a
$$;
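To illustrate how "symbol" relates to the inline body, here is a minimal sketch in plain Python. It assumes that "symbol" names a function defined in the body; the helper name load_inline_udf is hypothetical and this is not how the StarRocks worker is actually implemented.

```python
# Hypothetical sketch: resolve a "symbol" against an inline UDF body.
# load_inline_udf is an illustrative name, not a StarRocks API.

INLINE_BODY = """
def echo(a):
    return a
"""


def load_inline_udf(source, symbol):
    """Execute the inline source and return the callable named by symbol."""
    namespace = {}
    exec(source, namespace)  # defines the functions from the inline body
    func = namespace.get(symbol)
    if not callable(func):
        raise LookupError("symbol %r is not defined in the inline body" % symbol)
    return func


echo = load_inline_udf(INLINE_BODY, "echo")
print(echo(42))
```

In this sketch, asking for a name the body does not define (for example "add") raises LookupError, which is why the symbol and the Python function name need to agree.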

UDF call: the BE uses gRPC to call the Python UDF process

Python environment requirements: Python 3.8+, Arrow 16.0.0+, gRPC

pip install pyarrow
pip install grpcio

Make sure all packages are installed in PYTHONHOME (configured in the BE).
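A quick way to sanity-check an interpreter against these requirements is to run a small script with that interpreter. This is a minimal sketch that only uses the standard library; the function names are illustrative, and it checks only the Python version and that the modules are importable, not the Arrow 16.0.0+ version requirement.

```python
import importlib.util
import sys

MIN_PYTHON = (3, 8)
# The grpcio package is imported under the module name "grpc".
REQUIRED_MODULES = ("pyarrow", "grpc")


def missing_modules(names):
    """Return the module names this interpreter cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def check_environment():
    """Return a list of problems; an empty list means the checks passed."""
    problems = []
    if sys.version_info < MIN_PYTHON:
        problems.append("Python %d.%d or newer is required" % MIN_PYTHON)
    for name in missing_modules(REQUIRED_MODULES):
        problems.append("module %r is not installed" % name)
    return problems


if __name__ == "__main__":
    for problem in check_environment():
        print("problem:", problem)
```

Run it with the interpreter under the configured directory (for example its bin/python3) so the result reflects the environment the BE will use.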

FE config: enable_udf=true

BE config:

python_envs=/root/python38

examples:

see test/sql/test_udf/test_python_udf

Additional context

PR Links

  • [x] https://github.com/StarRocks/starrocks/pull/46247
  • [x] https://github.com/StarRocks/starrocks/pull/46299
  • [x] https://github.com/StarRocks/starrocks/pull/46449

stdpain avatar May 24 '24 14:05 stdpain

linked to this issue

Python UDF support · Issue #45843 · StarRocks/starrocks https://github.com/StarRocks/starrocks/issues/45843

dirtysalt avatar May 24 '24 18:05 dirtysalt

@stdpain Hi, I have recently been researching how to implement Python UDAFs and UDTFs. Could we connect on WeChat? Also, do you have any design documentation for your Python UDF work? My WeChat ID is Liuwenclever. 😆

WencongLiu avatar Sep 13 '24 03:09 WencongLiu

@stdpain Hi, I have recently been researching how to implement Python UDAFs and UDTFs. Could we connect on WeChat? Also, do you have any design documentation for your Python UDF work? My WeChat ID is Liuwenclever. 😆

We could open a public discussion on Slack: https://starrocks.slack.com/archives/C02FAD0JSSD

stdpain avatar Sep 27 '24 08:09 stdpain

@stdpain Hi, I have heard that Python UDFs will be supported in version 3.4. When is it expected to be released? Thanks a lot.

zhangm365 avatar Oct 18 '24 05:10 zhangm365

@zhangm365 Python UDF is released in v3.4, so you can give it a try. Note that it only covers UDFs; UDAFs and UDTFs are not included.

jaogoy avatar Mar 19 '25 02:03 jaogoy

@jaogoy Do you have more precise Python UDF installation instructions?

As I understand it, besides the instructions above, I need answers to the following:

  1. Should Python be in a virtual env? SR searches for bin/python3.
  2. Is it obligatory to install pyarrow and grpcio?
  3. I only set one variable in the BE config: python_envs=/opt/python310/

rkinwork avatar Mar 24 '25 20:03 rkinwork

How we installed Python UDF support

  1. Enabled UDFs in the FE configuration file: enable_udf=true
  2. Installed Python on the BEs
  3. Installed pyarrow, grpcio, and pandas, though I think only the first two are crucial for UDFs to work without errors
  4. Set python_envs = /usr/ in the BE configuration file (the actual interpreter is located at /usr/bin/python3)

We also tried python_envs = /opt/python310/, the directory of our virtual env (/opt/python310/bin/python3), but got this error:

[42000][1064] worker start failed:Python path configuration:
PYTHONHOME = '/opt/python310/'
PYTHONPATH = (not set)
program name = 'python3'
isolated = 0
environment = 1
user site = 1
import site = 1
sys._base_executable = : BE:10005
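This "Python path configuration" dump is what CPython prints when it cannot initialize from the given PYTHONHOME. One plausible cause, not confirmed in this thread, is that a venv does not ship its own standard library and instead borrows it from the base installation. The sketch below inspects a candidate directory for python_envs; the function name and the returned keys are illustrative, and the lib/python3* layout is an assumption that may differ on some systems (for example lib64).

```python
import glob
import os


def inspect_python_prefix(prefix):
    """Report basic facts about a directory intended for python_envs."""
    return {
        # The interpreter the BE is expected to launch.
        "has_interpreter": os.path.isfile(os.path.join(prefix, "bin", "python3")),
        # A venv created by `python -m venv` has a pyvenv.cfg at its root.
        "is_venv": os.path.isfile(os.path.join(prefix, "pyvenv.cfg")),
        # A full installation ships the standard library, e.g. lib/python3.10/os.py.
        "has_stdlib": bool(
            glob.glob(os.path.join(prefix, "lib", "python3*", "os.py"))
        ),
    }


if __name__ == "__main__":
    print(inspect_python_prefix("/opt/python310"))
```

A directory that reports is_venv as True and has_stdlib as False is unlikely to work as PYTHONHOME; pointing python_envs at the base installation prefix (as with /usr/ above) avoids that.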

rkinwork avatar Mar 26 '25 11:03 rkinwork

Python path configuration: PYTHONHOME = '/opt/python310/' PYTHONPATH = (not set) program name = 'python3' isolated = 0 environment = 1 user site = 1 import site = 1 sys._base_executable =

I have not tried a venv yet; I will test that later. So far I have tested the system default path on Ubuntu 22.04 (python_envs="/") and a Python compiled and installed from source on CentOS 7 (python_envs="/opt/python-3.x"). Next I will list how to compile and install Python.

stdpain avatar Mar 26 '25 11:03 stdpain

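# Build OpenSSL 1.1.1m into ./install (the GitHub archive unpacks into openssl-OpenSSL_1_1_1m)
wget -O openssl-OpenSSL_1_1_1m.tar.gz 'https://github.com/openssl/openssl/archive/OpenSSL_1_1_1m.tar.gz'
tar -zxf openssl-OpenSSL_1_1_1m.tar.gz
cd openssl-OpenSSL_1_1_1m
export OPENSSL_DIR=`pwd`/install
# ./config auto-detects the platform; ./Configure needs an explicit target
./config --prefix=`pwd`/install
make -j 16 && make install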

export LD_LIBRARY_PATH=$OPENSSL_DIR/lib:$LD_LIBRARY_PATH

wget 'https://www.python.org/ftp/python/3.12.9/Python-3.12.9.tgz'
tar -zxf ./Python-3.12.9.tgz 
cd Python-3.12.9
mkdir build && cd build
../configure --prefix=`pwd`/install --with-openssl=$OPENSSL_DIR
make -j 16 && make install
./install/bin/pip3 install pyarrow grpcio
tar -zcf ./Python-3.12.9.tar.gz install

Then copy Python-3.12.9.tar.gz to your target machine and extract it:

tar -zxf ./Python-3.12.9.tar.gz

Finally, edit be/conf/be.conf and point python_envs at the extracted directory: python_envs=/home/disk1/sr/install

stdpain avatar Mar 27 '25 02:03 stdpain