Be able to invoke Python UDF directly in SQL query
What problem does the new feature solve?
GreptimeDB supports users to write post-process logic in Python (official document) like
@coprocessor(args=["host", "idc", "cpu_util", "memory_util", "disk_util"],
returns = ["host", "idc", "status"],
sql = "SELECT * FROM system_metrics")
def calc_status(hosts, idcs, cpus, memories, disks):
statuses = []
for host, cpu, memory, disk in zip(hosts, cpus, memories, disks):
if cpu > 80 or memory > 80 or disk > 80:
statuses.append("red")
continue
status = cpu * 0.4 + memory * 0.4 + disk * 0.2
return hosts, idcs, statuses
In this example script it runs on the result of SQL SELECT * FROM system_metrics.
And we can make it more amazing if the customized process logic can be integrated directly into the SQL itself, like
SELECT
host,
idc,
calc_status(cpu_util, memory_util, disk_util)
FROM
system_metrics;
What does the feature do?
Give the ability to embed pre-defined UDF scripts into SQL query
Implementation challenges
No response
We can make udf and script different objects in database.
Stored procedure in snowflake: https://docs.snowflake.com/en/sql-reference/stored-procedures.html UDFs in snowflake: https://docs.snowflake.com/en/sql-reference/user-defined-functions.html
We can make udf and script different objects in database.
Stored procedure in snowflake: https://docs.snowflake.com/en/sql-reference/stored-procedures.html UDFs in snowflake: https://docs.snowflake.com/en/sql-reference/user-defined-functions.html
:+1: This looks great!
CREATE OR REPLACE PROCEDURE MYPROC(from_table STRING, to_table STRING, count INT)
RETURNS STRING
LANGUAGE PYTHON
RUNTIME_VERSION = '3.8'
PACKAGES = ('snowflake-snowpark-python')
HANDLER = 'run'
AS
$$
def run(session, from_table, to_table, count):
session.table(from_table).limit(count).write.save_as_table(to_table)
return "SUCCESS"
$$;