ksql icon indicating copy to clipboard operation
ksql copied to clipboard

Add Inline Java UDFs

Open mitch-seymour opened this issue 6 years ago • 11 comments

Description

Add support for inline Java UDFs. This evolved from a discussion held in KLIP-2 (pending)

TODO:
  • [x] Fix minor issues in standalone mode
  • [x] ~Improve KSQL grammar updates (i.e. so keywords like begin or end don't clash with the inline UDF)~ I ended up using a more unique script delimiter: $$
  • [x] More tests
  • [x] More docs
  • [x] Merge conflicts

Testing done

Unit tests, manual testing. In progress...

# tab 1
$ ./bin/ksql-server-start config/ksql-server.properties

# tab 2
$ ./bin/ksql

Now, in the CLI, try creating a couple of inline UDFs.

/** Greet function **/
ksql>  
    CREATE OR REPLACE FUNCTION GREET(name STRING)
    RETURNS STRING
    LANGUAGE JAVA
    WITH(author='Jane', description='Create a greeting', version='0.1.0')
    AS $$
        return "Hello " + NAME;
    $$;

 Message
------------------
 Function created
------------------

ksql> DESCRIBE FUNCTION GREET ;

Name        : GREET
Author      : 'Jane'
Version     : '0.1.0'
Type        : scalar
Jar         : internal
Variations  :

	Variation   : GREET(VARCHAR)
	Returns     : VARCHAR
	Description : 'Create a greeting'

ksql> DROP FUNCTION GREET ;

 Message
----------------------------
 Function GREET was dropped
---------------------------

ksql> DESCRIBE FUNCTION GREET ;
Can't find any functions with the name 'GREET'
/** More complicated zip function **/
ksql> 
    CREATE OR REPLACE FUNCTION ZIP(a VARCHAR, b VARCHAR)
    RETURNS MAP<VARCHAR, VARCHAR>
    LANGUAGE JAVA
    WITH(author='Bob', description='Create a map from two objects', version='0.1.0')
    AS $$
        import java.util.HashMap;
        import java.util.Map;
        Map<String,String> m =  new HashMap<String,String>();
        m.put(A, B);
        return m;
    $$ ;

Once an inline Java UDF is created, it can be used in a query just like any other UDF. Also, non-inline UDFs cannot be dropped :)

ksql> DROP FUNCTION LCASE ;
Cannot drop function: LCASE is not an inline function

Anyways, feel free to take this for a spin!

Reviewer checklist

  • [ ] Ensure docs are updated if necessary. (eg. if a user visible feature is being added or changed).
  • [ ] Ensure relevant issues are linked (description should include text like "Fixes #")

mitch-seymour avatar Mar 26 '19 03:03 mitch-seymour

@confluentinc It looks like @mitch-seymour just signed our Contributor License Agreement. :+1:

Always at your service,

clabot

ghost avatar Mar 26 '19 03:03 ghost

Thanks @mitch-seymour, I had a quick pass over the PR and the main issue at the moment is that we need to update the grammar in a way that the UDF_SCRIPT can handle any java code including when we have KSQL keywords in the code. Consider the following example which is the same as you have above but I changed the variable name from m to end. This code won't work:

/** More complicated zip function **/
ksql> CREATE OR REPLACE FUNCTION zip(a VARCHAR, b VARCHAR)
RETURNS MAP<VARCHAR, VARCHAR>
LANGUAGE JAVA
AS BEGIN
    import java.util.HashMap; \
    import java.util.Map; \
    Map<String,String> end =  new HashMap<String,String>();\
    end.put(args[0], args[1]);\
    return end;\
END ;

hjafarpour avatar Mar 26 '19 04:03 hjafarpour

@hjafarpour that's a good point. I'll revisit the grammar updates and see if I can make it more flexible :)

@JimGalasyn thanks for the suggestions regarding the new docs. I committed those changes :)

mitch-seymour avatar Mar 26 '19 16:03 mitch-seymour

@hjafarpour I spent some time on the grammar updates but had a hard time implementing a version where we could ignore the BEGIN or END KSQL keywords in the inline script body. I got it working in interactive mode, oddly enough, but an issue came up during headless mode where multiple queries are read at once. In this case, the parser didn't understand where a CREATE FUNCTION query stopped and another began.

Anyways, I captured some of that work in this commit if you're interested, but ended up using a script delimiter that is more unique: $$. Please see my updated examples above and the updated syntax reference.

Finally, I'm finding my current workflow for iterating through the new unit tests a little tedious :/ Is there a workflow that doesn't require me to do a multi-project build while iterating on these tests?

mitch-seymour avatar Mar 29 '19 02:03 mitch-seymour

@agavra @JimGalasyn @hjafarpour this PR is finally ready for review :) All of the tests are passing, though Jenkins is flagging a redundantmodifier warning that seems like a minor issue. Anyways, sorry this PR is so big. I had no idea how much work was involved or how much time it would take, but I'm pretty happy with where's it at and am ready for feedback.

Notes

  • I ended up not creating a separate KLIP since this evolved out of KLIP-2, and it sounds like it may not have been necessary (even though I initially intended to do so, but have been somewhat strapped for time lately). I can submit a KLIP if it's necessary though.

  • I ended up using special script delimiters, $$, since I found it challenging to use BEGIN and END blocks while also ignoring nested code that included those characters as well. Ignoring nested BEGIN / END characters wouldn't have been too challenging, except for the fact that we also need to ignore nested ;, which is used to end statements in Java code and KSQL expressions. It presented an interesting challenge of trying to ignore two terminating characters END and ;, and the side-effect was KSQL didn't know where one query stopped and another began.

  • Since I ended up using $$ to resolve this issue (users are much less likely to introduce code that prematurely terminates the statement with $$ as the delimiter), I also favored Postgres' CREATE OR REPLACE syntax over CREATE OR ALTER, since Postgres also supports $$.

Anyways, hope this makes sense. I'm looking forward to getting your feedback and hopefully getting this integrated into the codebase if it proves to be useful.

mitch-seymour avatar Apr 09 '19 20:04 mitch-seymour

I will do a first pass on this tomorrow or Friday 😄

agavra avatar Apr 11 '19 00:04 agavra

Sounds good @agavra, thanks for taking a look :)

mitch-seymour avatar Apr 11 '19 11:04 mitch-seymour

@agavra thanks so much for the feedback! I'll start working my way through the suggestions in the next few days (I'm a bit slammed this week unfortunately). also, I'd love to jump on a call at some point. Maybe sometime next week (or anytime after the 22nd)? thanks again, I know reviewing 1500 lines of code isn't super fun lol.

mitch-seymour avatar Apr 15 '19 18:04 mitch-seymour

@mitch-seymour I had totally lost track of this pr - I apologize for that! I see that you're going to be at Kafka Summit London. Do you want to meet up there in person and we can resolve any open issues?

agavra avatar May 09 '19 20:05 agavra

@agavra no worries, I've been busy preparing for Kafka Summit so haven't been able to work on this further. but sure, let's meet in person at Kafka Summit. I took a wild guess at your work email address and it didn't bounce back, so we can coordinate a time to meet up via email if that works for you :)

mitch-seymour avatar May 10 '19 01:05 mitch-seymour

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
0 out of 2 committers have signed the CLA.

:x: mitch-seymour
:x: JimGalasyn
You have signed the CLA already but the status is still pending? Let us recheck it.

cla-assistant[bot] avatar Nov 15 '23 20:11 cla-assistant[bot]

Closing stale PR. I'm sorry that we didn't get this done. Please resubmit as a new PR if you still want to get it in.

vvcephei avatar Feb 26 '24 16:02 vvcephei