flink icon indicating copy to clipboard operation
flink copied to clipboard

[FLINK-34111][table] Add JSON_QUOTE and JSON_UNQUOTE function

Open jeyhunkarimov opened this issue 1 year ago • 16 comments

What is the purpose of the change

This pull request adds JSON_QUOTE and JSON_UNQUOTE functions

Brief change log

  • Add the necessary documentation and functions to support JSON_QUOTE and JSON_UNQUOTE functions

Verifying this change

This change added tests and can be verified via org.apache.flink.table.planner.expressions.ScalarFunctionsTests.testJsonQuote and org.apache.flink.table.planner.expressions.ScalarFunctionsTests.testJsonUnquote

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): no
  • The public API, i.e., is any changed class annotated with @Public(Evolving): no
  • The serializers: no
  • The runtime per-record code paths (performance sensitive): no
  • Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: no
  • The S3 file system connector: no

Documentation

  • Does this pull request introduce a new feature? yes
  • If yes, how is the feature documented? docs

jeyhunkarimov avatar Jan 19 '24 23:01 jeyhunkarimov

CI report:

  • 416cf3e64db044bcde3a825f66bfe6d6b3288689 Azure: FAILURE
Bot commands The @flinkbot bot supports the following commands:
  • @flinkbot run azure re-run the last Azure build

flinkbot avatar Jan 19 '24 23:01 flinkbot

Hi @MartijnVisser could you please review the PR in your available time? Thanks!

jeyhunkarimov avatar Jan 22 '24 21:01 jeyhunkarimov

@flinkbot run azure

JingGe avatar Feb 01 '24 19:02 JingGe

Thanks @snuyanzin for the review. I will check the tests and add more [failing] tests.About the implementation doesn't follow new stack implementation, I havent noticed that my implementation was using deprecated interfaces/stack. So, there is no specific reason to follow this implementation stack.

jeyhunkarimov avatar Feb 04 '24 19:02 jeyhunkarimov

Yes, it's not marked as deprecated however for new functions it's better to follow new stack which is a result of FLIP-32[1] and FLIP-65[2].

[1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-32%3A+Restructure+flink-table+for+future+contributions [2] https://cwiki.apache.org/confluence/display/FLINK/FLIP-65%3A+New+type+inference+for+Table+API+UDFs

snuyanzin avatar Feb 04 '24 21:02 snuyanzin

Thanks for moving to a new stack

I feel like a lack of IT tests, I would suggest to follow similar approach for tests Here it is a class for JSON function relates IT tests org.apache.flink.table.planner.functions.JsonFunctionsITCase The reason I asked about that is that I fetched curent branch locally, build it, I can confirm that existing tests are passing, however from user point of JSON_UNQOTE does not seem to be working: if you run it e.g. under flink_sql_client and execute any query e.g. SELECT JSON_UNQUOTE('"abc"'); it returns empty string.

Another issue is that current tests are checking only sql (as mentioned not at IT level) and do not check table api

snuyanzin avatar Feb 05 '24 19:02 snuyanzin

Thanks for moving to a new stack

I feel like a lack of IT tests, I would suggest to follow similar approach for tests Here it is a class for JSON function relates IT tests org.apache.flink.table.planner.functions.JsonFunctionsITCase The reason I asked about that is that I fetched curent branch locally, build it, I can confirm that existing tests are passing, however from user point of JSON_UNQOTE does not seem to be working: if you run it e.g. under flink_sql_client and execute any query e.g. SELECT JSON_UNQUOTE('"abc"'); it returns empty string.

Another issue is that current tests are checking only sql (as mentioned not at IT level) and do not check table api

Thanks for the review. I added IT cases.

SELECT JSON_UNQUOTE('"abc"'); it returns empty string.

Yes, it returns empty string (now I changed it, based on MySQL/MariaDB to return the same value) because '"abc"' is not a valid json string (according to the definition). For example, '["abc"]' is a valid json string.

jeyhunkarimov avatar Feb 05 '24 21:02 jeyhunkarimov

@snuyanzin Do you want to take another pass?

MartijnVisser avatar Feb 12 '24 10:02 MartijnVisser

Thanks for the comment @snuyanzin . I think we need more investigation about the clear definition and behavior of these functions, since the ones adopted from MySQL are a bit ambiguous.

Based on current description if I understand it correctly (please correct me if I'm wrong): I assume that if my input is correct and quoted then I would also assume that after applying of JSON_UNQUOTE I can receive the initial string

However

SELECT json_unquote(json_quote('"key": "value"'));

returns "\"key\": \"value\"" I think this should be fixed to be able to get again '"key": "value"' ? Or did I miss anything?

Is it mainly because of the definition of JSON_QUOTE that does not require its input to be valid JSON, and JSON_UNQOTE that require its input to be a valid JSON.

As I mentioned, we need a clearer definition (maybe not from MySQL references)

jeyhunkarimov avatar Feb 12 '24 13:02 jeyhunkarimov

Is it mainly because of the definition of JSON_QUOTE that does not require its input to be valid JSON, and JSON_UNQOTE that require its input to be a valid JSON.

I'm not sure that this is the only reason

there is a more simple case, for instance "key" is a valid JSON and json_quote produces also a valid json "\"key\"" however the combination also seems not working

SELECT json_unquote(json_quote('"key"'));

and continues producing "\"key\""

snuyanzin avatar Feb 12 '24 13:02 snuyanzin

Is it mainly because of the definition of JSON_QUOTE that does not require its input to be valid JSON, and JSON_UNQOTE that require its input to be a valid JSON.

I'm not sure that this is the only reason

there is a more simple case, for instance "key" is a valid JSON and json_quote produces also a valid json "\"key\"" however the combination also seems not working

SELECT json_unquote(json_quote('"key"'));

and continues producing "\"key\""

Let me paste the definitions here:

json_quote: Quotes a string as a JSON value by wrapping it with double quote characters, escaping interior quote and other characters, and returning the result as a utf8mb4 string. If the argument is NULL, the function returns NULL.

json_unquote: Unquotes JSON value and returns the result as a utf8mb4 string. If the argument is NULL, returns NULL. If the value starts and ends with double quotes but is not a valid JSON string literal, the value is passed through unmodified.

So, from this definition, json_quote does not enforce the input to be a valid json, but json_unquote does.

So, in your example, json_quote('"key"') produces "\"key\"" and json_unqote("\"key\"") will produce the same ("\"key\"") since the input ("\"key\"") is not a valid json.

Instead this works: testSqlApi("JSON_UNQUOTE(JSON_QUOTE('[1,2,3]'))", "[1,2,3]") since the input of the json_unqote is a valid json.

These rules I derived from the definition I pasted above. Maybe I am missing something.

jeyhunkarimov avatar Feb 12 '24 13:02 jeyhunkarimov

and json_unqote("\"key\"") will produce the same ("\"key\"") since the input ("\"key\"") is not a valid json.

can you clarify why it the input here is not a valid json?

Based on [1] valid json is an element

json element

which is a value separated by whitespaces

element ws value ws

and value itself could be a string

value object array string number "true" "false" "null"

which is characters

string '"' characters '"'

characters "" character characters and characters could contain escape characters

character '0020' . '10FFFF' - '"' - '' '' escape

like that

escape '"' '' '/' 'b' 'f' 'n' 'r' 't' 'u' hex hex hex hex

so I didn't get why "\"key\"" should be considered as an invalid json... Can you elaborate here please?

By the way I checked these queries against MySQL and it works as expected [2]

select json_unquote(json_quote("\"key\""));
select json_unquote(json_quote("\"key\": \"value\""));

it returns

"key"
"key": "value"

[1] https://www.json.org/json-en.html [2] https://www.db-fiddle.com/f/fuuT4xzfFEbY772eNkLkmp/0

snuyanzin avatar Feb 12 '24 14:02 snuyanzin

Thanks a lot for the comments. I agree. I was a bit influenced by the definition of MySQL and didn't check the json.org.

jeyhunkarimov avatar Feb 12 '24 20:02 jeyhunkarimov

Hi @jeyhunkarimov thank you for contributing this PR!. Just wanted to check if you were planning on working further on this, otherwise I'd be happy to continue and address some of the feedback. Please let me know 👍

anupamaggarwal avatar Jun 12 '24 11:06 anupamaggarwal

Hi @anupamaggarwal thanks for your proposal. Please feel free to continue!

jeyhunkarimov avatar Jun 16 '24 12:06 jeyhunkarimov

thanks @jeyhunkarimov! linking the PR https://github.com/apache/flink/pull/24967

anupamaggarwal avatar Jun 20 '24 12:06 anupamaggarwal

superseded by https://github.com/apache/flink/pull/24967

snuyanzin avatar Aug 11 '24 22:08 snuyanzin