flink
flink copied to clipboard
[FLINK-34111][table] Add JSON_QUOTE and JSON_UNQUOTE function
What is the purpose of the change
This pull request adds JSON_QUOTE and JSON_UNQUOTE functions
Brief change log
- Add the necessary documentation and functions to support JSON_QUOTE and JSON_UNQUOTE functions
Verifying this change
This change added tests and can be verified via org.apache.flink.table.planner.expressions.ScalarFunctionsTests.testJsonQuote
and org.apache.flink.table.planner.expressions.ScalarFunctionsTests.testJsonUnquote
Does this pull request potentially affect one of the following parts:
- Dependencies (does it add or upgrade a dependency): no
- The public API, i.e., is any changed class annotated with
@Public(Evolving)
: no - The serializers: no
- The runtime per-record code paths (performance sensitive): no
- Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: no
- The S3 file system connector: no
Documentation
- Does this pull request introduce a new feature? yes
- If yes, how is the feature documented? docs
CI report:
- 416cf3e64db044bcde3a825f66bfe6d6b3288689 Azure: FAILURE
Bot commands
The @flinkbot bot supports the following commands:-
@flinkbot run azure
re-run the last Azure build
Hi @MartijnVisser could you please review the PR in your available time? Thanks!
@flinkbot run azure
Thanks @snuyanzin for the review. I will check the tests and add more [failing] tests.About the implementation doesn't follow new stack implementation, I havent noticed that my implementation was using deprecated interfaces/stack. So, there is no specific reason to follow this implementation stack.
Yes, it's not marked as deprecated however for new functions it's better to follow new stack which is a result of FLIP-32[1] and FLIP-65[2].
[1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-32%3A+Restructure+flink-table+for+future+contributions [2] https://cwiki.apache.org/confluence/display/FLINK/FLIP-65%3A+New+type+inference+for+Table+API+UDFs
Thanks for moving to a new stack
I feel like a lack of IT tests, I would suggest to follow similar approach for tests
Here it is a class for JSON function relates IT tests org.apache.flink.table.planner.functions.JsonFunctionsITCase
The reason I asked about that is that I fetched curent branch locally, build it, I can confirm that existing tests are passing, however from user point of JSON_UNQOTE
does not seem to be working: if you run it e.g. under flink_sql_client
and execute any query e.g. SELECT JSON_UNQUOTE('"abc"');
it returns empty string.
Another issue is that current tests are checking only sql (as mentioned not at IT level) and do not check table api
Thanks for moving to a new stack
I feel like a lack of IT tests, I would suggest to follow similar approach for tests Here it is a class for JSON function relates IT tests
org.apache.flink.table.planner.functions.JsonFunctionsITCase
The reason I asked about that is that I fetched curent branch locally, build it, I can confirm that existing tests are passing, however from user point ofJSON_UNQOTE
does not seem to be working: if you run it e.g. underflink_sql_client
and execute any query e.g.SELECT JSON_UNQUOTE('"abc"');
it returns empty string.Another issue is that current tests are checking only sql (as mentioned not at IT level) and do not check table api
Thanks for the review. I added IT cases.
SELECT JSON_UNQUOTE('"abc"');
it returns empty string.
Yes, it returns empty string (now I changed it, based on MySQL/MariaDB to return the same value) because '"abc"' is not a valid json string (according to the definition). For example, '["abc"]' is a valid json string.
@snuyanzin Do you want to take another pass?
Thanks for the comment @snuyanzin . I think we need more investigation about the clear definition and behavior of these functions, since the ones adopted from MySQL are a bit ambiguous.
Based on current description if I understand it correctly (please correct me if I'm wrong): I assume that if my input is correct and quoted then I would also assume that after applying of
JSON_UNQUOTE
I can receive the initial stringHowever
SELECT json_unquote(json_quote('"key": "value"'));
returns
"\"key\": \"value\""
I think this should be fixed to be able to get again'"key": "value"'
? Or did I miss anything?
Is it mainly because of the definition of JSON_QUOTE
that does not require its input to be valid JSON, and JSON_UNQOTE
that require its input to be a valid JSON.
As I mentioned, we need a clearer definition (maybe not from MySQL references)
Is it mainly because of the definition of JSON_QUOTE that does not require its input to be valid JSON, and JSON_UNQOTE that require its input to be a valid JSON.
I'm not sure that this is the only reason
there is a more simple case, for instance "key"
is a valid JSON and json_quote
produces also a valid json "\"key\""
however the combination also seems not working
SELECT json_unquote(json_quote('"key"'));
and continues producing "\"key\""
Is it mainly because of the definition of JSON_QUOTE that does not require its input to be valid JSON, and JSON_UNQOTE that require its input to be a valid JSON.
I'm not sure that this is the only reason
there is a more simple case, for instance
"key"
is a valid JSON andjson_quote
produces also a valid json"\"key\""
however the combination also seems not workingSELECT json_unquote(json_quote('"key"'));
and continues producing
"\"key\""
Let me paste the definitions here:
json_quote
: Quotes a string as a JSON value by wrapping it with double quote characters, escaping interior quote and other characters, and returning the result as a utf8mb4 string. If the argument is NULL, the function returns NULL.
json_unquote
: Unquotes JSON value and returns the result as a utf8mb4 string. If the argument is NULL, returns NULL. If the value starts and ends with double quotes but is not a valid JSON string literal, the value is passed through unmodified.
So, from this definition, json_quote
does not enforce the input to be a valid json, but json_unquote
does.
So, in your example, json_quote('"key"')
produces "\"key\""
and json_unqote("\"key\"")
will produce the same ("\"key\"")
since the input ("\"key\"")
is not a valid json.
Instead this works: testSqlApi("JSON_UNQUOTE(JSON_QUOTE('[1,2,3]'))", "[1,2,3]")
since the input of the json_unqote is a valid json.
These rules I derived from the definition I pasted above. Maybe I am missing something.
and json_unqote("\"key\"") will produce the same ("\"key\"") since the input ("\"key\"") is not a valid json.
can you clarify why it the input here is not a valid json?
Based on [1] valid json is an element
json element
which is a value separated by whitespaces
element ws value ws
and value itself could be a string
value object array string number "true" "false" "null"
which is characters
string '"' characters '"'
characters "" character characters and characters could contain escape characters
character '0020' . '10FFFF' - '"' - '' '' escape
like that
escape '"' '' '/' 'b' 'f' 'n' 'r' 't' 'u' hex hex hex hex
so I didn't get why "\"key\""
should be considered as an invalid json...
Can you elaborate here please?
By the way I checked these queries against MySQL and it works as expected [2]
select json_unquote(json_quote("\"key\""));
select json_unquote(json_quote("\"key\": \"value\""));
it returns
"key"
"key": "value"
[1] https://www.json.org/json-en.html [2] https://www.db-fiddle.com/f/fuuT4xzfFEbY772eNkLkmp/0
Thanks a lot for the comments. I agree. I was a bit influenced by the definition of MySQL and didn't check the json.org
.
Hi @jeyhunkarimov thank you for contributing this PR!. Just wanted to check if you were planning on working further on this, otherwise I'd be happy to continue and address some of the feedback. Please let me know 👍
Hi @anupamaggarwal thanks for your proposal. Please feel free to continue!
thanks @jeyhunkarimov! linking the PR https://github.com/apache/flink/pull/24967
superseded by https://github.com/apache/flink/pull/24967