[ZEPPELIN-6095] add jdbc interpreter url validation check condition
What is this PR for?
Add some validation check conditions to existing url validator in jdbc interpreter. So now it can check URLs with the conditions below if it has an unallowable configuration.
- UTF-8 encoded
What type of PR is it?
Improvement
Todos
- [ ] - Task
What is the Jira issue?
How should this be tested?
Input the url with unallowable configurations in UTF-8 encoded in JDBC type interpreter. Then run the command in notebook and see if the command is blocked from running.
Screenshots (if appropriate)
Questions:
- Does the license files need to update? No
- Is there breaking changes for older versions? No
- Does this needs documentation? No
do we really need to remove white spaces here?
I have the same question too. Is there any potential risk if we have encoded while spaces in the JDBC URL?
In Apache Kyuubi cases, it allows the user to pass the spark.app.name as part of JDBC URL, so the encoded URL might be like
jdbc:kyuubi://host:port/db;#spark.app.name=Hello%20World;spark.driver.memory=4g
For example,
$ bin/beeline -u 'jdbc:kyuubi://localhost:10009/#spark.app.name=Hello%20World'
0: jdbc:hive2://localhost:10009/#spark.app.na> set spark.app.name;
24/09/20 00:30:25 INFO ExecuteStatement:
Spark application name: Hello World
application ID: local-1726763393107
application web UI: http://192.168.8.150:53761
master: local[*]
deploy mode: client
version: 3.3.1
Start time: 2024-09-20T00:29:52.154
User: anonymous
...
2024-09-20 00:30:26.123 INFO org.apache.kyuubi.operation.ExecuteStatement: Processing anonymous's query[a8771241-5233-44fe-8c69-ad94a8f0217d]: RUNNING_STATE -> FINISHED_STATE, time taken: 0.128 seconds
+-----------------+--------------+
| key | value |
+-----------------+--------------+
| spark.app.name | Hello World |
+-----------------+--------------+
1 row selected (0.401 seconds)
@tbonelee @pan3793 Thank you for mentioning the white space issue.
I put the eliminating white spaces logic to block the unallowable configurations with white space included such as :
allow%20LoadLocalInfile=true
The URL with eliminating white spaces is used only when validating, not when creating connection with the URL. So I think it does not cause risky situation like a URL containing white space is not connected.
However I confirmed that if user put url with white space contained, the configuration does not apply and JDBC throws an error. So I will remove the eliminating white spaces logic.
https://github.com/apache/zeppelin/pull/4882 (Java release to 11) is merged to master branch. Can @pan3793 review this pr, so I can merge this branch?
some thoughts:
- PR title and desc are misleading, the key change of this PR is, validating the decoded URL instead of the encoded URL
- if the decoded URL is an illegal UTF-8 sequence, the new code just skips validation, I'm not sure this is the correct behavior. from the security perspective, we should fail immediately.
- changed the PR title to show the key change.
- changed code to fail immediately if url decoding throws error. Previously, I thought if decoding step fails, validating logic has a fault. However since @pan3793 says above, fail immediately is more close behavior to secure, I changed it.
- use StandardCharsets in
URLDecoder.decodemethod's second input parameter as @Reamer says above
LGTM, I have no further concerns.
Thanks for @Reamer, I fix the URLDecoder.decode method to fit for java 11.
Additionally, in java 11 URLDecoder, it never throws UnsupportedEncodingException so I removed the relevant try-catch statement.
@Reamer I rebased my pr after the master branch, and applied your reviews (fix indent & remove unused import). Please check for it!
Merged into master/branch-0.12