doris
doris copied to clipboard
(Fix)[hive-writer] Fixed the issue when partition values contain spaces when writing to s3.
Proposed changes
Issue Number: close #31442
(Fix) [hive-writer] Fixed the issue when partition values contain spaces when writing to s3.
Error msg
org.apache.doris.common.UserException: errCode = 2, detailMessage = java.net.URISyntaxException: Illegal character in path at index 114: oss://xxxxxxxxxxx/hive/tpcds1000_partition_oss/call_center/cc_call_center_sk=1/cc_mkt_class=A bit narrow forms matter animals. Consist/cc_market_manager=Daniel Weller/cc_rec_end_date=2001-12-31/f6b5ff4253414b06-9fd365ef68e5ddc5_133f02fb-a7e0-4109-9100-fb748a28259e-0.zlib.orc
at org.apache.doris.common.util.S3URI.validateUri(S3URI.java:134) ~[doris-fe.jar:1.2-SNAPSHOT]
at org.apache.doris.common.util.S3URI.parseUri(S3URI.java:120) ~[doris-fe.jar:1.2-SNAPSHOT]
at org.apache.doris.common.util.S3URI.<init>(S3URI.java:116) ~[doris-fe.jar:1.2-SNAPSHOT]
at org.apache.doris.common.util.S3URI.create(S3URI.java:108) ~[doris-fe.jar:1.2-SNAPSHOT]
at org.apache.doris.fs.obj.S3ObjStorage.deleteObject(S3ObjStorage.java:194) ~[doris-fe.jar:1.2-SNAPSHOT]
at org.apache.doris.fs.remote.ObjFileSystem.delete(ObjFileSystem.java:150) ~[doris-fe.jar:1.2-SNAPSHOT]
at org.apache.doris.fs.remote.SwitchingFileSystem.delete(SwitchingFileSystem.java:92) ~[doris-fe.jar:1.2-
Root Cause
Hadoop partition names will encode some special characters, but not space characters, which is different from URI encoding. Therefore, an error will be reported when constructing URI.
Solution
The solution is to use regular expressions to parse URI, and then pass in each part of URI to construct URI. This URI constructor will encode each part of URI.