
[FLINK-34108][table] Add built-in URL_ENCODE and URL_DECODE function.

Open superdiaodiao opened this issue 1 year ago • 17 comments

What is the purpose of the change

issue: https://issues.apache.org/jira/browse/FLINK-34108

This PR implements the built-in functions URL_ENCODE and URL_DECODE:

  1. URL_ENCODE: Translates a string into 'application/x-www-form-urlencoded' format using the UTF-8 encoding scheme.
  2. URL_DECODE: Decodes a string in 'application/x-www-form-urlencoded' format using the UTF-8 encoding scheme.

Brief change log

  1. URL_ENCODE
  • Syntax: url_encode(url)

  • Arguments: url: a string representing a URL

  • Returns: the input string translated into 'application/x-www-form-urlencoded' format using the UTF-8 encoding scheme; null if the input is null or encoding fails.

  • Examples:

url = 'https://flink.apache.org/'
SQL: url_encode(url)
TableAPI: url.urlEncode()

output: 'https%3A%2F%2Fflink.apache.org%2F'

  2. URL_DECODE
  • Syntax: url_decode(value)

  • Arguments: value: a URL-encoded string

  • Returns: the input string decoded from 'application/x-www-form-urlencoded' format using the UTF-8 encoding scheme; null if the input is null or decoding fails.

  • Examples:

value = 'https%3A%2F%2Fflink.apache.org%2F'
SQL: url_decode(value)
TableAPI: value.urlDecode()

output: 'https://flink.apache.org/'
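
The semantics described above (UTF-8, 'application/x-www-form-urlencoded', null on null input or failure) can be sketched with the JDK's `java.net.URLEncoder` and `java.net.URLDecoder`. This is only an illustration: the class name `UrlCodecSketch` and its method names are hypothetical, and it is an assumption, not something stated in this PR description, that the actual implementation delegates to these JDK classes.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper illustrating the described semantics; not the PR's actual code. */
final class UrlCodecSketch {

    private UrlCodecSketch() {}

    /** Form-urlencodes the input as UTF-8; returns null for null input or on failure. */
    static String urlEncode(String url) {
        if (url == null) {
            return null;
        }
        try {
            return URLEncoder.encode(url, StandardCharsets.UTF_8.name());
        } catch (UnsupportedEncodingException | RuntimeException e) {
            return null;
        }
    }

    /** Decodes a form-urlencoded UTF-8 string; returns null for null input or on failure. */
    static String urlDecode(String value) {
        if (value == null) {
            return null;
        }
        try {
            return URLDecoder.decode(value, StandardCharsets.UTF_8.name());
        } catch (UnsupportedEncodingException | RuntimeException e) {
            // URLDecoder throws IllegalArgumentException on malformed escapes such as "%".
            return null;
        }
    }

    public static void main(String[] args) {
        System.out.println(urlEncode("https://flink.apache.org/"));
        System.out.println(urlDecode("https%3A%2F%2Fflink.apache.org%2F"));
        System.out.println(urlDecode("%")); // malformed input yields null
    }
}
```

Note that this format encodes a space as `+` rather than `%20`, which is a property of 'application/x-www-form-urlencoded' itself.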

Verifying this change

  • This change added tests in UrlFunctionITCase.

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): (no)
  • The public API, i.e., is any changed class annotated with @Public(Evolving): (no)
  • The serializers: (no)
  • The runtime per-record code paths (performance sensitive): (no)
  • Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: (no)
  • The S3 file system connector: (no)

Documentation

  • Does this pull request introduce a new feature? (yes)
  • If yes, how is the feature documented? (docs)

superdiaodiao avatar May 12 '24 13:05 superdiaodiao

CI report:

  • b81d843d1bbecc2b07933968978653d725f0394f Azure: SUCCESS
Bot commands
The @flinkbot bot supports the following commands:
  • @flinkbot run azure re-run the last Azure build

flinkbot avatar May 12 '24 13:05 flinkbot

@flinkbot run azure

superdiaodiao avatar May 12 '24 14:05 superdiaodiao

Thanks for your contribution @superdiaodiao, I left some comments.

Also, CI failed, and I tend to think it is related to your changes in the Python code. Could you please have a look?

snuyanzin avatar May 13 '24 10:05 snuyanzin

> Thanks for your contribution @superdiaodiao, I left some comments.
>
> Also, CI failed, and I tend to think it is related to your changes in the Python code. Could you please have a look?

Thanks for your review, it helps a lot. I will continue to check the Python part.

superdiaodiao avatar May 13 '24 10:05 superdiaodiao

@flinkbot run azure

superdiaodiao avatar May 13 '24 12:05 superdiaodiao

@snuyanzin @MartijnVisser @HuangXingBo please take a look.

superdiaodiao avatar May 28 '24 12:05 superdiaodiao

@snuyanzin @MartijnVisser @HuangXingBo please take a look~~~

superdiaodiao avatar Jun 05 '24 07:06 superdiaodiao

Thanks for the contribution @superdiaodiao, and thanks for the review @HuangXingBo, @davidradl.

It looks OK to me. I will test it a bit more and, if that succeeds, I will merge it.

snuyanzin avatar Jun 09 '24 22:06 snuyanzin

> Thanks for the contribution @superdiaodiao, and thanks for the review @HuangXingBo, @davidradl.
>
> It looks OK to me. I will test it a bit more and, if that succeeds, I will merge it.

Thank you again!

superdiaodiao avatar Jun 10 '24 02:06 superdiaodiao

Hi @snuyanzin @superdiaodiao, do we need to support an encoding argument? For reference: Db2 https://www.ibm.com/docs/en/db2-for-zos/12?topic=functions-urlencode-urldecode and MaxCompute https://www.alibabacloud.com/help/en/maxcompute/user-guide/url-decode

liuyongvs avatar Jun 11 '24 05:06 liuyongvs

> Hi @snuyanzin @superdiaodiao, do we need to support an encoding argument? For reference: Db2 https://www.ibm.com/docs/en/db2-for-zos/12?topic=functions-urlencode-urldecode and MaxCompute https://www.alibabacloud.com/help/en/maxcompute/user-guide/url-decode

Calcite, Spark, Presto and Doris also take only one argument. That is enough to handle our use cases, and UTF-8 meets our needs.

superdiaodiao avatar Jun 11 '24 05:06 superdiaodiao

> Thanks for the contribution @superdiaodiao, and thanks for the review @HuangXingBo, @davidradl.
>
> It looks OK to me. I will test it a bit more and, if that succeeds, I will merge it.

@snuyanzin Please accept my apologies for the interruption. I was just wondering if there have been any updates on the progress of this PR. I greatly appreciate your help and understanding. Thank you.

superdiaodiao avatar Jun 17 '24 07:06 superdiaodiao

Yep, sorry, testing took a bit longer than expected; however, it looks OK.

> Db2 https://www.ibm.com/docs/en/db2-for-zos/12?topic=functions-urlencode-urldecode and MaxCompute https://www.alibabacloud.com/help/en/maxcompute/user-guide/url-decode

Thanks for bringing this up. I also looked at different vendors: it's hard to find ones that support these functions, and among those that do there is no common approach regarding encodings. See, for example, StarRocks https://docs.starrocks.io/docs/3.1/sql-reference/sql-functions/string-functions/url_encode/. Oracle DB itself doesn't support url_encode; however, Oracle APEX supports it without an encoding argument.

So I would go with the current implementation as a first step. For different encodings, we would also need to research which encodings should be supported and how well they are supported in the table module; e.g., I failed to find tests for this in the table module.

UPD: there is currently a feature freeze; let's wait until it ends.

snuyanzin avatar Jun 17 '24 08:06 snuyanzin

> Yep, sorry, testing took a bit longer than expected; however, it looks OK. ...... UPD: there is currently a feature freeze; let's wait until it ends.

Hey, may I ask if there have been any updates in the past week? If you need help, just let me know and I will support wherever I can. Thanks for your efforts so far.

superdiaodiao avatar Jun 23 '24 14:06 superdiaodiao

As mentioned earlier, there is currently a feature freeze (https://lists.apache.org/thread/ftj0mcs2843yrx1tog9lfhfrbzkrbfvp), so we need to wait.

snuyanzin avatar Jun 24 '24 06:06 snuyanzin

> As mentioned earlier, there is currently a feature freeze (https://lists.apache.org/thread/ftj0mcs2843yrx1tog9lfhfrbzkrbfvp), so we need to wait.

OK, I get your point. Thanks!

superdiaodiao avatar Jun 24 '24 07:06 superdiaodiao

@flinkbot run azure

superdiaodiao avatar Jun 27 '24 07:06 superdiaodiao

@lincoln-lil thanks for having a look. Please let us know if there is anything that should be fixed from your point of view.

I'm asking since I'm going to merge it at the end of this week or the beginning of next week.

@superdiaodiao can you please rebase onto the latest master? I think it will help with CI.

snuyanzin avatar Jul 04 '24 06:07 snuyanzin

> @superdiaodiao can you please rebase onto the latest master? I think it will help with CI.

OK, I will rebase.

Update: CI passed after rebasing.

superdiaodiao avatar Jul 04 '24 10:07 superdiaodiao