janeway icon indicating copy to clipboard operation
janeway copied to clipboard

Redundant percent-encoding when building url query string

Open joemull opened this issue 8 months ago • 0 comments

Problem

The dictionary that is passed into utils.logic.build_url is sent through urllib.parse.urlencode and then also through urllib.parse.quote_plus.

https://github.com/BirkbeckCTP/janeway/blob/92517963a92a173173a58244520be0cfe019bef8/src/utils/logic.py#L153-L154

This is redundant and results in unusable query strings. Compare:

In [3]: quote_plus(urlencode({'a': 'b', 'x': 'y', 'next': '/target/page/?d=e&f=g'}))
Out[3]: 'a%3Db%26x%3Dy%26next%3D%252Ftarget%252Fpage%252F%253Fd%253De%2526f%253Dg'
In [4]: urlencode({'a': 'b', 'x': 'y', 'next': '/target/page/?d=e&f=g'})
Out[4]: 'a=b&x=y&next=%2Ftarget%2Fpage%2F%3Fd%3De%26f%3Dg'

I believe we just want urlencode here.

If a URL is needed for inclusion in another URL, as a redirect callback, like with the ORCID API, then the function that builds the ORCID URL (utils.orcid.build_redirect_uri) should separately quote or escape the full URL that is returned from build_url.

Proposed solution

Allow a Django QueryDict, a plain dict, or a query string that already has percent-encoded values to be passed to the function. URL-encode the first two with '/' marked safe, to match the default behavior of the |urlencode template filter. For query strings, do nothing except pass it on to SplitResult.

joemull avatar Jun 24 '24 09:06 joemull