janeway
janeway copied to clipboard
Redundant percent-encoding when building url query string
Problem
The dictionary that is passed into utils.logic.build_url
is sent through urllib.parse.urlencode
and then also through urllib.parse.quote_plus
.
https://github.com/BirkbeckCTP/janeway/blob/92517963a92a173173a58244520be0cfe019bef8/src/utils/logic.py#L153-L154
This is redundant and results in unusable query strings. Compare:
In [3]: quote_plus(urlencode({'a': 'b', 'x': 'y', 'next': '/target/page/?d=e&f=g'}))
Out[3]: 'a%3Db%26x%3Dy%26next%3D%252Ftarget%252Fpage%252F%253Fd%253De%2526f%253Dg'
In [4]: urlencode({'a': 'b', 'x': 'y', 'next': '/target/page/?d=e&f=g'})
Out[4]: 'a=b&x=y&next=%2Ftarget%2Fpage%2F%3Fd%3De%26f%3Dg'
I believe we just want urlencode
here.
If a URL is needed for inclusion in another URL, as a redirect callback, like with the ORCID API, then the function that builds the ORCID URL (utils.orcid.build_redirect_uri
) should separately quote or escape the full URL that is returned from build_url
.
Proposed solution
Allow a Django QueryDict
, a plain dict, or a query string that already has percent-encoded values to be passed to the function. URL-encode the first two with '/' marked safe, to match the default behavior of the |urlencode
template filter. For query strings, do nothing except pass it on to SplitResult
.