aioelasticsearch icon indicating copy to clipboard operation
aioelasticsearch copied to clipboard

Prevent double-encoding of path components in queries.

Open ccma14 opened this issue 4 years ago • 0 comments

What do these changes do?

This change addresses an issue with double-encoding URL components when performing a request against elastic search. For example, this causes issues when retrieving an elastic search document by its id, and the ID contains a character that needs to be encoded:

    async with Elasticsearch("myserver.example") as es:
        result = await es.get(index="myindex", doc_type="_doc", id="1234+5678)

Without this patch a 404 error (document not found) is returned even when a document with the specified id exists in the specified index. -- Note that the equivalent query above works correctly when performing a synchronous request via elasticsearch.Elasticsearch.

The reason for this is:

  • Looking at elasticsearch.client.ElasticSearch, you'll see that every time before perform_request is invoked on the transport object, elasticsearch.client.utils._make_path is already used to url-encode the path argument (which comes down to using urllib.parse.quote).
  • Using aioelasticsearch these requests end up at aioelasticsearch.connection.AIOHTTPConnection.perform_request, where the full URL is built by adding the url argument (representing the aforementioned already encoded path component) to self.base_url (which is a yarl.URL) using the / operator.
  • This causes yarl.URL to url-encode the path component a second time.

This PR avoids the issue by

  • first constructing a relative yarl.URL instance for the path component to be appended, specifying encoded=True to avoid double url-encoding.
  • Then using URL.join to build the final URL rather than the / operator.

Are there changes in behavior for the user?

No (bugfix).

Related issue number

None

Checklist

  • [x] I think the code is well written
  • [x] Unit tests for the changes exist
  • [x] Documentation reflects the changes N/A
  • [ ] Add a new news fragment into the CHANGES folder
    • name it <issue_id>.<type> (e.g. 588.bugfix)
    • if you don't have an issue_id change it to the pr id after creating the PR
    • ensure type is one of the following:
      • .feature: Signifying a new feature.
      • .bugfix: Signifying a bug fix.
      • .doc: Signifying a documentation improvement.
      • .removal: Signifying a deprecation or removal of public API.
      • .misc: A ticket has been closed, but it is not of interest to users.
    • Make sure to use full sentences with correct case and punctuation, for example: Fix issue with non-ascii contents in doctest text files.

Note The checklist item above regarding the CHANGES folder seems out of date, since I can't find such a folder. -- If you want me to add a blurb about this PR to CHANGES.rst, please let me know.

ccma14 avatar Jan 13 '21 22:01 ccma14