opensearch-ruby icon indicating copy to clipboard operation
opensearch-ruby copied to clipboard

[BUG] Client doesn't handle basic auth passwords with special characters in url correctly

Open NoahCarnahan opened this issue 2 years ago • 10 comments

What is the bug? It seems like OpenSearch::Client doesn't handle hosts/urls that contain passwords in the url with special characters correctly. When making requests with one of these urls, I get an "Unauthorized" error. I believe the problem is that __parse_host doesn't decode the special characters.

How can one reproduce the bug? Example code snippet:

client = OpenSearch::Client.new(
    url:  "https://user:pass%7D%[email protected]:15000",
    transport_options: { ssl: { verify: false } }
)
client.search(index: '_all', body: {})

I can however get around this problem by passing a hash in the url parameter that does the decoding myself:

url = URI("https://user:pass%7D%[email protected]:15000")
client = OpenSearch::Client.new(
  url: {
    scheme: url.scheme,
    user: CGI.unescape(url.user),
    password: CGI.unescape(url.password),
    host: url.host,
    path: url.path,
    port: url.port
  },
  transport_options: { ssl: { verify: false } }
)
client.search(index: '_all', body: {})

What is the expected behavior? I expect that passwords embedded in the url with standard url encoding should not cause authentication errors

What is your host/environment?

  • OS: macOS 12.3.1
  • Version: 1.0.0

NoahCarnahan avatar Apr 13 '22 16:04 NoahCarnahan

Let me take a look on this. Will get back.

jayeshathila avatar Apr 14 '22 06:04 jayeshathila

The password pass%7D%7Dword is used as is (without decoding). @NoahCarnahan you mean to say the decoding should be happening here ?

jayeshathila avatar Apr 15 '22 06:04 jayeshathila

Correct, I believe that the library should decode the password before trying to use it for authentication.

I realize I left a detail out of my initial bug report: The password for this user is pass}}word. } is a special character, therefore it must be encoded to be included in the basic auth section of the url. As per url encoding rules, } is replaced by %7D, which gives us a url of https://user:pass%7D%[email protected]:15000 in this example.

If I were to try specifying the password without URL encoding like https://user:pass}}[email protected]:15000, then an exception would be thrown about an improperly formatted url.

Additionally, I can use curl with the https://user:pass%7D%[email protected]:15000 version of the url to successfully access my AWS OpenSearch cluster.

NoahCarnahan avatar Apr 15 '22 13:04 NoahCarnahan

Ruby default parser doesn't seem to want to decode the password.

2.6.5 :009 > parsed = URI.parse('https://user:pass%7D%[email protected]:15000')
 => #<URI::HTTPS https://user:pass%7D%[email protected]:15000> 
2.6.5 :010 > parsed.password
 => "pass%7D%7Dword" 

The RFC says "Within the user and password field, any ":", "@", or "/" must be encoded."

> URI.parse('https://user:pass%[email protected]:15000').password
 => "pass%40word" 

So this doesn't decode the password either.

It seems like in most libraries unencoding is left to the caller, e.g. in addressable:

require 'addressable'
Addressable::URI.unencode('https://user:pass%[email protected]:15000/%40%40')
 => "https://user:pass@[email protected]:15000/@@" 

Calling decode first would be a breaking change. I vote to leave things as is.

Note that using URLs with username/password in them is deprecated in the RFC, because this will log the user's password. That's obviously a bad security practice, so I would get rid of any code that passes username/password into the URL.

dblock avatar Apr 19 '22 21:04 dblock

Agree with @dblock , this will be breaking change. We should leave the decoding to caller.

jayeshathila avatar Apr 20 '22 03:04 jayeshathila

@NoahCarnahan Do you feel strongly otherwise? Otherwise let's close this?

dblock avatar Apr 20 '22 15:04 dblock

I agree that the default ruby parser doesn’t do any decoding, but I think it’s the responsibility of opensearch-ruby as a user of URI to do the decoding. As a point of comparison, the opensearch-py library handles URLs with encoded characters just fine (encoded characters in the password part of the url don’t prevent the library from authenticating with the server).

I recognize that passing username and password in the url isn’t ideal, but I do wonder if this library also can’t handle escaped characters in the path part of the url, which I imagine could come up with certain index names (I haven’t tested this).

Ultimately, I’m not a contributor to this project, so I’m not going to push too hard for the change, but I do feel that the current behavior of the library doesn’t handle all valid RFC 1738 urls correctly, which seems like a bug to me.

NoahCarnahan avatar Apr 20 '22 21:04 NoahCarnahan

@NoahCarnahan If you have some spare time, add a few RSpec tests and document current behavior (e.g. a test that checks that encoded path part is decoded)? This way we will know what we are changing, if we are and it will help the next person.

dblock avatar Apr 20 '22 22:04 dblock

It looks like the password part is as per the RFC and is also a deprecated practice. I propose we close this issue and if anyone finds an issue with the url parsing, we can create a new issue.

wbeckler avatar Sep 27 '22 20:09 wbeckler

I think we should implement some specs and document this behavior before we close it.

dblock avatar Sep 28 '22 22:09 dblock