opensearch-ruby
opensearch-ruby copied to clipboard
[BUG] Client doesn't handle basic auth passwords with special characters in url correctly
What is the bug?
It seems like OpenSearch::Client
doesn't handle hosts/urls that contain passwords in the url with special characters correctly. When making requests with one of these urls, I get an "Unauthorized" error. I believe the problem is that __parse_host
doesn't decode the special characters.
How can one reproduce the bug? Example code snippet:
client = OpenSearch::Client.new(
url: "https://user:pass%7D%[email protected]:15000",
transport_options: { ssl: { verify: false } }
)
client.search(index: '_all', body: {})
I can however get around this problem by passing a hash in the url parameter that does the decoding myself:
url = URI("https://user:pass%7D%[email protected]:15000")
client = OpenSearch::Client.new(
url: {
scheme: url.scheme,
user: CGI.unescape(url.user),
password: CGI.unescape(url.password),
host: url.host,
path: url.path,
port: url.port
},
transport_options: { ssl: { verify: false } }
)
client.search(index: '_all', body: {})
What is the expected behavior? I expect that passwords embedded in the url with standard url encoding should not cause authentication errors
What is your host/environment?
- OS: macOS 12.3.1
- Version: 1.0.0
Let me take a look on this. Will get back.
The password pass%7D%7Dword
is used as is (without decoding). @NoahCarnahan you mean to say the decoding should be happening here ?
Correct, I believe that the library should decode the password before trying to use it for authentication.
I realize I left a detail out of my initial bug report:
The password for this user is pass}}word
. }
is a special character, therefore it must be encoded to be included in the basic auth section of the url. As per url encoding rules, }
is replaced by %7D
, which gives us a url of https://user:pass%7D%[email protected]:15000
in this example.
If I were to try specifying the password without URL encoding like https://user:pass}}[email protected]:15000
, then an exception would be thrown about an improperly formatted url.
Additionally, I can use curl
with the https://user:pass%7D%[email protected]:15000
version of the url to successfully access my AWS OpenSearch cluster.
Ruby default parser doesn't seem to want to decode the password.
2.6.5 :009 > parsed = URI.parse('https://user:pass%7D%[email protected]:15000')
=> #<URI::HTTPS https://user:pass%7D%[email protected]:15000>
2.6.5 :010 > parsed.password
=> "pass%7D%7Dword"
The RFC says "Within the user and password field, any ":", "@", or "/" must be encoded."
> URI.parse('https://user:pass%[email protected]:15000').password
=> "pass%40word"
So this doesn't decode the password either.
It seems like in most libraries unencoding is left to the caller, e.g. in addressable:
require 'addressable'
Addressable::URI.unencode('https://user:pass%[email protected]:15000/%40%40')
=> "https://user:pass@[email protected]:15000/@@"
Calling decode first would be a breaking change. I vote to leave things as is.
Note that using URLs with username/password in them is deprecated in the RFC, because this will log the user's password. That's obviously a bad security practice, so I would get rid of any code that passes username/password into the URL.
Agree with @dblock , this will be breaking change. We should leave the decoding to caller.
@NoahCarnahan Do you feel strongly otherwise? Otherwise let's close this?
I agree that the default ruby parser doesn’t do any decoding, but I think it’s the responsibility of opensearch-ruby
as a user of URI
to do the decoding. As a point of comparison, the opensearch-py
library handles URLs with encoded characters just fine (encoded characters in the password part of the url don’t prevent the library from authenticating with the server).
I recognize that passing username and password in the url isn’t ideal, but I do wonder if this library also can’t handle escaped characters in the path part of the url, which I imagine could come up with certain index names (I haven’t tested this).
Ultimately, I’m not a contributor to this project, so I’m not going to push too hard for the change, but I do feel that the current behavior of the library doesn’t handle all valid RFC 1738 urls correctly, which seems like a bug to me.
@NoahCarnahan If you have some spare time, add a few RSpec tests and document current behavior (e.g. a test that checks that encoded path part is decoded)? This way we will know what we are changing, if we are and it will help the next person.
It looks like the password part is as per the RFC and is also a deprecated practice. I propose we close this issue and if anyone finds an issue with the url parsing, we can create a new issue.
I think we should implement some specs and document this behavior before we close it.