botocore icon indicating copy to clipboard operation
botocore copied to clipboard

`is_valid_endpoint_url` method consider endpoint with underscore character as invalid

Open urbancokova opened this issue 4 years ago • 6 comments

Describe the bug is_valid_endpoint_url method consider endpoint with underscore character _ as invalid

  • https://github.com/boto/botocore/blob/33c5abc38dd656f3697de706338ed7ccc6fff717/botocore/utils.py#L989

Steps to reproduce Our endpoint example: http://example_foo-bar - when we use this format of endpoint it was evaluated as invalid.

Expected behavior We think, endpoint is valid.

urbancokova avatar Jun 29 '21 11:06 urbancokova

Hi @urbancokova,

Thanks for reaching out! Looking into this further.

stobrien89 avatar Jun 30 '21 15:06 stobrien89

Hi @urbancokova,

Can you elaborate more on your use case for this? We made the decision some time ago to stick with RFC 952 with regard to validating host names:

The Internet standards (Requests for Comments) for protocols mandate that component hostname labels may contain only the ASCII letters 'a' through 'z' (in a case-insensitive manner), the digits '0' through '9', and the hyphen ('-'). The original specification of hostnames in RFC 952, mandated that labels could not start with a digit or with a hyphen, and must not end with a hyphen. However, a subsequent specification (RFC 1123) permitted hostname labels to start with digits. No other symbols, punctuation characters, or white space are permitted.

stobrien89 avatar Jun 30 '21 19:06 stobrien89

Hi stobrien89,

yes of course. We use Docker Swarm for container orchestration. Hostnames in our case are composed by Docker Swarm and looks like stack-name_service-name. We never before have any problems with our hostnames. Based on the quick search I also see, that RFC 952 is not supported as generally as you maybe expected (I found multiple cases, where it was not respected).

We already found one solution: usage of network aliases. But I still think that validation only based on RFC 952 is too strict.

urbancokova avatar Jul 01 '21 07:07 urbancokova

Hi @urbancokova,

Thanks for clarifying. I suspected Docker might be at play here. We've had this discussion regarding validation a few times over the past couple of years, namely in https://github.com/boto/botocore/pull/960 and I've seen a handful of people with similar use cases.

The team's stance at this point is to stick to the rfc, but I think it would be fair to mark this as a feature request for now. I can't guarantee it will stay that way, as I need to have a discussion with the team about the various use cases/customer impact surrounding this.

stobrien89 avatar Jul 02 '21 22:07 stobrien89

I ran into this as well and was a bit surprised. It sounds like it's a no op, but to be another voice...

Really, because I use underscores in all my docker hostnames, but i generally do so. It took an afternoon of debugging to figure this was the culprit as I've never had a problem with underscores. A proper error message would have gone a long way in this regard, though. TIL about the RFC which I could very well see the argument to constrain to it... but this seems a bit suffocating in this instance.

LyleScott avatar Aug 25 '21 02:08 LyleScott

I just got bit by this as well, and I wanted to chime in my two cents.

First of all, it sounds (from the other discussions) that the original reason this check was added was to avoid bug reports from cryptic error messages around inability to resolve the endpoint URL. It doesn't appear to me that the solution in the code actually solves that problem; you've just traded one set of github issues for another (witness numerous PRs and this issue).

I would also argue that the current error message is still plenty cryptic. When debugging this, I just kept getting an "invalid endpoint" ValueError on something I knew, for a fact, resolved correctly (because I had already verified the URL from within the docker container using CURL). To figure out what was actually going on, I had to go digging around the botocore source code to figure out what the problem was. And it wasn't until I played with the validation function, as well as read its implementation regex matcher, that I could definitively identify the problem -- and only then did I have enough keywords to discover the relevant issues/PRs here on github.

Second, it seems to me like the solution to "cryptic error messages" is to catch the errors where they happen, and reraise them with better messages. This is an extremely common pattern in the python world, but I'm not even sure that's possible within botocore, given all of the magic used behind the scenes to generate the API using its service descriptions. So from my perspective, it seems like botocore is probably backed into a corner solving the actual problem, and has instead opted to just punt on it and do whatever is most developer-hostile but results in the least maintenance work for the library. That's a legitimate choice but it needs, at a minimum, to be documented in the description of the endpoint_url parameter.

Additionally, there's some nuance here that hasn't really captured in the discussion about RFC 952. As outlined quite thoroughly in the answers to this SO question, though RFC 952 doesn't permit underscores in hostnames, underscores in domain names are perfectly valid. The endpoint_url parameter doesn't make the distinction between hostname and domain name, only the validation function does. So I actually don't agree that RFC 952 is appropriate here, because there's no separate way to specify valid domain names with invalid hostnames (domain name localization is another place where this is relevant). Plus, RFC-2181, RFC-1034, and RFC-1123 all say that applications should be robust against malformed values -- RFC-2181 actually requires it -- and such URLs very much do exist in the wild. So using an RFC as justification to restrict this to forbid underscores, when there are even more RFCs saying you shouldn't get dogmatic about this stuff, leaves a very sour taste in my mouth.

Badg avatar May 04 '22 12:05 Badg