otp
otp copied to clipboard
gen_tcp:connect/3 raises an exception for empty string addresses
Describe the bug I was running some dynamic tests with HTTP requests using hackney and realized after some debugging that gen_tcp.connect with String address might respond in 3 different ways:
- Success tuple in case you set a valid URL like:
'something.com' - Error tuple in case you set a string but invalid URL like:
'something' - Raises an exception with badarg in case you set an empty string like:
''
the third response doesn't feel right, sometimes you might get a misconfiguration issue just like I had, and it will cost you some debugging time to understand what happened, while it could simply say {error, nxdomain} instead of raising an Exception badarg with no details.
I was digging into the gen_tcp:connect/3 and I found out that it happens because the inet_parse:visible_string/1 here
Steps to reproduce the behavior.
> gen_tcp:connect('something.com', 80, []).
{ok, #Port<...>}
> gen_tcp:connect('something', 80, []).
{error, nxdomain}
> gen_tcp:connect('', 80, [])
** exception exit: badarg
in function gen_tcp:connect/4 (gen_tcp.erl, line 171)
Expected behavior empty string also classifies as an invalid URL, I guess it should respond with an error tuple the same as below.
> gen_tcp:connect('something', 80, []).
{error, nxdomain}
> gen_tcp:connect('', 80, [])
{error, nxdomain}
Affected versions I'm using Erlang/OTP 24, but I believe its happening not only on this version.
Additional context I'd like to understand better if this is the best approach of dealing with an empty string, but I believe that returning an error tuple might be more friendly.
If so, can I work on a PR for that?
I guess that we could simply accept an empty list/string at inet:getaddrs_tm/3 and dispatch inet:gethostbyname_tm/3 to return {error, nxdomain}
@WLSF The use of {error, nxdomain} appears to be very overloaded. It would be best if {error, nxdomain} only occurred when a DNS lookup occurs successfully, but fails to lookup the domain name. The other problem to consider, that may be partially related to why {error, nxdomain} is likely overloaded, is due to inet:getaddrs/3 failing to timeout, so no DNS lookups are able to timeout currently. inet:getaddrs/3 remains undocumented, so I would assume that area of Erlang/OTP still needs work.
@okeuday that's definitely interesting to know, thanks for sharing your thoughts.
But I'm still thinking about whether it should raise an Exception or return an error tuple. I believe that since nxdomain is a response for 'invalid_url' it should be the same for the empty string, even if we dont use nxdomain in this case, at least it feels like returning an error tuple would be a better fit for this case.
What do you think?
@WLSF It is best to have {error, nxdomain} as a response to a function call only when a DNS lookup occurred successfully but the DNS lookup response is a NXDOMAIN message type showing that the lookup failed. The exception in this situation is due to providing invalid input to the function. There is no way an empty string is valid for a connection address and an exception is pursuing fail-fast behavior, ensuring the error that caused the empty string is fixed. An error exception gives a stacktrace which is what you want for a function call with a parameter that is invalid.
I see, I guess it make sense.
I'm still confused about the badarg thing because it seems like '' and 'wrong' are both strings, both invalid URLs addresses, doesnt feel like it should respond differently.
well, feel free to close my issue.
Thanks for the answers @okeuday!
We think the spec should be updated to not allow an empty list.
@IngelaAndin I was taking a deeper look into how httpc works with these issues, and it has a very interesting way of.
If you set anything that might not be a http or https valid request, it will respond back with {:error, {:no_scheme}}
1> httpc:request(get, {"", []}, [], []).
{error, {no_scheme}}
2> httpc:request(get, {"something", []}, [], []).
{error, {no_scheme}}
3> httpc:request(get, {"something.com", []}, [], []).
{error, {no_scheme}}
4> httpc:request(get, {"https://something.com", []}, [], []).
{ok,
{{'HTTP/1.1', 200, 'OK'},
...}
It treats "" and "something" the same way, as invalid URLs, which seems to make sense, because they are the same thing.
Right now for hackney, it will simply crash with a stack trace that doesn't really make sense at all for the application, maybe it should be a new issue for them as well.
I have read up on the issue of how to handle an empty host name, and reached a personal opinion.
In a DNS query the request QNAME is a sequence of labels where each label is stated to be a nonempty string, terminated by the empty label that designates the root domain ".". I do not find anything that states that the number of non-empty labels has to be greater than zero. This suggests that the empty host name could be interpreted as the root domain.
Some tests on my Ubuntu machine:
dig '' prints: '' is not a legal name (unexpected end of input), and returns non-zero (10).
dig . prints the SOA record for the root domain (.) and returns zero.
getent hosts '' returns non-zero (2).
getent hosts . does exactly the same.
dig erlang.org and dig erlang.org. (note the trailing dot) both do the same thing.
So dig interprets the given host name as a fully qualified name meaning that a trailing dot is not significant and implied, except for the root domain where '' and . are not treated equivalently, but could be, which would feel more homogeneous. For dig you will have to resort to quoting to pass it an empty host name so they have greater leeway to treat the empty string differently. In Erlang you will always have to quote a string.
For a host name in Erlang/OTP's standard libraries, it is possible to use a string() | atom() as a host name, today. Changing that to non_empty_string() | atom() still allows for the empty atom '' which we would need to handle in runtime since the type system cannot specify non_empty_atom() or atom() - ''.
inet_res:lookup/* does pretty much as dig in that it returns {error,formerr} for the empty host name, because inet_parse:visible_string("") returns false, which is what makes gen_tcp:connect/* crash. For non-empty strings inet_res:lookup/* treats them as equivalent to having a trailing dot.
I think we should change consistently so the empty host name is treated as a request for the root domain, just as if a trailing dot had been there, and do the lookup. Trying with dig now there is no A record for the root domain, so that would, I guess, result in {error, nxdomain}, but the lookup is possible to do and should be done.