ngsi-timeseries-api icon indicating copy to clipboard operation
ngsi-timeseries-api copied to clipboard

http:// protocol should be automatically inferred for URL query parameters

Open michaeI-s opened this issue 5 years ago • 7 comments

Hi,

when POSTing to the /subscribe endpoint I get a "400 Bad Request" response, if the 'orionUrl' query parameter does not begin with 'http://'.

In my case I tried with

curl -X POST \
  'http://localhost:8668/v2/subscribe?orionUrl=orion-v2:1026/v2&quantumleapUrl=quantumleap:8668/v2&idPattern=ParkingSpot&entityType=ParkingSpot&attributes=status&timeIndexAttribute=dateModified' \
  -H 'Fiware-Service: test' \
  -H 'Fiware-Servicepath: /'

and got

{
  "description": "Orion is not reachable by QuantumLeap at orion-v2:1026/v2",
  "error": "Bad Request"
}
  1. I think the leading 'http://' should be automatically inferred when not given. It took me some time to figure out what the problem was...
  2. It should also return a 412 status code instead for an unreachable Orion URL as stated by the documentation.

Strangely enough, if the 'quantumleapUrl' query parameter does not begin with 'http://' the behavior is a little different, although it should be consistent.

The response will be like

"{\"error\":\"BadRequest\",\"description\":\"invalid custom /url/\"}"

I'm testing with Orion v2.2.0, QuantumLeap v0.7.5 each in a single docker container on a Linux Mint 19.2

Regards, Michael

michaeI-s avatar Nov 08 '19 13:11 michaeI-s

Hi @MS-Fiware,

leading 'http://' should be automatically inferred when not given

Interesting point. While this could be convenient to some, it might catch others off guard. Why should the scheme default to http? To me https is equally viable. So which one should we use? I reckon whichever way you look at it we lose, if we default it, we'll always risk making someone unhappy? Also, if my memory serves me well, valid URLs must begin with a scheme component---e.g. orion-v2:1026/v2 isn't a valid URL whereas http://orion-v2:1026/v2 is. If that's the case, I reckon QL should actually reject malformed URLs...But then again as you point out, I too definitely see the convenience of a sane default, it's just that I haven't got the faintest what that may be.

return a 412 status code

Ha! Good catch. I've never come across that myself. But isn't 412 supposed to be returned when a precondition on a HTTP header fails? What goes wrong here is that the Orion URL in the query string is malformed... I've got a sneaky suspicion we should double-check the HTTP spec since 412 may not be an appropriate code to use anyway?

"description": "Orion is not reachable by QuantumLeap at orion-v2:1026/v2"

Now this error message is cr@*. It's soooo misleading. What I'd get from the message is that QL can't get to Orion perhaps b/c of a firewall or some other network issue. It wouldn't occur to me the reason is that QL tried to do a POST using a malformed URL. I share your frustration here.

Strangely enough, if the 'quantumleapUrl' query parameter does not begin with 'http...

Probably there's some URL validation in place in that spot so that a malformed URL gets ditched, which is a good thing if we decide to stick to using only valid URLs.

So here's my suggestion:

  1. Only allow valid URLs for the quantumleapUrl and orionUrl query params.
  2. If URL validation fails, return a 400 (or better code?) with a message explaining why validation failed, e.g. malformed orionUrl; orion-v2:1026/v2 isn't a valid URL: missing schema component.

Is this reasonable?

c0c0n3 avatar Nov 08 '19 17:11 c0c0n3

H @c0c0n3,

Why should the scheme default to http? To me https is equally viable. So which one should we use?

I probably should have talked about http(s) ;). https is also viable to me. But http as the base / non SSL-variant seems more appropriate to me. If a host only accepts https connections, or more general, if the used protocol schema is not supported by the host, its response code might be conveyed by the QL response message. This way one get a feedback of what could have gone wrong but does not need to know the implementation details of the server. So actually I see no risk of making someone unhappy. For all the other cases where just http is applicable or users are too lazy writing scheme components, we could default to the http protocol. What do you think?

I've got a sneaky suspicion we should double-check the HTTP spec since 412 may not be an appropriate code to use anyway?

I also think that status code 400 is most likely to apply here and it should be used. Regarding this, I really like your suggestion to provide a good descriptive message like malformed orionUrl; orion-v2:1026/v2 isn't a valid URL: missing schema component. :)

But I can understand the idea behind status code 412 as well. The server wants to inform you that a precondition for further processing of the request has not been fulfilled. In this case it means a valid specification of a protocol in order to be able to make a request to the given host at all. However, the exact reason is not really apparent from the response message and more confusing.

Probably there's some URL validation in place in that spot so that a malformed URL gets ditched...

That's ok, but shouldn't it emit a message with the same format that is used when indicating a malformed Orion URL?

Best regards, Michael

michaeI-s avatar Nov 12 '19 16:11 michaeI-s

@MS-Fiware thanks for your comments :-)

TL;DR: I don't think there's any quick way to implement robust URL scheme defaults, but better error messages should be doable. Gory details below :-)

Using a default URL scheme

Thinking of it, yes, I agree w/ you that if we had to pick a default, probably http is what most devs would expect. But here's the party pooper. I'd like to have both scheme defaults and parameter validation---so we can return meaningful error messages. I checked the URI spec, and a URL, to be valid, has to have a scheme component, so orion/v2 isn't valid. Now I initially thought I'd be fairly easy to slap an http:// in front of it, but on a second thought I realised the only sane way to do that is to define a grammar and roll out your own parser:

url_param = URL | X

where X is something we'll have to define ourselves. Then we could parse the value in the query string and if it's an X, prefix it with http. Easier said than done. To appreciate the intricacies of this, look at how Python urllib parses the following "incomplete" URLs:

>>> from urllib.parse import *

>>> urlparse('orion/v2', scheme='http')  # use http if no scheme given
ParseResult(scheme='http', netloc='', path='orion/v2', params='', query='', fragment='')
# ouch! the host becomes part of the path!

>>> urlparse('orion-v2:1026/v2', scheme='http')
ParseResult(scheme='orion-v2', netloc='', path='1026/v2', params='', query='', fragment='')
# now the host becomes the scheme...

Another way to look at it, is that to implement URL validation we could first check if the string starts with http, if it doesn't prefix it with http, then use e.g. urllib to parse. In other words, we'd be considering the grammar:

url_param = scheme, rest
scheme = 'http' | 'https'
rest = ... as in the URI spec

Either way, my gut tells me this could be a fair amount of work. So I suggest we stick to requiring the client to specify a valid URL for the quantumleapUrl and orionUrl query params. You may argue we should actually implement proper URL validation with scheme defaults and in normal circumstances I'd agree with you, but at the moment we've got no dev cycles to spare. Of course, if this is an important feature to you and have the time to contribute the implementation (code, comprehensive test cases, spec updates, etc.) then by all means go ahead and open a PR, we'll gladly review it and integrate it into the mainline :-)

Fixing error messages

So I checked the HTTP spec too. Turns out 412 should only be used when a precondition fails on a request header, so it's not applicable to our case since we're dealing with the URL query string here. It looks like the most appropriate code would be 400? Assuming we go for 400, we still need to fix the error messages QL returns as you initially said and yes we should do it consistently for both quantumleapUrl and orionUrl---thank you for pointing this out :-)

Action plan

I'd suggest the following if you agree:

  1. Implement URL validation for quantumleapUrl and orionUrl query params. If the URL is malformed, return a 400 with a clear enough message why validation failed. Validation logic and error messages should be the same in both cases.
  2. Catch Orion connection errors resulting from a valid but unusable URL (e.g. http://orion/moved/somewhere/else. In this case too, return a 400 with a descriptive message, e.g. "Orion cannot be reached at: ..."
  3. Similarly, catch any other errors while setting up a subscription with Orion. Return suitable error codes and descriptive messages---e.g. 400: "Orion rejected subscription b/c ..."
  4. [Optionally if you can contribute it]. Make URL scheme default to http, ensuring we've covered all the bases and pesky corner cases.

Alternative take

I can't help thinking this subscribe endpoint is doing QL more harm than good. I can see the utility of it in certain cases but isn't it easier to just have clients set up a subscription directly with Orion as it happens for other FIWare components? How about deprecating the subscribe endpoint? If anyone else has been following this issue, I'd love to hear what your opinion is, i.e. is the subscribe endpoint really useful to you?

c0c0n3 avatar Nov 12 '19 19:11 c0c0n3

this is a side feature in ql, and previously I already questioned its utility given that it's just a duplicate of what orion does, and its api is not well formed. i would remove the feature actually.

chicco785 avatar Nov 13 '19 07:11 chicco785

Stale issue message

github-actions[bot] avatar Mar 10 '21 02:03 github-actions[bot]

we're going to close this issue when we drop support for the subscribe endpoint.

c0c0n3 avatar Mar 17 '21 08:03 c0c0n3

Stale issue message

github-actions[bot] avatar May 23 '21 02:05 github-actions[bot]