wttr.in

JSON `nearest_area` is one city over

Open pragma- opened this issue 3 years ago • 21 comments

Todo

  • [ ] Add queried_location to JSON response

Details

A while ago I submitted a PR to add includeLocation to the weather query so that the JSON result includes nearest_area. It worked flawlessly until about a month ago.

Now the areaName field is consistently one or two cities away instead of the city being queried. For example, los angeles gives results for hillgrove, california. new york city gives oakland gardens, new york. london, uk gives cubitt town, tower hamlets, greater london, united kingdom. A month ago all of these used to return the expected city.

This is disconcerting because I cannot tell whether WWO (wttr's weather backend) simply no longer has access to weather stations in those cities, whether its database look-up is broken in some way, or whether nearest_area has been changed to mean something else.

Do you know why this is happening?

By the way, I was looking over the PRs and saw that somebody had recently submitted one to use an IP-location service, because the WWO location results have been very inaccurate lately. Rather than switching, I hope we can make the WWO location results accurate again.

pragma- · Sep 02 '20

I can confirm this behaviour. Searching for some larger German cities returns a village somewhere in the vague surrounding area: München returns Strasslach-Dingharting, and Karlsruhe returns Neuburg Am Rhein.

ScientiaEtVeritas · Sep 26 '20

Yes, I can confirm this too. I think the problem is in how WWO handles it; it has become very inaccurate. I should report this problem to them, and if it isn't fixed or mitigated, we should probably look for a new data provider.

Internal wttr.in location resolution works perfectly well:

$ curl http://localhost:8004/Nuremberg
{"latitude": 49.453872, "timezone": "Europe/Berlin", "longitude": 11.077298, "address": "Nürnberg, Mittelfranken, Bayern, Deutschland"}
$ curl http://localhost:8004/49.45,11.08
{"address": "11, Theatergasse, Altstadt, St. Lorenz, Nürnberg, Bayern, 90402, Deutschland", "latitude": 49.4498919, "longitude": 11.0801459, "timezone": "Europe/Berlin"}

But at the same time:

$ curl 'wttr.in/Nuremberg?format=j1' | jq '.nearest_area[0]'
{
  "areaName": [
    {
      "value": "Aurau"
    }
  ],
  "country": [
    {
      "value": "Germany"
    }
  ],
  "latitude": "49.250",
  "longitude": "11.017",
  "population": "0",
  "region": [
    {
      "value": "Bayern"
    }
  ],
  "weatherUrl": [
    {
      "value": ""
    }
  ]
}
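To put "one city over" into numbers, here is a minimal sketch that measures the great-circle (haversine) distance between the coordinates the internal resolver returned for Nuremberg and the nearest_area coordinates WWO reported. The helper names are hypothetical, and the mean Earth radius of 6371 km is a spherical-Earth assumption.

```python
import json
import math

# Mean Earth radius in km (spherical-Earth assumption).
EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_area_offset_km(j1: dict, query_lat: float, query_lon: float) -> float:
    """Distance between the queried point and nearest_area[0] of a format=j1 response.

    The j1 response encodes latitude/longitude as strings, hence float().
    """
    area = j1["nearest_area"][0]
    return haversine_km(query_lat, query_lon, float(area["latitude"]), float(area["longitude"]))


# Trimmed copy of the nearest_area object shown above.
sample = json.loads('{"nearest_area": [{"areaName": [{"value": "Aurau"}], '
                    '"latitude": "49.250", "longitude": "11.017"}]}')
# Coordinates returned by the internal wttr.in resolver for "Nuremberg".
print(round(nearest_area_offset_km(sample, 49.453872, 11.077298), 1))
```

For the Nuremberg example this comes out at roughly 23 km between the queried city and the reported area.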

We can override the nearest_area field of WWO with the wttr.in data, but the real question is whether WWO returns the weather data for nearest_area instead of for the queried area (which would be really bad).

chubin · Sep 29 '20

We can override the nearest_area field of WWO with the wttr.in data, but the real question is that perhaps WWO returns the data for the nearest_area instead of the area in the query (which would be really bad)

It does look like the weather data is indeed "accurate" for the nearest_area. The problem is that it's not the location we searched for.

pragma- · Sep 29 '20

@pragma- I think the only real solution to this problem is to add support for other upstream data sources. We have initial support for a new data source in #532, and I believe more will follow. Then we will have a robust solution; until then, we will always depend on a single data source.

chubin · Oct 14 '20

Could this be also/additionally due to a service rounding coordinates?

When I use http://wttr.in/51.4976,%20-0.1181 (central London), I get the following search result: Location: Lambeth Palace Garden, Lambeth Palace Road, Lambeth, London Borough of Lambeth, London, Greater London, England, SE1 7JU, United Kingdom [51.49704725,-0.11875235545073382]

If I run the same search in JSON format, http://wttr.in/51.4976,%20-0.1181?format=j1, I get a different output:

[screenshot]

Note that the request coordinates have only two decimal places.
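How far can two-decimal rounding move a point? A minimal sketch, assuming both coordinates are rounded with ordinary round-half rules and using the equirectangular approximation (accurate for sub-kilometre offsets) on a spherical Earth of radius 6371 km. It bounds only the rounding itself; it says nothing about how WWO picks a nearest_area from the rounded point. Function names are hypothetical.

```python
import math

# Kilometres per degree of latitude on a spherical Earth (R = 6371 km), about 111.2 km.
KM_PER_DEG = 2 * math.pi * 6371.0 / 360


def rounding_offset_km(lat: float, lon: float, decimals: int = 2) -> float:
    """Distance (km) a point moves when both coordinates are rounded to `decimals` places.

    Equirectangular approximation: longitude degrees shrink by cos(latitude).
    """
    dlat = round(lat, decimals) - lat
    dlon = (round(lon, decimals) - lon) * math.cos(math.radians(lat))
    return KM_PER_DEG * math.hypot(dlat, dlon)


def worst_case_rounding_km(lat: float, decimals: int = 2) -> float:
    """Upper bound: each coordinate is off by at most half a unit in the last kept place."""
    half = 0.5 * 10 ** -decimals
    return KM_PER_DEG * math.hypot(half, half * math.cos(math.radians(lat)))


# Central London coordinates from the query above.
print(f"actual shift: {rounding_offset_km(51.4976, -0.1181):.2f} km")
print(f"worst case at this latitude: {worst_case_rounding_km(51.4976):.2f} km")
```

Both figures stay under 1 km, so rounding to two decimals moves this query point by a few hundred metres at most. That is small compared with the "one city over" shifts described at the top of the issue, although it cannot rule out the rounded point being snapped to a different named area.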

Danfro · Nov 23 '20

I use the forecast module in Bodhi Linux, which uses this as a backend, and I have the same issue. If I set it to San Jose, California, it comes up with Coyote, someplace in the remote surrounding area. I tried entering various other city names around me; Cupertino came up with Austin, which is a little closer, but I have found no way to get it to show San Jose itself.
Ideally I would enter a postal/ZIP code, but even entering latitude/longitude would be fine. Either way, city names are not working quite right.

Both of those locations (Coyote and Austin) are small, obscure places I had not heard of; I had to use Google Maps even to find them. I reported this to the Bodhi developers, but they pointed me here as an upstream problem, and it seems to be affecting others in a similar way. When I read "village in remote surrounding area" from the user near Munich, I thought to myself, "yep, exactly!"

enigma9o7 · Dec 03 '20

I've localized the bug pretty well now. As I wrote before, it is in the data source. I hope they will fix it, because it is a real bug affecting all their (commercial) customers. If they don't fix it, I have an idea for a workaround, and if that doesn't help either, the only solution will be to change the data source.

Just for the clarity: it is not a bug in wttr.in!

chubin · Feb 28 '21

@pragma- @ScientiaEtVeritas @Danfro @enigma9o7

I believe it is fixed now. Could you please check whether it works for you?

chubin · Mar 01 '21

@chubin It doesn't seem fixed for me. I'm using this endpoint: http://wttr.in/Karlsruhe?format=j1. Thank you for looking into this issue!

ScientiaEtVeritas · Mar 01 '21

@ScientiaEtVeritas Actually, it does,

at least it seems to work for me (with Karlsruhe too):

$ curl -ks "wttr.in/Karlsruhe?format=j1&nonce=$RANDOM" | jq -r '.nearest_area[0].areaName[0].value'
Carlsruhe

I added nonce= here to bypass the caching layer. This shouldn't usually be done, because it generates additional useless load, but it's OK in this case; once the cache entries expire, it won't be needed here either.
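The same check can be scripted. A minimal sketch, assuming only the format=j1 parameter and the JSON layout shown in this thread; the nonce value is just an arbitrary extra query argument that makes each URL unique so the cache is skipped, and all helper names are hypothetical. As noted above, cache-busting adds useless load, so use it sparingly.

```python
import json
import secrets
import urllib.parse
import urllib.request


def j1_url(location: str, nonce: bool = False) -> str:
    """Build a wttr.in format=j1 URL; optionally append a random nonce to bypass the cache."""
    params = {"format": "j1"}
    if nonce:
        # 8 random hex characters; any unique value works for cache-busting.
        params["nonce"] = secrets.token_hex(4)
    path = urllib.parse.quote(location, safe=",")
    return f"https://wttr.in/{path}?{urllib.parse.urlencode(params)}"


def area_name(j1: dict) -> str:
    """Python equivalent of jq -r '.nearest_area[0].areaName[0].value'."""
    return j1["nearest_area"][0]["areaName"][0]["value"]


def fetch_area_name(location: str) -> str:
    """Network call with cache-busting enabled; avoid calling this in a loop."""
    with urllib.request.urlopen(j1_url(location, nonce=True), timeout=10) as resp:
        return area_name(json.load(resp))
```

Calling fetch_area_name("Karlsruhe") performs the same request as the curl command above.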

chubin · Mar 01 '21

@pragma- @ScientiaEtVeritas @Danfro @enigma9o7

I believe it is fixed now. Could please check if it works for you?

It does! This is excellent! Thank you so much!

enigma9o7 · Mar 02 '21

I think the bug is fixed; let's wait for at least one additional acknowledgment (@ScientiaEtVeritas from Fabian maybe?) and close it

chubin · Mar 02 '21

Please ignore me if I just don't remember a detail of how the different result formats work. Searching for, say, Leipzig with the regular query returns Leipzig as the result. Fine. But the JSON format returns Stunz, a part of Leipzig. Is that intended? Shouldn't both return the same result, Leipzig?

Please compare these two queries:

http://wttr.in/leipzig?format=j1

http://wttr.in/leipzig

Doing the same for München returns München and Gern (a part of München).

Danfro · Mar 02 '21

Yes, that's true, but the discrepancy shouldn't be large, if there is one at all. For some locations (Leipzig is one of them), reverse GPS resolution (GPS -> name) returns a slightly different result than direct resolution (name -> GPS). As far as I understand, this comes from the caching mechanisms used on the data source side; we can't influence it directly.

As long as it is only slightly off, I think the error can be ignored. If it affects the forecast results, we will need to look for a solution.

chubin · Apr 04 '21

@chubin The nearest_area field does now appear to be populated with more accurate values, for the most part.

Previously, I was always getting city names that were one city away or so (e.g. "los angeles, california" would display "hillgrove, california"). Every time, consistently. Now I get the expected city information most of the time.

There are still some queries that do not return the expected city name; e.g. "Manhattan, New York" gives "Clason Point, New York", which seems to be just slightly outside of Manhattan, according to Google Maps. "Bronx, New York" gives "West Farms, New York".

It is my understanding that the data source gets information about the nearest weather station to a query. It may not always be possible to have a weather station in the exact location. That could explain why it says "Clason Point" and "West Farms" instead of the queried city name.

As long as the nearest_area field accurately represents the correct weather station, I am fine with a discrepancy between the queried location name and the weather results location name. As far as I can tell, the nearest_area field is much less broken now. The New York results make me hesitate to say that it is 100% fixed.

pragma- · Apr 05 '21

Noticed something weird.

If I query for "Bronx" I get "Baychester, New York" with a Lat/Long of 40.86 and -73.84.

If I query for "Bronx, New York" I get "West Farms, New York" with a Lat/Long of 40.85 and -73.88.

Do you know why this happens? I would expect "Bronx" and "Bronx, New York" to both use the same weather station.

pragma- · Apr 05 '21

Yes, it happens because that is how the location resolution procedure works:

  • Bronx => Bronx County, NYC, New York, United States of America [40.85703325,-73.8366961598775]
  • Bronx,New York => The Bronx, Bronx County, New York, United States [40.8466508,-73.8785937]

You can query any other location, and check how it will be resolved, like this:

$ curl wttr.in/~Bronx,New+York | grep ^Location:
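For scripts that want this resolved name rather than nearest_area, the Location: line can be parsed. A minimal sketch, assuming the line has the shape shown in the two bullets above (a comma-separated name followed by [lat,lon] in square brackets); that shape is inferred from this thread, not a documented contract, and the function name is hypothetical.

```python
import re
from typing import Optional, Tuple

# Matches e.g. "Location: The Bronx, ..., United States [40.8466508,-73.8785937]"
LOCATION_RE = re.compile(
    r"^Location:\s*(?P<name>.+?)\s*"
    r"\[(?P<lat>-?\d+(?:\.\d+)?),\s*(?P<lon>-?\d+(?:\.\d+)?)\]\s*$",
    re.MULTILINE,
)


def parse_location(text: str) -> Optional[Tuple[str, float, float]]:
    """Return (name, lat, lon) from the first 'Location:' line, or None if there is none."""
    m = LOCATION_RE.search(text)
    if m is None:
        return None
    return m.group("name"), float(m.group("lat")), float(m.group("lon"))


# Trimmed sample: only the Location line matters here.
sample = "...\nLocation: The Bronx, Bronx County, New York, United States [40.8466508,-73.8785937]\n"
print(parse_location(sample))
```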

This problem (if it is a problem) is not related to the original one, and it is not related to the weather data; it happens one step earlier. That's just how geolocation systems work, and I don't see a big problem here. The same could happen if you searched for a location in Google Maps, Apple Maps, or wherever.

The original problem was a real problem, though. It is not really caused by weather station locations, because the station data gets postprocessed, interpolated, etc., but it is still a bug (or caching issue) at the data source level. We can't influence it directly, but as I said, if the problem recurs at its old scale, we will look for a solution.

chubin · Apr 05 '21

You can query any other location, and check how it will be resolved, like this:
$ curl wttr.in/~Bronx,New+York | grep ^Location:

This indeed does say "The Bronx, Bronx County, New York" as expected! This is what I was expecting the nearest_area field to accomplish.

Instead, today, using curl wttr.in/~Bronx,New+York?format=j1, we have yet another new location name for "Bronx, New York"! It is now saying "Morrisania, New York". I cannot use the nearest_area field to display the names of the locations because they seem to be confusing and inconsistent locations: Baychester, West Farms, Morrisania.

The nearest_area field does seem to be much more accurate now, but it does not give a consistent location name for some locations. Would it be possible to add a location field to the JSON (format=j1) results that will use the Location: data from the "normal" results (curl wttr.in/~Bronx,New+York | grep ^Location:)?

pragma- · Apr 05 '21

Yes, it is a good idea; we should probably just add something like queried_location to the JSON response. Keep in mind, though, that the data is provided for the lat/long pair in the response, not the lat/long pair in the query! I understand that it sounds strange, but that's how our data provider's caching works, and it does not look like they are going to fix it. And as I said, the shift is not so big now, much better than before.
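A minimal sketch of the idea, assuming the j1 response is handled as a plain dict and the geolocation step has already produced (name, lat, lon). queried_location is only the name proposed here; its shape below (name plus string coordinates, mirroring how nearest_area encodes latitude/longitude) is a hypothetical choice, not wttr.in's implementation. nearest_area is deliberately left untouched, because the weather data belongs to its coordinates.

```python
import copy
from typing import Any, Dict


def add_queried_location(j1: Dict[str, Any], name: str, lat: float, lon: float) -> Dict[str, Any]:
    """Return a copy of a format=j1 response with a hypothetical queried_location field.

    nearest_area is kept as-is: the forecast belongs to its lat/long,
    not to the queried point.
    """
    out = copy.deepcopy(j1)
    out["queried_location"] = {
        "name": name,
        # Strings, to match how nearest_area encodes its coordinates.
        "latitude": f"{lat:.6f}",
        "longitude": f"{lon:.6f}",
    }
    return out


# Trimmed response for "Bronx, New York" and the resolver output quoted above.
response = {"nearest_area": [{"areaName": [{"value": "Morrisania"}]}]}
result = add_queried_location(
    response, "The Bronx, Bronx County, New York, United States", 40.8466508, -73.8785937
)
print(result["queried_location"]["name"], "/", result["nearest_area"][0]["areaName"][0]["value"])
```

Keeping both fields lets a client display the name the user asked for while still knowing which point the forecast actually describes.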

chubin · Apr 05 '21

queried_location sounds great. Should I go ahead and close this issue and open a new issue for queried_location or do you want to keep this one open?

pragma- · Apr 05 '21

No, you shouldn't; I am going to work on it as part of this issue. I have already extended the original description with this step.

chubin · Apr 06 '21