pytzwhere icon indicating copy to clipboard operation
pytzwhere copied to clipboard

Potentially wrong results returned

Open jannikmi opened this issue 8 years ago • 17 comments

I wrote a package using the same data and compared my results to the results of this package for aprox. 5M points. Those are a few of the points where I got mismatches:

(lng, lat), result timezonefinder, result pytzwhere (-110.58195555806134, 35.53178795327783), 'America/Phoenix', 'America/Denver'] (-110.41118747093144, 35.76969036563554), 'America/Phoenix', 'America/Denver'] (-110.69776990691068, 35.59283347945522), 'America/Phoenix', 'America/Denver'] (-110.91327628315915, 36.11131540134208), 'America/Phoenix', 'America/Denver'] (-110.26814562443687, 35.98507252905026), 'America/Phoenix', 'America/Denver'] (-133.73396065378114, 68.38068073677294), 'America/Inuvik', 'America/Yellowknife']

These might be errors in my algorithm, but I still though you might want to check this out.

This is my package: https://github.com/MrMinimal64/timezonefinder

jannikmi avatar Apr 13 '16 14:04 jannikmi

Thanks for the info. As far as I remember that area of Arizona is a mess due to the Navajo Nation and the Hopi. I will check it out though :)

cstich avatar Apr 22 '16 11:04 cstich

I checked the points manually. It looks like pytzwhere is not dealing with polygons within polygons correctly. Your results are indeed correct.

cstich avatar Apr 22 '16 12:04 cstich

I also think it is because of polygons within polygons. My algorithms is stopping when the point was found to be included in a polygon, too. I decided not to change my algorithm, because actually this is a mistake within the data and correcting it would make it ugly and slow.

jannikmi avatar Apr 22 '16 12:04 jannikmi

That makes sense. However, it is actually not a mistake in the data. It is legitimate for timezones to have timezones within them.

cstich avatar Apr 22 '16 12:04 cstich

Why? Then your data would suggest that this area is in two timezones... I don't think that makes sense.

jannikmi avatar Apr 22 '16 12:04 jannikmi

I should rephrase this for clarity. I didn't mean to say that a point would actually be in two polygons, but within the American/Denver timezone there is a part that is American/Phoenix. This video does a much better job at explaining it than I ever could.

cstich avatar Apr 22 '16 12:04 cstich

But still a timezone has to be unique and therefore the surrounding timezone would need to be excluded from the surrounded tz. Otherwise algorithms don't really have a chance of deciding which timezone it is (except hardcoding it) or am I wrong here?

jannikmi avatar Apr 22 '16 12:04 jannikmi

Polygons can't have holes in them. To ensure every coordinate was located within exactly one polygon would require changing the data to have a "seam" connecting the hole with the outer boundary. So it's just a matter of convention. We pull the data from http://efele.net/maps/tz/world/ and they follow one convention. The only solution here would be to reprocess the data to remove all layered polygons.

pegler avatar Apr 22 '16 13:04 pegler

The original data from Eric Muller does have holes though. I just checked.

cstich avatar Apr 22 '16 13:04 cstich

"Holes" as in the polygon defines both an outer and inner boundary?

pegler avatar Apr 22 '16 13:04 pegler

For me the ideal solution would be to convince them to correct the data (for me layered polygons are mistakes), but I don't know realistic this is. By the way do you have a way of automatically creating your .csv from efele.net data? I thought about that issue , but I just decided to leave it that way for now. There are just a few points where this matters and the effort for fixing this seemed to big.

jannikmi avatar Apr 22 '16 13:04 jannikmi

Yes a "hole" as in having an outer and inner boundary

cstich avatar Apr 22 '16 13:04 cstich

I remember running into issues with "holes" when initially working on this project. I can dig in next week to see if I can figure it out

pegler avatar Apr 22 '16 13:04 pegler

@MrMinimal64 I wrote a very short shell script that converts Eric Muller's shapefiles into a GEOjson.

@pegler Maybe it makes sense to use the multi-polygon timezone data? I used the polygon one so far but this discussion makes me wonder if we shouldn't use the other.

cstich avatar Apr 22 '16 13:04 cstich

I think I know how to fix this. I am working on it.

cstich avatar Apr 24 '16 03:04 cstich

This should now be fixed in the development branch. The development branch now correctly deals with holes. There are a few other little things I would like to add to that branch before we can release it but for testing timezonefinder it should already be sufficient

cstich avatar May 16 '16 15:05 cstich

@cstich @pegler hi guys, is there any new progress on this?

famanson avatar May 19 '17 09:05 famanson