Google-Location-History-Total-Distance-Travelled

Distances are off by 20%?

Open • tomwaitforitmy opened this issue 8 years ago • 6 comments

Hey kyleai,

I am trying to write a Python tool that calculates distances for road trips. My own distances are off by roughly 20%. I reverse-geocode the latitudes/longitudes to readable addresses and compare my computed distances with whatever Google Maps gives me as the distance between those addresses. I am not sure it is even possible to compute the value much more accurately, but the errors are quite large.

While I was looking for a solution, I found your project and noticed that you compute the distance almost the same way I do. The only difference I found was that you take abs(value); however, this changed nothing in my results. Since your tool only sums up a total value instead of computing individual trips, I can't compare your results to my data. Did you validate your distances? Could you double-check an individual trip?
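For comparison, here is a minimal Python sketch of the standard haversine great-circle distance, summed over consecutive points. It is not the code from either project, and the mean Earth radius of 6371 km is an assumed constant. If abs() is applied to the latitude/longitude differences, it cannot change the result: the differences only enter the formula through sin², which is an even function. That would be consistent with abs(value) making no difference.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    # The differences only appear inside sin^2, which is even: sin^2(-x) == sin^2(x).
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    # min() guards against a tiny floating-point overshoot above 1.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))

def path_length_km(points):
    """Total length of a path given as a chronological list of (lat, lon) tuples."""
    return sum(haversine_km(*p, *q) for p, q in zip(points, points[1:]))
```

For example, path_length_km([(0, 0), (0, 1)]) gives about 111.2 km, one degree of longitude along the equator.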

Here you can find my code for comparison. I also found another blog reporting similar errors with almost the same computation.

For some trips I have around 10-20 signals over 1-2 hours, but only the start and end points come with actual latitude/longitude values. Obviously, the haversine distance is quite wrong for a road trip when you can only use the start and end points for the computation. Do you know more about how the data in the .json file is actually organized?
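The effect of having only start and end coordinates can be illustrated with a hypothetical L-shaped route (the coordinates are made up, not taken from the thread). By the triangle inequality, the straight start-to-end distance can never exceed the sum of the legs, so missing intermediate fixes can only shrink the estimate. This sketch assumes the same haversine formula with a 6371 km radius.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance in km between two points in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi, dlmb = math.radians(lat2 - lat1), math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_km * math.asin(min(1.0, math.sqrt(a)))

# Hypothetical trip: east along the equator, then north (a right-angle detour).
route = [(0.0, 0.0), (0.0, 0.1), (0.1, 0.1)]

# Sum over every recorded fix versus using only the first and last fix.
path_km = sum(haversine_km(*p, *q) for p, q in zip(route, route[1:]))
endpoints_km = haversine_km(*route[0], *route[-1])

print(f"path: {path_km:.2f} km, start/end only: {endpoints_km:.2f} km, "
      f"ratio: {endpoints_km / path_km:.2f}")
```

For this right-angle detour, the straight line comes out at roughly 71% of the driven path; real roads can deviate more or less than that.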

Any feedback is appreciated!

Cheers, Tommy

tomwaitforitmy • Feb 25 '17 12:02

Hey Tom,

I used this site to test my results. It uses the haversine formula and is interactive and easy to experiment with.

When testing the straightest road I could find and comparing it with the code we both used (available on that site), I got 191.1 miles from Google Maps and 191.5 miles with the haversine formula. That seems close enough to me, especially considering there are a few turns in the road and Google Maps also accounts for differences in altitude. Keep in mind that the formula given there and the one in your code both return kilometers, not miles, so the result must be converted before comparing it to the distance Google gives. The author of the other blog you linked seems to say the results were pretty reasonable; I'm not sure where the similar errors were.
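A small sketch of the unit conversion and relative-error check described above. The international mile is defined as exactly 1.609344 km; the helper names are hypothetical.

```python
KM_PER_MILE = 1.609344  # exact: the international mile is defined as 1609.344 m

def km_to_miles(km):
    """Convert kilometers to international miles."""
    return km / KM_PER_MILE

def miles_to_km(miles):
    """Convert international miles to kilometers."""
    return miles * KM_PER_MILE

def relative_error(computed, reference):
    """Signed relative deviation of computed from reference (0.2 means 20% too long)."""
    return (computed - reference) / reference
```

With Kyle's numbers, relative_error(191.5, 191.1) is about 0.002 (0.2%), far below the ~20% discrepancy in the original report. Likewise, miles_to_km(63) is about 101.4 km, matching the 63 miles (101 km) quoted later in the thread.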

As for the number of data points in the array, my data seems to have a lat/long pair for each timestamp. After looking through a lot of it, I haven't found any entries without one. Most of them don't have activities, but every one I've checked has coordinates. I'm not sure why yours would only have lat/long for the start and end points.
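To check how many entries in an export actually carry coordinates, something like the following sketch could be used. It assumes a Takeout-style layout with a top-level "locations" list whose entries have "timestampMs" and, when available, "latitudeE7"/"longitudeE7" (degrees multiplied by 10^7). These key names are an assumption about the export format and may need adjusting for other versions; the function name extract_fixes is hypothetical.

```python
import json

def extract_fixes(history):
    """Split a parsed Location History dict into fixes and a count of entries without coordinates.

    Assumed layout (may differ between export versions):
    {"locations": [{"timestampMs": "...", "latitudeE7": int, "longitudeE7": int}, ...]}
    Returns (fixes, missing), where fixes is a chronologically sorted list of
    (timestamp_ms, lat_deg, lon_deg) tuples.
    """
    fixes, missing = [], 0
    for entry in history.get("locations", []):
        if "latitudeE7" not in entry or "longitudeE7" not in entry:
            missing += 1  # timestamp without a usable position
            continue
        fixes.append((
            int(entry["timestampMs"]),
            entry["latitudeE7"] / 1e7,  # E7 = degrees * 10^7
            entry["longitudeE7"] / 1e7,
        ))
    fixes.sort()  # sort by timestamp in case the export is not chronological
    return fixes, missing

if __name__ == "__main__":
    # Hypothetical two-entry sample; a real export would be loaded with json.load(f).
    sample = json.loads(
        '{"locations": ['
        '{"timestampMs": "2000", "latitudeE7": 525200000, "longitudeE7": 134050000},'
        '{"timestampMs": "1000"}]}'
    )
    fixes, missing = extract_fixes(sample)
    print(f"{len(fixes)} entries with coordinates, {missing} without")
```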

I hope this answers your questions.

Thanks, Kyle

kinto0 • Feb 25 '17 15:02

Hey Kyle,

Thanks for the quick answer! Awesome!

I can see that you would achieve that accuracy on the straightest road you can find ;). However, I am interested in tracking my actual car trips, so I hoped there would be enough signals to approximate the road, but at least in my data that is not the case. Here you can find a road in Bogotá I traveled last year. While Google Maps reports 16-18 km (depending on the route), the haversine distance yields 10 km. That error is just too large, although it makes sense, since the straight line is much shorter. I would assume that your tool computes the same distance as mine. I will test that with a small data set where I upload just a single trip. That would work, wouldn't it?

Here is a part of my location history where I have lots of timestamps without any lat/lon data. Doesn't that occur in your data? Maybe it's my phone... How frequent are your signals? Mine are pretty frequent, but whenever I was traveling "inVehicle" there are often long periods (~1-2 hours) without lat/lon values.
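The question of how frequent the signals are can be answered by looking at the time between consecutive fixes. The sketch below reports the median sampling interval and lists any gaps above a threshold. The 10-minute default is an arbitrary, hypothetical choice, and the input is assumed to be the millisecond timestamps of entries that actually have lat/lon.

```python
import statistics

def median_interval_s(timestamps_ms):
    """Median time in seconds between consecutive fixes (needs at least two timestamps)."""
    ts = sorted(timestamps_ms)
    return statistics.median((b - a) / 1000 for a, b in zip(ts, ts[1:]))

def find_gaps(timestamps_ms, max_gap_s=600):
    """Return (start_ms, end_ms) pairs where consecutive fixes are more than max_gap_s apart."""
    ts = sorted(timestamps_ms)
    return [(a, b) for a, b in zip(ts, ts[1:]) if (b - a) / 1000 > max_gap_s]
```

A trip whose fixes are mostly a minute apart but with a one-hour hole would show a small median and a single entry from find_gaps, which is the pattern described for the "inVehicle" periods.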

Cheers, Tommy

PS: I am from Germany, so I compute everything in kilometers and Google Maps also gives me that ;)

tomwaitforitmy • Feb 25 '17 17:02

Hey Tom,

Sorry about the misunderstanding! My distance computation is almost identical to yours. Until now, I assumed that the data points in the JSON were within minutes of each other because the first ones I tested were. After looking closer, I can see that you are right: they are much farther apart than I had thought. For something that needs to be accurate, like your road trip program, that is not good.

When I look at a road I have only been on once on this website, the data points seem to be within minutes of each other. Clicking on the time shows an interactive animation moving along the route. Does it still show timestamps hours apart for you? For me it traces exactly the roads I was on.

Sorry I can't help as much as I would like; I didn't even know this was a problem until today.

Thanks, Kyle

kinto0 • Feb 25 '17 18:02

Hey Kyle,

No worries! I hope we can fix it together. I extracted a trip in Bogotá from last year. It contains 60 timestamps but only 3 lat/lon signals. My conclusion is that I just can't reconstruct the distance properly from data like this... Your tool outputs 0 miles for this data set. Could you tell me whether the tool detects the data as invalid/inaccurate, or is there a bug when the data set contains just one trip?

Edit: http://theyhaveyour.info shows me the 3 locations but obviously does not compute any distance.

Regards, Tommy

tomwaitforitmy • Feb 26 '17 11:02

That is very weird. There must be a bug when there is a hole in the data (a stretch where it doesn't give lat/long for a few entries), because when testing the program I used a point with only one data point. What mine does is take the last lat/long pair and compute the distance between it and the current pair. If there is no last pair, it throws an error.
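One way to avoid both the crash and a zero result is to skip entries without coordinates while remembering the last valid pair, instead of requiring that a previous pair always exists. Below is a minimal sketch with hypothetical function names, not the project's actual code. Entries without coordinates are represented as None, and the haversine radius of 6371 km is an assumption.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance in km between two points in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi, dlmb = math.radians(lat2 - lat1), math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_km * math.asin(min(1.0, math.sqrt(a)))

def total_distance_km(points):
    """Sum distances over a chronological sequence of (lat, lon) tuples or None.

    None marks an entry without coordinates. Holes are skipped and the next valid
    fix is measured from the last valid one, so no previous pair is required; fewer
    than two fixes simply yields 0.0 instead of raising an error.
    Returns (total_km, number_of_entries_skipped).
    """
    total, skipped, last = 0.0, 0, None
    for p in points:
        if p is None:
            skipped += 1  # hole in the data: keep `last` and move on
            continue
        if last is not None:
            total += haversine_km(*last, *p)
        last = p
    return total, skipped
```

This still draws a straight line across each hole, so for a trip like the Bogotá one (60 timestamps, 3 fixes) the result would be a two-segment straight-line estimate that falls short of the road distance.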

It's very weird how your data has holes with no location in them. Did you lose internet for those parts of the trip?

kinto0 • Feb 26 '17 14:02

Hey Kyle,

Yeah, I think that data from Colombia is unusable. Just ignore it. I extracted another, more recent day from Germany where I traveled roughly 180 km by car (90 km each direction). My tool computes a total distance of ~69 km, which is obviously trash. Your tool computes ~63 miles (101 km), which is still not good, but slightly better. The data visualized on http://theyhaveyour.info/ doesn't look that bad: the route is sampled pretty much the way I drove it. I am not sure why my tool yields a different value than yours. I assume it's because I added a threshold for the 'inVehicle' confidence. I will keep you up to date.
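To see how a confidence threshold alone can pull a total down, here is a sketch of one possible way to apply it; neither tool's actual filtering code appears in the thread, so this is only a guess at the mechanism. Each fix is assumed to be a flattened (lat, lon, in_vehicle_conf) tuple, where in_vehicle_conf is a hypothetical 0-100 "inVehicle" confidence or None. A segment is counted only when both of its endpoints reach the threshold.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance in km between two points in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi, dlmb = math.radians(lat2 - lat1), math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_km * math.asin(min(1.0, math.sqrt(a)))

def distance_km(fixes, min_conf=None):
    """Sum segment distances over (lat, lon, in_vehicle_conf) fixes.

    If min_conf is given, a segment is counted only when both of its endpoints
    reach that confidence (None counts as 0); otherwise every segment is counted.
    """
    total = 0.0
    for (lat1, lon1, c1), (lat2, lon2, c2) in zip(fixes, fixes[1:]):
        if min_conf is not None and min(c1 or 0, c2 or 0) < min_conf:
            continue  # segment dropped by the threshold
        total += haversine_km(lat1, lon1, lat2, lon2)
    return total
```

On a toy trip such as [(0, 0, 90), (0, 1, 40), (0, 2, 90), (0, 3, 95)], a threshold of 50 removes two of the three one-degree segments, so the filtered total is a third of the unfiltered one. A single low-confidence fix mid-trip is enough to drop both segments around it.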

Cheers, Tommy

tomwaitforitmy • Feb 26 '17 16:02