
Accuracy Assessment

Open wrynearson opened this issue 2 years ago • 3 comments

Background

Accuracy will most likely be important to the success of this tool, internally and externally. We don't want a tool that's wildly inaccurate, either in the recommended meeting location or in the amount of GHGs estimated to get there.

However, the degree of acceptable inaccuracy is not known. How far from the perfect meeting location is acceptable? What percentage off from the true emissions are acceptable? What is our baseline for emissions accuracy?

To at least ensure that the tool is in the same "ballpark", we can compare our results with other publicly-accessible flight emissions estimates.

Caveats

  1. Our tool assumes direct flights (probably the largest source of inaccuracy) because we don't currently have access to actual flight routes.
  2. We don't have access to the actual aircraft type being flown, so we're making some generalizations.
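To make the direct-flight assumption concrete, here is a minimal sketch of the kind of estimate it implies: great-circle distance between airports multiplied by a flat per-passenger-km emissions factor. The factor and the airport coordinates below are illustrative assumptions for this sketch, not the tool's actual model (real models vary by aircraft type, load factor, and distance band).

```python
from math import radians, sin, cos, asin, sqrt

# Illustrative emissions factor (kg CO2e per passenger-km); an assumption
# for this sketch only, not the value the tool actually uses.
KG_CO2E_PER_PAX_KM = 0.15

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371 * asin(sqrt(a))

def direct_flight_emissions_kg(origin, dest):
    """One-way emissions assuming a single direct, great-circle flight."""
    return haversine_km(*origin, *dest) * KG_CO2E_PER_PAX_KM

# Approximate coordinates for GVA and IAD
gva = (46.24, 6.11)
iad = (38.95, -77.46)
print(round(direct_flight_emissions_kg(gva, iad)))
```

Any real itinerary with connections flies farther than the great-circle path and adds extra takeoff/landing cycles, so this style of estimate systematically undercounts distance for connecting routes.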

Test

I'll test with 10 random DS members' locations, all flying one way to IAD (Washington, D.C.). I'll compare with Google Flights and the results from @nerik's Observable Notebook.

I'll search a 1-week window and try to include the itinerary with the fewest transfers (direct if possible), choosing the lowest-CO₂e option among those with the fewest connections. The results from the tool were halved to represent one-way flights. (Is it correct that the tool currently shows round-trip impacts, @nerik?)

Legend: `-` = no direct flight found in Observable; `/` = airport not listed in Observable.

| Team Member | Home Airport | Co₂ordinate (kg CO₂) | Google Flights (kg CO₂e) | Observable (kg CO₂e) | Co₂ordinate vs Google (100% = equal) |
| --- | --- | --- | --- | --- | --- |
| Bob | GVA | 990 | 530 | 645 | 187% |
| Jane | CDG (RNS didn't appear in Observable) | 890 | 319 | 481 | 279% |
| Muhammed | ICN | 1670 | 965 | 1066 | 173% |
| Doug | YVR (YLW didn't appear in Observable) | 500 | 209 | 228 | 239% |
| Katrina | EDI | 840 | 504 | 489 | 167% |
| Paul | LIM | 870 | 387 | - | 225% |
| Sandra | SMF | 570 | 275 | - | 207% |
| Chen | BEY | 1405 | 552 | - | 255% |
| Pablo | TUS | 465 | 196 | / | 237% |
| Phil | GOI | 1990 | 795 | / | 250% |
| **TOTAL** | | 10190 | 4732 | | 215% |
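The per-member ratios and the overall 215% figure can be reproduced directly from the table (the "average" here is the ratio of totals, not the mean of per-member ratios):

```python
# Per-member one-way estimates from the table above:
# airport -> (Co2ordinate kg CO2, Google Flights kg CO2e)
data = {
    "GVA": (990, 530), "CDG": (890, 319), "ICN": (1670, 965),
    "YVR": (500, 209), "EDI": (840, 504), "LIM": (870, 387),
    "SMF": (570, 275), "BEY": (1405, 552), "TUS": (465, 196),
    "GOI": (1990, 795),
}

# Ratio of tool estimate to Google estimate per member
ratios = {airport: tool / google for airport, (tool, google) in data.items()}

# Overall ratio computed on totals (this is the 215% row)
total_ratio = sum(t for t, _ in data.values()) / sum(g for _, g in data.values())

print(min(ratios.values()), max(ratios.values()), total_ratio)
```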

Next steps

Co₂ordinate estimates are 1.6-2.8x those of Google, averaging 2.15x. It'd be worth looking into:

  1. Why the estimates differ.
  2. Whether the range of discrepancy (1.6-2.8x in this test) impacts the recommended meeting location (since some team members' estimated impacts diverge more from the Google Flights results than others').
  3. Whether 1 and 2 are acceptable to users.
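Point 2 above matters because a *uniform* bias (every estimate off by the same factor) can never change which location minimizes total emissions, while *per-member* biases can. A toy sketch with made-up numbers (the airports and values below are hypothetical, not from the test data):

```python
# Sketch with hypothetical numbers: uniform vs per-member bias.
def best_location(emissions_by_location):
    """Pick the candidate meeting airport with the lowest total emissions."""
    return min(emissions_by_location, key=lambda loc: sum(emissions_by_location[loc]))

# kg CO2e per team member for two hypothetical candidate airports
est = {"IAD": [990, 890, 1670], "GVA": [10, 1100, 1900]}

# Uniform 2.15x bias: every total scales equally, so the pick is unchanged.
uniform = {loc: [e * 2.15 for e in vals] for loc, vals in est.items()}
assert best_location(uniform) == best_location(est)

# Per-member biases in the observed 1.6-2.8x range can flip the pick.
skewed = {
    "IAD": [990 * 2.8, 890 * 1.6, 1670 * 1.6],
    "GVA": [10 * 1.6, 1100 * 2.8, 1900 * 2.8],
}
print(best_location(est), best_location(skewed))
```

So the spread of the discrepancy across members, not its overall magnitude, is what could change the recommendation.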

cc @kamicut @nerik @wildintellect @LanesGood

wrynearson avatar Apr 14 '23 07:04 wrynearson

This feels like a Data Science and Quality Assurance task. Perhaps bring in @kathrynberger and the Data Team to work on the data sourcing, algorithm, and coming up with tests to verify our results.

wildintellect avatar Apr 14 '23 15:04 wildintellect

@wrynearson Hey thanks for that, super insightful 👍

is this correct that the tool currently shows round-trip impacts

Yes I actually fixed that slight omission back in January 🙄

  • I am a bit surprised by the gap between the Observable notebook and Google Flights. One would assume that Google Flights uses Google's Travel Impact Model API. Have you picked the lowest or highest value among flights for the same route? It might also be a function of the time of year.
  • Personally, I'm not too bothered by Meet & Greta estimates being 1.6-2.8x those of Google, for the many reasons we've discussed on Slack and elsewhere. However,
  • The range of discrepancy (1.6-2.8x in this test) is more concerning, IMHO.

nerik avatar Apr 18 '23 14:04 nerik

Thanks @nerik. I chose the lowest-emission option among those with the fewest connections. E.g., if there were 5 options with 1 connection, I picked the lowest-emission of the five. I just picked a random day, though, so is it possible Observable is picking up on lower-emission flights from a day I didn't check? That seems unlikely.

Agreed that if we're relatively consistently higher than Google, that's OK – we can defend that Google's model is less holistic. But we'll need to test the range of discrepancy.

@kathrynberger flagged interest in this project, including around more in-depth accuracy assessments. I don't think it makes sense to do that this quarter given we only have 1 sprint of labs work for this, but we can keep that offer open for another time. As long as we're not recommending the wrong location by a wide margin, I think we're OK.

wrynearson avatar Apr 18 '23 15:04 wrynearson