Accuracy Assessment
Background
Accuracy will most likely be important to the success of this tool, internally and externally. We don't want a tool that's wildly inaccurate, either in the recommended meeting location or in the amount of GHGs emitted to get there.
However, the degree of acceptable inaccuracy is not known. How far from the perfect meeting location is acceptable? What percentage off from the true emissions is acceptable? What is our baseline for emissions accuracy?
To at least ensure that the tool is in the same "ballpark", we can compare our results with other publicly-accessible flight emissions estimates.
Caveats
- Our tool is assuming direct flights (probably the largest source of inaccuracy) because we don't have access (at the moment) to actual flight routes
- We don't have access to the actual type of aircraft that is being flown, so we're making some generalizations.
Test
I'll test with 10 random DS members' locations, all flying one way to IAD (Washington, D.C.). I'll compare with Google Flights and the results from @nerik's Observable Notebook.
I'll search a 1-week window and try to pick the itinerary with the fewest transfers (direct if possible); among the options with the fewest connections, I'll choose the one with the lowest CO₂e. The results from the tool were halved to represent one-way flights. (Is this correct that the tool currently shows round-trip impacts? @nerik)
Legend: `-` = no direct flight found in Observable; `/` = airport not listed in Observable.
| Team Member | Home Airport | Co₂ordinate (kg CO₂) | Google Flights (kg CO₂e) | Observable (kg CO₂e) | Co₂ordinate vs Google (100% means equal) |
|---|---|---|---|---|---|
| Bob | GVA | 990 | 530 | 645 | 187% |
| Jane | CDG (RNS didn't appear in Observable) | 890 | 319 | 481 | 279% |
| Muhammed | ICN | 1670 | 965 | 1066 | 173% |
| Doug | YVR (YLW didn't appear in Observable) | 500 | 209 | 228 | 239% |
| Katrina | EDI | 840 | 504 | 489 | 167% |
| Paul | LIM | 870 | 387 | - | 225% |
| Sandra | SMF | 570 | 275 | - | 207% |
| Chen | BEY | 1405 | 552 | - | 255% |
| Pablo | TUS | 465 | 196 | / | 237% |
| Phil | GOI | 1990 | 795 | / | 250% |
| TOTAL | | 10190 | 4732 | | 215% |
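For anyone who wants to re-check the last column, here is a minimal sketch that recomputes the ratios from the numbers in the table. The variable and function names (`ROWS`, `ratio_pct`, etc.) are just illustrative and are not part of the tool.

```python
# Emissions in kg for one-way flights to IAD, copied from the table above:
# (member, home airport, Co2ordinate estimate, Google Flights estimate)
ROWS = [
    ("Bob", "GVA", 990, 530),
    ("Jane", "CDG", 890, 319),
    ("Muhammed", "ICN", 1670, 965),
    ("Doug", "YVR", 500, 209),
    ("Katrina", "EDI", 840, 504),
    ("Paul", "LIM", 870, 387),
    ("Sandra", "SMF", 570, 275),
    ("Chen", "BEY", 1405, 552),
    ("Pablo", "TUS", 465, 196),
    ("Phil", "GOI", 1990, 795),
]


def ratio_pct(tool_kg, google_kg):
    """Tool estimate as a percentage of the Google Flights estimate."""
    return 100 * tool_kg / google_kg


# Per-member ratio (the last column of the table).
per_member = {name: ratio_pct(tool, google) for name, _, tool, google in ROWS}

# Ratio of the column totals (the TOTAL row).
tool_total = sum(tool for _, _, tool, _ in ROWS)
google_total = sum(google for _, _, _, google in ROWS)
overall = ratio_pct(tool_total, google_total)

# Unweighted mean of the per-member ratios, for comparison.
mean_of_ratios = sum(per_member.values()) / len(per_member)

for name, pct in per_member.items():
    print(f"{name}: {pct:.0f}%")
print(f"TOTAL: {tool_total} vs {google_total} kg -> {overall:.0f}%")
print(f"Unweighted mean of per-member ratios: {mean_of_ratios:.0f}%")
```

Note that the 215% in the TOTAL row is the ratio of the column totals, so long-haul flights weigh more heavily in it. The unweighted mean of the ten per-member ratios comes out slightly higher, at roughly 222%.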
Next steps
Co₂ordinate estimates are 1.6-2.8x Google's, and 2.15x overall (total Co₂ordinate emissions divided by total Google Flights emissions). It'd be worth looking into:
- Why our estimates are consistently higher.
- Whether the range of discrepancy (1.6-2.8x in this test) impacts the recommended meeting location, since some team members' estimated impacts differ from the Google Flights results more than others'.
- Whether the answers to the two points above are acceptable to users.
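One possible way to probe the second point is sketched below: take the tool's per-route estimates for each candidate meeting location, divide each by its observed overestimation factor, and check whether the lowest-emission location changes. Everything here is hypothetical: the locations `A` and `B`, the members `m1` and `m2`, the emissions, and the factors are made-up illustration values (the factors are only chosen to sit inside the 1.6-2.8x range seen in this test), and `best_location` / `rescale` are not functions from the tool.

```python
def total_kg(per_member_kg):
    """Sum of all members' emissions for one candidate location."""
    return sum(per_member_kg.values())


def best_location(emissions):
    """Candidate location with the lowest total emissions.

    emissions: {location: {member: kg}}
    """
    return min(emissions, key=lambda loc: total_kg(emissions[loc]))


def rescale(emissions, factors):
    """Divide each route's estimate by its overestimation factor.

    factors: {location: {member: factor}}, same shape as emissions.
    """
    return {
        loc: {member: kg / factors[loc][member] for member, kg in members.items()}
        for loc, members in emissions.items()
    }


# HYPOTHETICAL tool estimates (kg) for two candidate locations.
tool_estimates = {
    "A": {"m1": 1000, "m2": 500},
    "B": {"m1": 700, "m2": 900},
}

# HYPOTHETICAL per-route overestimation factors (tool / reference).
factors = {
    "A": {"m1": 1.6, "m2": 2.8},
    "B": {"m1": 2.8, "m2": 1.8},
}

tool_choice = best_location(tool_estimates)
reference_choice = best_location(rescale(tool_estimates, factors))
print(f"Tool recommends {tool_choice}, rescaled estimates recommend {reference_choice}")
```

With a uniform factor the ranking could never change, so only the spread between routes matters. In this made-up example the spread is enough to flip the recommendation from `A` to `B`, which is the situation we'd want to check for with real data.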
cc @kamicut @nerik @wildintellect @LanesGood
This feels like a Data Science and Quality Assurance task. Perhaps bring in @kathrynberger and the Data Team to work on the data sourcing, algorithm, and coming up with tests to verify our results.
@wrynearson Hey thanks for that, super insightful 👍
> is this correct that the tool currently shows round-trip impacts
Yes I actually fixed that slight omission back in January 🙄
- I am a bit surprised by the gap between the Observable notebook and Google Flights. One should assume that Google Flights uses Google's Travel Impact model API. Have you picked the lowest or highest value amongst flights for the same route? Might also be a function of the time of the year.
- Personally, I'm not bothered too much by "Meet & Greta estimates are 1.6-2.8x that of Google's", for the many reasons we've discussed on Slack and elsewhere. However, "the range of discrepancy (1.6-2.8x in this test)" is more preoccupying IMHO.
Thanks @nerik. I chose the lowest emission value among the itineraries with the fewest connections. E.g., if there are 5 options with 1 connection, I picked the one with the lowest emissions of the five. I just picked a random day, so is it possible that Observable is picking up a lower-emission flight that I didn't see? That sounds unlikely.
Agreed that if we're relatively consistently more than Google, that's OK – we can defend that Google's model is less holistic. But we'll need to test the range of discrepancy.
@kathrynberger flagged interest in this project, including around more in-depth accuracy assessments. I don't think that makes sense to do this quarter given we only have 1 sprint of labs work for this, but we can keep that offer for another time. As long as we're not recommending the wrong location by a wide margin, I think we're OK.