
Ignore the diagonals and unknowns in inflows/outflows

Open Thingus opened this issue 3 years ago • 2 comments

As mentioned in #2029, it would be good to exclude diagonals (people who have not moved) and unknown/lost subscribers from inflows/outflows.

We could extend the inflows/outflows classes and the relevant schema endpoints, etc. to introduce this as an option.

Thingus, May 05 '22 17:05

I definitely think it would be good to exclude diagonals from the inflows/outflows, or at least allow this as an option (I think I'd favour "excluding diagonals" as the default). We could consider adding a third grouping, 'stayed', with just the diagonals, but the diagonals are already accessible directly from the full OD matrix so perhaps that's unnecessary.
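To make the distinction concrete, here is a minimal sketch in plain Python. It is not flowmachine code: the OD matrix, the function names and the `include_diagonals` parameter are all hypothetical, chosen only to show how inflows, outflows and a separate "stayed" grouping relate to the full OD matrix.

```python
from collections import defaultdict

# Hypothetical OD matrix: (origin, destination) -> subscriber count.
od_matrix = {
    ("A", "A"): 50,  # diagonal: subscribers who stayed in A
    ("A", "B"): 7,
    ("B", "A"): 3,
    ("B", "B"): 40,  # diagonal: subscribers who stayed in B
    ("B", "C"): 5,
    ("C", "A"): 2,
}


def inflows(od, include_diagonals=False):
    """Sum counts by destination, skipping the diagonal unless asked not to."""
    totals = defaultdict(int)
    for (origin, dest), count in od.items():
        if origin == dest and not include_diagonals:
            continue
        totals[dest] += count
    return dict(totals)


def outflows(od, include_diagonals=False):
    """Sum counts by origin, skipping the diagonal unless asked not to."""
    totals = defaultdict(int)
    for (origin, dest), count in od.items():
        if origin == dest and not include_diagonals:
            continue
        totals[origin] += count
    return dict(totals)


def stayed(od):
    """The diagonal of the OD matrix: subscribers whose location did not change."""
    return {origin: count for (origin, dest), count in od.items() if origin == dest}


print(inflows(od_matrix))                          # {'B': 7, 'A': 5, 'C': 5}
print(inflows(od_matrix, include_diagonals=True))  # {'A': 55, 'B': 47, 'C': 5}
print(stayed(od_matrix))                           # {'A': 50, 'B': 40}
```

With the diagonal included, the inflow to A is dominated by the 50 subscribers who never left, which is the argument for making "excluded" the default.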

Regarding "unknowns": I'm now starting to think this shouldn't be a responsibility of the inflows/outflows, and should instead be controlled through the location sub-queries and flows join type (e.g. the include_unlocatable argument to MajorityLocation would determine whether unlocatable subscribers end up being counted in the inflows/outflows, and the join_type argument to Flows would do the same for appeared/disappeared subscribers). The downside of this is that, if a user wants to get inflows including unlocatable subscribers (from a MajorityLocation query) and also inflows excluding unlocatable subscribers, this would require running two separate MajorityLocation queries. We may want to look at breaking down MajorityLocation a little to improve cache re-use in this situation.
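A toy illustration of that idea, again in plain Python rather than flowmachine: the helpers `toy_majority_location`, `toy_flows` and `inflow_to` are hypothetical, and only the parameter names `include_unlocatable` and `join_type` are borrowed from the discussion above. The join type values used here are assumptions for the sketch.

```python
from collections import Counter

UNLOCATABLE = "unlocatable"  # stand-in marker for a subscriber with no majority location

# Hypothetical raw per-subscriber results for two periods.
raw_1 = {"s1": "A", "s2": "A", "s3": "B", "s4": UNLOCATABLE}
raw_2 = {"s1": "B", "s2": "A", "s3": UNLOCATABLE, "s4": "B", "s5": "B"}


def toy_majority_location(raw, include_unlocatable):
    """Keep or drop unlocatable subscribers at the location sub-query level."""
    if include_unlocatable:
        return dict(raw)
    return {sub: loc for sub, loc in raw.items() if loc != UNLOCATABLE}


def toy_flows(loc1, loc2, join_type="inner"):
    """Count (origin, destination) pairs; None marks appeared/disappeared subscribers."""
    if join_type == "inner":
        subscribers = loc1.keys() & loc2.keys()
    elif join_type == "full outer":
        subscribers = loc1.keys() | loc2.keys()
    else:
        raise ValueError(f"unsupported join_type in this sketch: {join_type!r}")
    return Counter((loc1.get(sub), loc2.get(sub)) for sub in subscribers)


def inflow_to(od, region):
    """Inflow to one region, excluding the diagonal."""
    return sum(n for (origin, dest), n in od.items() if dest == region and origin != dest)


for include_unlocatable in (False, True):
    for join_type in ("inner", "full outer"):
        od = toy_flows(
            toy_majority_location(raw_1, include_unlocatable),
            toy_majority_location(raw_2, include_unlocatable),
            join_type,
        )
        print(include_unlocatable, join_type, inflow_to(od, "B"))
```

Note that each value of `include_unlocatable` needs its own pair of location results, which mirrors the cache re-use concern: in this sketch the inflows/outflows step itself carries no "unknowns" option at all.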

jc-harrison, May 09 '22 17:05

Further thoughts on this:

  • I think it does make sense to have a parameter in flowmachine inflows/outflows queries to control whether or not diagonals are included, and I'd prefer to change the default to "excluded".
  • For the API-exposed queries, I think there is a case for adding a "stayed" query kind. This is the same as the diagonal of the full OD matrix, but there may be situations where a user needs the "stayed" counts (probably along with inflows and outflows) but not the full OD matrix, so a separate query kind would allow those permission scopes to be separated.
  • I'm not sure it would make sense to expose the "include/exclude diagonals" parameter through the API - doing so would effectively allow users to get the unredacted "stayed" counts, by running two inflows queries (one with diagonals, the other without) and subtracting one from the other. So I'd prefer to have the exposed inflows/outflows queries always exclude the diagonal, along with a "stayed" query kind to separately extract the (redacted) diagonals.
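The differencing concern in the last point can be sketched as follows. This is plain Python with made-up counts; the threshold of 15 and the `redact` helper are assumptions for illustration and are not taken from the FlowKit codebase.

```python
from collections import defaultdict

REDACTION_THRESHOLD = 15  # assumed value, for illustration only


def redact(counts, threshold=REDACTION_THRESHOLD):
    """Drop any count at or below the threshold before it reaches the user."""
    return {key: n for key, n in counts.items() if n > threshold}


def inflows(od, include_diagonals):
    """Sum counts by destination, optionally including the diagonal."""
    totals = defaultdict(int)
    for (origin, dest), n in od.items():
        if origin == dest and not include_diagonals:
            continue
        totals[dest] += n
    return dict(totals)


def stayed(od):
    """The diagonal of the OD matrix."""
    return {origin: n for (origin, dest), n in od.items() if origin == dest}


# Hypothetical OD matrix; only 4 subscribers stayed in A.
od = {
    ("A", "A"): 4,
    ("B", "A"): 30,
    ("C", "A"): 20,
    ("B", "B"): 100,
    ("A", "B"): 25,
}

# What a user could obtain if the parameter were exposed through the API.
with_diagonals = redact(inflows(od, include_diagonals=True))      # {'A': 54, 'B': 125}
without_diagonals = redact(inflows(od, include_diagonals=False))  # {'A': 50, 'B': 25}

# Subtracting the two responses recovers the diagonal, including small counts.
leaked = {
    region: with_diagonals[region] - without_diagonals[region]
    for region in with_diagonals.keys() & without_diagonals.keys()
}

print(leaked["A"])         # 4, below the redaction threshold
print(redact(stayed(od)))  # {'B': 100}: a redacted "stayed" query would hide A
```

Both inflows responses pass redaction on their own, yet their difference exposes a count that a dedicated, redacted "stayed" query would have withheld.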

jc-harrison, Jan 18 '23 17:01