FlowKit

Need a locatable unique subscribers set

Open greenape opened this issue 4 years ago • 2 comments

The UniqueSubscribers query is very useful for creating subsets of active subscribers, but it would be much more useful if those subscribers were all locatable.

greenape • Aug 16 '21

Perhaps more generally, it would be useful to have a way to run queries on only the set of locatable events (e.g. IntereventInterval and TotalActivePeriodsSubscriber, as well as UniqueSubscribers). The main use case I'm imagining is to use these non-location queries to define subsets of subscribers who will appear sufficiently often in the outputs of SubscriberLocations queries.
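A minimal sketch of the idea, using Python's built-in sqlite3 module with a hypothetical toy schema (the events and locatable_cells tables are illustrative assumptions, not FlowKit's actual tables): filter the events to locatable cells once, then run non-location aggregates such as a unique-subscriber set or a count of active days on the filtered events only.

```python
import sqlite3

# Hypothetical toy schema: "events" stands in for CDR events, and
# "locatable_cells" for the cell IDs returned by some spatial unit query.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE events (msisdn TEXT, cell_id TEXT, datetime TEXT);
    CREATE TABLE locatable_cells (cell_id TEXT PRIMARY KEY);
    INSERT INTO events VALUES
        ('A', 'c1', '2016-01-01 08:00'),
        ('A', 'c9', '2016-01-02 09:00'),
        ('B', 'c9', '2016-01-01 10:00'),
        ('C', 'c2', '2016-01-02 11:00');
    INSERT INTO locatable_cells VALUES ('c1'), ('c2');
""")

# Events at locatable cells only. A semi-join (IN) filters rows without
# duplicating them, so this subquery can feed any non-location aggregate.
LOCATABLE_EVENTS = """
    SELECT * FROM events
    WHERE cell_id IN (SELECT cell_id FROM locatable_cells)
"""

# Unique subscribers over all events vs. over locatable events only.
all_subs = {r[0] for r in conn.execute("SELECT DISTINCT msisdn FROM events")}
locatable_subs = {
    r[0] for r in conn.execute(f"SELECT DISTINCT msisdn FROM ({LOCATABLE_EVENTS})")
}

# A simple active-periods-style count (distinct active days per subscriber),
# computed on locatable events only.
active_days = dict(
    conn.execute(
        f"SELECT msisdn, COUNT(DISTINCT substr(datetime, 1, 10)) "
        f"FROM ({LOCATABLE_EVENTS}) GROUP BY msisdn"
    )
)

print(sorted(all_subs))             # ['A', 'B', 'C']
print(sorted(locatable_subs))       # ['A', 'C']
print(sorted(active_days.items()))  # [('A', 1), ('C', 1)]
```

Subscriber B disappears because all of B's events are at an unlocatable cell, and A's active-day count drops from 2 to 1 because one of A's days only has an unlocatable event.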

I think there are also two slightly different definitions of "locatable" that we may want to separate here:

  • Exclude events whose cell ID does not appear in infrastructure.cells (this is currently implicitly the case for the outputs of a JoinToLocation query, due to the INNER JOIN of the events query to the spatial unit)
  • Exclude events at cells that do appear in infrastructure.cells but don't map to any location in a specified spatial unit (e.g. events at a cell whose location falls outside the admin0 boundary). These events are currently not excluded from the output of a JoinToLocation query (and hence not from the output of a SubscriberLocations query) if a mapping table is specified for the spatial unit, because the spatial unit join clause uses a LEFT JOIN. They are, however, excluded for a polygon spatial unit without a mapping table, because the point-in-polygon join clause uses an INNER JOIN. I think we need to make this consistent one way or the other, and perhaps also user-controllable via a parameter.
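The two join behaviours described above can be reproduced with a toy sqlite3 example. The tables and SQL here are hypothetical simplifications, not the SQL FlowKit actually generates: cell c1 maps to a region, c2 is a known cell that maps to nothing (like a cell outside the admin0 boundary), and c9 is not in the cells table at all.

```python
import sqlite3

# Hypothetical toy data: c1 maps to a region; c2 is a known cell that maps
# to no region (e.g. outside admin0); c9 is not in the cells table at all.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE events (msisdn TEXT, cell_id TEXT);
    CREATE TABLE cells (cell_id TEXT PRIMARY KEY);
    CREATE TABLE mapping (cell_id TEXT, location_id TEXT);
    INSERT INTO events VALUES ('A', 'c1'), ('B', 'c2'), ('C', 'c9');
    INSERT INTO cells VALUES ('c1'), ('c2');
    INSERT INTO mapping VALUES ('c1', 'region_1');
""")

# Spatial unit built with a LEFT JOIN (the mapping-table case):
# known but unmapped cells survive with a NULL location_id.
LEFT_UNIT = """
    SELECT cells.cell_id AS cell_id, mapping.location_id AS location_id
    FROM cells LEFT JOIN mapping ON cells.cell_id = mapping.cell_id
"""
# Spatial unit built with an INNER JOIN (like the point-in-polygon case):
# known but unmapped cells are dropped.
INNER_UNIT = LEFT_UNIT.replace("LEFT JOIN", "INNER JOIN")


def join_to_location(unit_sql):
    """INNER JOIN events to a spatial unit, so unknown cells (c9) always vanish."""
    return conn.execute(
        f"""
        SELECT e.msisdn, su.location_id
        FROM events AS e INNER JOIN ({unit_sql}) AS su ON e.cell_id = su.cell_id
        ORDER BY e.msisdn
        """
    ).fetchall()


left_result = join_to_location(LEFT_UNIT)    # [('A', 'region_1'), ('B', None)]
inner_result = join_to_location(INNER_UNIT)  # [('A', 'region_1')]
```

Event C is dropped in both cases (the first definition of "locatable"), while event B survives only when the spatial unit is built with a LEFT JOIN (the second definition).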

One final note (which I feel may be best handled as part of this issue rather than as a separate one): I think the ignore_nulls argument to SubscriberLocations has no effect at all. It filters the output of the location-joined events query to exclude any rows with a null location_id, but these rows are already excluded by the INNER JOIN in JoinToLocation (as mentioned in my first point above).
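A toy sqlite3 sketch of the point about ignore_nulls, assuming a hypothetical spatial unit table whose location_id column is never NULL: after an INNER JOIN, events at cells missing from the spatial unit are already gone, so a subsequent "location_id IS NOT NULL" filter (written here as a hypothetical Python helper, drop_null_locations) removes nothing. The NULLs it is meant to catch only appear if an outer join is used instead.

```python
import sqlite3

# Hypothetical toy data: c9 appears in events but not in the spatial unit,
# and the spatial unit itself never has a NULL location_id.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE events (msisdn TEXT, cell_id TEXT);
    CREATE TABLE spatial_unit (cell_id TEXT, location_id TEXT NOT NULL);
    INSERT INTO events VALUES ('A', 'c1'), ('B', 'c2'), ('C', 'c9');
    INSERT INTO spatial_unit VALUES ('c1', 'region_1'), ('c2', 'region_2');
""")


def located_events(join_type):
    """Join events to the spatial unit using the given join type."""
    return conn.execute(
        f"""
        SELECT e.msisdn AS msisdn, su.location_id AS location_id
        FROM events AS e {join_type} JOIN spatial_unit AS su
          ON e.cell_id = su.cell_id
        ORDER BY e.msisdn
        """
    ).fetchall()


def drop_null_locations(rows):
    """Python equivalent of an ignore_nulls-style 'location_id IS NOT NULL' filter."""
    return [row for row in rows if row[1] is not None]


inner = located_events("INNER")
left = located_events("LEFT")

# After the INNER JOIN the filter is a no-op...
print(drop_null_locations(inner) == inner)  # True
# ...it would only remove rows if an outer join had produced NULLs.
print(left)  # [('A', 'region_1'), ('B', 'region_2'), ('C', None)]
```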

jc-harrison • Jan 27 '22

In reply to: "I think there are also two slightly different definitions of 'locatable' that we may want to separate here"

The situation has been simplified as of #5361. "Locatable" in the sense of "cell ID appears in the result of a specified SpatialUnit query" now unambiguously means "cell ID corresponds to one of the relevant geography elements". Depending on the type of spatial unit, this could mean that the cell ID appears in infrastructure.cells, appears in the specified mapping table, or has a known point location that falls within one of the specified polygons. So the set of locatable cells can be specified entirely by a SpatialUnit object, and what remains for this issue is to allow more queries to be run on only the events at these locatable cells.
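A pure-Python sketch of how a spatial unit alone could determine the set of locatable cells, following the three cases listed above. The class names, the CELLS dictionary standing in for infrastructure.cells, and the ray-casting point-in-polygon test are all illustrative assumptions; the real spatial units are database queries, not Python objects.

```python
from dataclasses import dataclass

# Hypothetical stand-in for infrastructure.cells: cell_id -> (x, y) point,
# or None when the cell has no known point location.
CELLS = {"c1": (0.5, 0.5), "c2": (5.0, 5.0), "c3": None}


@dataclass
class CellSpatialUnit:
    """Hypothetical: any cell present in infrastructure.cells is locatable."""


@dataclass
class MappingSpatialUnit:
    """Hypothetical: a cell is locatable iff it appears in the mapping table."""
    mapping: dict  # cell_id -> location_id


@dataclass
class PolygonSpatialUnit:
    """Hypothetical: a cell is locatable iff its point lies in some polygon."""
    polygons: dict  # location_id -> list of (x, y) vertices


def point_in_polygon(x, y, polygon):
    """Ray-casting test: count edge crossings of a ray cast to the right of (x, y)."""
    inside = False
    for i in range(len(polygon)):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % len(polygon)]
        if (y1 > y) != (y2 > y):  # edge straddles the ray's y-coordinate
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def locatable_cells(unit, cells):
    """Return the set of cell IDs that the spatial unit can locate."""
    if isinstance(unit, CellSpatialUnit):
        return set(cells)
    if isinstance(unit, MappingSpatialUnit):
        return set(unit.mapping)
    if isinstance(unit, PolygonSpatialUnit):
        return {
            cell_id
            for cell_id, point in cells.items()
            if point is not None
            and any(point_in_polygon(*point, poly) for poly in unit.polygons.values())
        }
    raise TypeError(f"unsupported spatial unit: {unit!r}")


def locatable_events(events, unit, cells=CELLS):
    """Keep only (msisdn, cell_id) events at cells the spatial unit can locate."""
    keep = locatable_cells(unit, cells)
    return [event for event in events if event[1] in keep]


EVENTS = [("A", "c1"), ("B", "c2"), ("C", "c3"), ("D", "c9")]
square = {"region_1": [(0, 0), (1, 0), (1, 1), (0, 1)]}

print(locatable_events(EVENTS, CellSpatialUnit()))
# [('A', 'c1'), ('B', 'c2'), ('C', 'c3')]
print(locatable_events(EVENTS, MappingSpatialUnit({"c2": "r2", "c3": "r3"})))
# [('B', 'c2'), ('C', 'c3')]
print(locatable_events(EVENTS, PolygonSpatialUnit(square)))
# [('A', 'c1')]
```

Because locatable_events takes only the events and a spatial unit, the same filtered events could then be passed to any non-location query, which is the remaining piece this issue asks for.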

jc-harrison • Sep 05 '22