dodgr icon indicating copy to clipboard operation
dodgr copied to clipboard

Identifying strong components

Open harrysroberts opened this issue 3 months ago • 5 comments

dodgr_components() identifies the semi-connected components of a directed graph, i.e. the disjoint subgraphs with a path existing from i to j OR from j to i for each i-j pair.

The main purpose for this in real terms that I can see is to filter out isolated disconnected nodes and links, such that origin and destination node sets do not snap to them and become unroutable.

However, if there is only a one-way street linking an isolated section to the rest of the network, then dodgr_components() would return them as a single component despite one direction being unroutable.

Thus, it may be helpful to have a "strong" setting within dodgr_components() that finds the maximal disjoint subgraphs with a path existing from i to j AND from j to i for each i-j pair.

harrysroberts avatar Oct 10 '25 15:10 harrysroberts

@HarryRobertsITS Interesting proposition. That's fairly easy to implement as a separate function to update "standard" component vectors:

library (dodgr)
packageVersion ("dodgr")
#> [1] '0.4.3.14'
bb <- c (16.3, 46.28, 16.5, 46.38)
 net <- dodgr_streetnet_sc (bb, expand = 0, quiet = FALSE)
graph <- weight_streetnet (net, wt_profile = "motorcar", turn_penalty = FALSE)
nrow (graph)
#> [1] 31872
length (table (graph$component))
#> [1] 10

That is then the number of components, including "weak" connections in single direction only. This is then a demo function to update component vectors to only include "strong" = bi-directional connections between components (using the internal function dodgr_graph_cols):

strong_components <- function (graph) {
    gr_cols <- dodgr:::dodgr_graph_cols (graph)
    g_ab <- paste0 (graph [[gr_cols$from]], "_", graph [[gr_cols$to]])
    g_ba <- paste0 (graph [[gr_cols$to]], "_", graph [[gr_cols$from]])
    g_dirs <- c (g_ab, g_ba)
    index <- which (duplicated (g_dirs))
    index_dual <- which (g_ab %in% g_dirs [index])
    graph_dual <- graph [index_dual, ]
    nrow (graph_dual)
    graph_dual$component <- NULL
    graph_dual <- dodgr_components (graph_dual)
}
graph <- strong_components (graph)
length (table (graph$component))
#> [1] 66

Created on 2025-10-13 with reprex v2.1.1

And as expected, the components are then considerably more disconnected, into 66 instead of 10 distinct components. I think if something like this were to be implemented here, it would be best done through modifying the dodgr_components() function, while leaving default component calculation done, for example, as part of weight_streetnet(), in current form. The dodgr_component() function would then have an additional bidirectional = FALSE parameter which, if set to TRUE, would calculate components like above, using only bi-directional connections.

The one tricky aspect there is that you'd still have single directional edges which connected two components. Component numbers are assigned to edges, and so those edges would need to have NA for component numbers, even through both vertices at either end were actually parts of identifiable components.

What do you think?

mpadge avatar Oct 13 '25 10:10 mpadge

@mpadge Thanks for your response and your work on this excellent package.

I think that having the bidirectional parameter within dodgr_components() is a great way of implementing this. I don't think that the assignment of NA to edges that cross between strong components is a problem - they are not part of any components therefore their component attribute is 'not applicable' in the true sense.

I have tried implementing the strong_components() function you propose on a real network, namely the Ordnance Survey Multi-modal Routing Network for the sample area of Exeter available from https://osdatahub.os.uk/data/downloads/sample/OsMultiModalRoutingNetwork. I don't think it is getting it quite right - it seems to be identifying too many separate components. The code I have used to test it is as follows

library(dodgr)
library(sf)
library(tidyverse)

strong_components <- function (graph) {
  gr_cols <- dodgr:::dodgr_graph_cols (graph)
  g_ab <- paste0 (graph [[gr_cols$from]], "_", graph [[gr_cols$to]])
  g_ba <- paste0 (graph [[gr_cols$to]], "_", graph [[gr_cols$from]])
  g_dirs <- c (g_ab, g_ba)
  index <- which (duplicated (g_dirs))
  index_dual <- which (g_ab %in% g_dirs [index])
  graph_dual <- graph [index_dual, ]
  nrow (graph_dual)
  graph_dual$component <- NULL
  graph_dual <- dodgr_components (graph_dual)
}

street_network_links <- read_sf("input/OSMulti-modalRoutingNetwork.gpkg",layer="mrn_ntwk_transportlink")

bidirectional_network <- street_network_links %>%
  st_set_crs(4326) %>%
  mutate(
    from_node = os_startnode,
    to_node = os_endnode,
    distance = as.numeric(st_length(geometry)),
    car_time = distance/`avgspeed:forward` * 60/1000 
  ) %>%
  select(from_node:car_time) %>%
  bind_rows( #add the reverse edges
    street_network_links %>%
      st_set_crs(4326) %>%
      mutate(geometry = st_reverse(geometry)) %>%
      mutate(
        from_node = os_endnode,
        to_node = os_startnode,
        distance = as.numeric(st_length(geometry)),
        car_time = distance/`avgspeed:backward` * 60/1000 
      ) %>%
      select(from_node:car_time)
  ) %>%
  filter(!is.na(car_time)) %>% #remove non-car edges
  group_by(from_node, to_node) %>%
  slice_min(car_time, with_ties = FALSE) %>% #replace duplicate edges with minimum
  ungroup()

strong_network <- bidirectional_network %>%
  st_drop_geometry() %>%
  strong_components() %>%
  filter(component==1) %>%
  select(!component) %>%
  left_join(select(bidirectional_network,from_node,to_node))

write_sf(strong_network,"output/exeter_car_network.gpkg")

Opening this network in QGIS (in blue) on top of the original multi-modal routing network (in red) - you can see that there are large areas which are clearly strongly connected to the rest of the network but have not been included in component 1.

Image

I'm no expert in this area but from a Wikipedia dive it seems like Tarjan's algorithm may be the fastest solution for this https://en.wikipedia.org/wiki/Tarjan%27s_strongly_connected_components_algorithm

harrysroberts avatar Oct 14 '25 14:10 harrysroberts

@HarryRobertsITS Thanks for the example code. I won't have time to dive in to this for a while, but a superficial scan of your code suggests to me that the network as you're constructing it there will not contain the appropriately duplicated edges, and therefore won't connect as it should, as shown in your plot. If i were doing it, i'd translate your downloaded file in either gpkg or pbf format to .osm - for which I use https://osmcode.org/osmium-tool/, but GDAL can also do that through sf if the necessary drivers have been installed. Then:

  • Use https://docs.ropensci.org/osmdata/reference/osmdata_sc.html to read data, and
  • Pass to dodgr::weight_streetnet().

I suspect that would give you a network with considerably more edges than the method you've used, because dodgr includes code to ensure all edges are appropriately duplicated unless explicitly tagged as one-way or equivalent.

As for the algorithm: That looks interesting, and relatively straightforward to implement, but I think for a first cut, I'll likely just leave in the form suggested here, and only maybe later develop a more efficient algorithm like that in C++.

mpadge avatar Oct 14 '25 15:10 mpadge

@mpadge I prefer to work with adjacency lists stored as data tables when I can - these are compatible with dodgr when I use 'from' and 'to' in the node id column names. I believe my method of duplicating edges is correct, I simply bind the table to a second table with the directionality reversed, but I would welcome identification of any specific issues (I have omitted the addition of turn restrictions for simplicity).

I think that the strong_components() function as defined above computes something like the connected components with one-way streets excluded, which is why the filtered network in the image above is so sparse. Strong components can include one-way streets, as long as there is also a route to return. I don't think that strong components can be obtained through applying dodgr_components() to some manipulated version of the base graph in this way - it would require a bespoke approach.

Running your original code on the example of Varaždin, the components identified are shown in the image below. You would expect that the largest strong component would span almost the entire network as road networks are designed such that all parts of the network are reachable from any point, with some isolated islands around the bounding box. I think the presence of many roundabouts, which are one way, is resulting in the computation of a large number of isolated components.

Image

harrysroberts avatar Oct 14 '25 17:10 harrysroberts

@HarryRobertsITS Yeah, you're right there - the hacky code I put in just artificially disconnects components, when in reality many should remain connected through alternative ways beyond the one way connections. Then doing this properly would need implementation of a full algorithm like your suggestion above.

I'll most be away from my computer for next couple of weeks, so won't have time until Nov at the earliest to tackle this. In the meantime, if you've got C++ skills, implementation shouldn't be that difficult, and would be done as an alternative version of current components algorithm https://github.com/UrbanAnalyst/dodgr/blob/67754636c3f5acdfeac6bc3a5d39cc6a56d8ecb8/src/graph.cpp#L120-L178

mpadge avatar Oct 15 '25 14:10 mpadge