osmnx

Fill missing values with most common value on similar roads

Open • EwoutH opened this issue 5 months ago • 2 comments

Contributing guidelines

  • [X] I understand the contributing guidelines

Documentation

  • [X] My proposal is not addressed by the documentation or examples

Existing issues

  • [X] Nothing similar appears in an existing issue

What problem does your feature proposal solve?

Currently there is no convenient way to fill missing values in node or edge attributes.

What is your proposed solution?

Include a function that fills missing values with the most common value among similar roads.

For example, I could call:

ox.fill_missing_values(graph.edges, values_to_fill="maxspeed", base_on=["highway", "lanes"])

In this case, it fills each missing maxspeed with the most common maxspeed among edges with the same highway type and the same number of lanes.
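A minimal sketch of the grouping step, using only the standard library. It assumes edges are given as dicts of attributes (for example the `data` dicts from `G.edges(data=True)`), that a value is missing when its key is absent or `None`, and that all values are scalar (OSMnx can store lists on merged edges, which would have to be normalized first, as the code further down does). `most_common_by_group` is a hypothetical name, not an existing OSMnx function.

```python
from collections import Counter, defaultdict


def most_common_by_group(records, target, base_on):
    """Map each tuple of base_on values to the most common target value.

    Hypothetical helper: records are dicts of scalar edge attributes.
    Records whose target is missing (absent or None) are skipped.
    """
    counters = defaultdict(Counter)
    for rec in records:
        value = rec.get(target)
        if value is None:
            continue
        key = tuple(rec.get(attr) for attr in base_on)
        counters[key][value] += 1
    # Counter.most_common breaks ties by order of first occurrence
    return {key: counter.most_common(1)[0][0] for key, counter in counters.items()}


edges = [
    {"highway": "primary", "lanes": "2", "maxspeed": "50"},
    {"highway": "primary", "lanes": "2", "maxspeed": "50"},
    {"highway": "primary", "lanes": "2", "maxspeed": "70"},
    {"highway": "residential", "lanes": "1", "maxspeed": "30"},
    {"highway": "primary", "lanes": "2"},  # maxspeed missing
]
lookup = most_common_by_group(edges, "maxspeed", ["highway", "lanes"])
# lookup == {("primary", "2"): "50", ("residential", "1"): "30"}
```

A missing maxspeed can then be looked up with the tuple of the edge's own `highway` and `lanes` values.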

If no values are available for a given combination of highway and lanes, it drops base_on attributes from right to left until there are. So if the maxspeed is N/A for a primary_link with 3 lanes, but there is no other primary_link with 3 lanes and a known maxspeed in the graph, it checks whether there are any primary_links with a maxspeed, and if not, falls back to the most common maxspeed among all roads.
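A sketch of the right-to-left fallback, under the same assumptions as before (edges as dicts of scalar attributes, missing meaning absent or `None`). `fill_with_fallback` is a hypothetical name. All lookup tables are built from the original values before anything is filled, so filled values never influence later choices.

```python
from collections import Counter, defaultdict


def fill_with_fallback(records, target, base_on):
    """Fill missing `target` values in place, dropping base_on keys right to left.

    Hypothetical helper: records are dicts of scalar edge attributes.
    """
    # One lookup table per prefix: base_on[:n], base_on[:n-1], ..., [] (all roads)
    tables = []
    for n in range(len(base_on), -1, -1):
        attrs = base_on[:n]
        counters = defaultdict(Counter)
        for rec in records:
            if rec.get(target) is not None:
                key = tuple(rec.get(a) for a in attrs)
                counters[key][rec[target]] += 1
        table = {k: c.most_common(1)[0][0] for k, c in counters.items()}
        tables.append((attrs, table))

    for rec in records:
        if rec.get(target) is not None:
            continue
        # Try the most specific grouping first, then progressively coarser ones
        for attrs, table in tables:
            key = tuple(rec.get(a) for a in attrs)
            if key in table:
                rec[target] = table[key]
                break
    return records


edges = [
    {"highway": "primary", "lanes": "2", "maxspeed": "50"},
    {"highway": "primary_link", "lanes": "1", "maxspeed": "40"},
    {"highway": "residential", "lanes": "1", "maxspeed": "30"},
    {"highway": "residential", "lanes": "1", "maxspeed": "30"},
    {"highway": "primary_link", "lanes": "3"},  # no primary_link with 3 lanes
    {"highway": "service", "lanes": "1"},  # no service road has a maxspeed
]
fill_with_fallback(edges, "maxspeed", ["highway", "lanes"])
# The primary_link falls back to ("primary_link",) -> "40";
# the service road falls back to all roads () -> "30".
```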

Furthermore:

  • if values_to_fill is None, it fills all attributes that are not in base_on
  • if base_on is None, it considers all roads as one group (so it fills everything with a single value).
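The two defaults could be resolved up front, before any filling happens. A minimal sketch: `resolve_fill_args` and its `attributes` parameter (the attribute names present on the edges) are hypothetical. An empty `base_on` means every road shares the same empty key, so all roads form a single group. In practice one would probably also want to exclude attributes such as `osmid`, `length` or `geometry` from the default `values_to_fill`.

```python
def resolve_fill_args(attributes, values_to_fill=None, base_on=None):
    """Resolve the default arguments (hypothetical helper).

    attributes: names of all attributes present on the edges.
    Returns (targets, base_on) as lists.
    """
    # base_on=None: no grouping, so every road falls into one group
    base_on = list(base_on) if base_on is not None else []
    if values_to_fill is None:
        # Fill every attribute that is not used for grouping
        targets = [a for a in attributes if a not in base_on]
    elif isinstance(values_to_fill, str):
        targets = [values_to_fill]
    else:
        targets = list(values_to_fill)
    return targets, base_on


edges = [
    {"highway": "primary", "lanes": "2", "maxspeed": "50"},
    {"highway": "residential", "name": "Main Street"},
]
attributes = sorted({key for rec in edges for key in rec})
print(resolve_fill_args(attributes, base_on=["highway"]))
# (['lanes', 'maxspeed', 'name'], ['highway'])
print(resolve_fill_args(attributes, values_to_fill="maxspeed"))
# (['maxspeed'], [])
```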

What alternatives have you considered?

Doing it manually.

Additional context

Some very ugly code I wrote for this:

import re

import numpy as np
import pandas as pd

# `edges` is the edge GeoDataFrame of the graph, e.g.
# edges = ox.graph_to_gdfs(G, nodes=False)

# Function to standardize maxspeed values
def standardize_maxspeed(value):
    # Check if value is a list, and take the first element if so
    if isinstance(value, list):
        value = value[0]
    # Extract the first number from strings like "50", "30 mph" or "50;30"
    # (units are ignored; joining all digits would turn "50;30" into 5030)
    if isinstance(value, str):
        match = re.search(r"\d+(?:\.\d+)?", value)
        value = match.group() if match else None
    # Convert to numeric
    try:
        return float(value)
    except (ValueError, TypeError):
        return np.nan

# Apply standardization to maxspeed
edges['maxspeed'] = edges['maxspeed'].apply(standardize_maxspeed)

# Standardize 'highway' values - use the first element if it's a list
edges['highway'] = edges['highway'].apply(lambda x: x[0] if isinstance(x, list) else x)

# Group by 'highway' and find the most common 'maxspeed'
most_common_speeds = edges.groupby('highway')['maxspeed'].agg(
    lambda x: x.dropna().mode().iloc[0] if not x.dropna().empty else np.nan
)

# Convert to dictionary
highway_maxspeed_dict = most_common_speeds.to_dict()
highway_maxspeed_dict
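The snippet above builds `highway_maxspeed_dict` but does not write anything back. A minimal follow-up sketch, assuming `edges` has the standardized numeric `maxspeed` and scalar `highway` columns produced above; `fill_from_lookup` is a hypothetical name and the toy DataFrame stands in for real edges. The filled GeoDataFrame could then be turned back into a graph, e.g. with `ox.graph_from_gdfs`.

```python
import numpy as np
import pandas as pd


def fill_from_lookup(edges, target, key, lookup):
    """Return `target` with NaNs replaced by lookup[edges[key]] (hypothetical helper).

    Rows whose key is not in `lookup` stay NaN.
    """
    # Series.map turns unknown keys into NaN; fillna aligns on the index
    return edges[target].fillna(edges[key].map(lookup))


edges = pd.DataFrame({
    "highway": ["primary", "primary", "residential", "service"],
    "maxspeed": [50.0, np.nan, np.nan, np.nan],
})
highway_maxspeed_dict = {"primary": 50.0, "residential": 30.0}
edges["maxspeed"] = fill_from_lookup(edges, "maxspeed", "highway", highway_maxspeed_dict)
# edges["maxspeed"].tolist() -> [50.0, 50.0, 30.0, nan]
```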

EwoutH, Jan 29 '24 15:01