osmnx
Fill missing values with most common value on similar roads
Contributing guidelines
- [X] I understand the contributing guidelines
Documentation
- [X] My proposal is not addressed by the documentation or examples
Existing issues
- [X] Nothing similar appears in an existing issue
What problem does your feature proposal solve?
Currently there is no easy or convenient way to fill missing values in node or edge attributes.
What is your proposed solution?
Include a function that fills missing values with the most frequent value found on similar roads.
So for example, I say:
ox.fill_missing_values(graph.edges, values_to_fill="maxspeed", base_on=["highway", "lanes"])
In this case it fills maxspeed with the most frequent maxspeed among roads of the same highway type and the same number of lanes.
If no values are available for a given combination of highway and lanes, it drops base_on values from right to left until a value is found.
So if the maxspeed is N/A for a primary_link with 3 lanes, but there are no other primary_links with 3 lanes and a known maxspeed in the graph, it checks whether any primary_link has a maxspeed, and if not, it falls back to all roads that have a maxspeed.
Furthermore:
- if values_to_fill is None, it fills all values that are not in base_on
- if base_on is None, it considers all roads (so it fills everything with a single value).
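The behaviour described above could be sketched roughly as follows. This is only an illustration on a plain pandas DataFrame, not an existing OSMnx function: the name fill_missing_values, its signature, and the helper _mode_or_nan are hypothetical, and the toy data is made up. It assumes the attribute values are hashable scalars (no lists), and ties between equally frequent values resolve to the smallest one.

```python
import numpy as np
import pandas as pd


def _mode_or_nan(series):
    """Most frequent non-null value of a Series, or NaN if it has none."""
    modes = series.dropna().mode()
    return modes.iloc[0] if not modes.empty else np.nan


def fill_missing_values(df, values_to_fill=None, base_on=None):
    """Hypothetical helper: fill NaNs with the mode of similar rows."""
    base_on = list(base_on or [])
    if values_to_fill is None:
        # Fill every column that is not used to define "similar".
        values_to_fill = [c for c in df.columns if c not in base_on]
    elif isinstance(values_to_fill, str):
        values_to_fill = [values_to_fill]

    out = df.copy()
    for col in values_to_fill:
        filled = df[col]
        # Try all keys first, then drop keys from right to left.
        for n_keys in range(len(base_on), 0, -1):
            keys = base_on[:n_keys]
            # Modes are always computed from the original, unfilled data.
            group_mode = df.groupby(keys)[col].transform(_mode_or_nan)
            filled = filled.fillna(group_mode)
        # Last resort: the most frequent value over all rows.
        out[col] = filled.fillna(_mode_or_nan(df[col]))
    return out


# Toy edge table (values invented for illustration).
edges = pd.DataFrame({
    "highway": ["primary", "primary", "primary", "primary_link",
                "primary_link", "residential", "residential",
                "residential", "track"],
    "lanes": [2, 2, 2, 3, 1, 1, 1, 1, 1],
    "maxspeed": [80.0, 80.0, np.nan, np.nan, 60.0, 30.0, 30.0, 30.0, np.nan],
})

filled = fill_missing_values(edges, values_to_fill="maxspeed",
                             base_on=["highway", "lanes"])
print(filled["maxspeed"].tolist())
# [80.0, 80.0, 80.0, 60.0, 60.0, 30.0, 30.0, 30.0, 30.0]
```

Here the third row is filled from other 2-lane primary roads, the 3-lane primary_link falls back to primary_link roads in general, and the track falls back to the most frequent value over all roads.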
What alternatives have you considered?
Doing it manually.
Additional context
Some very ugly code I wrote for this:
import numpy as np
import pandas as pd

# Function to standardize maxspeed values
def standardize_maxspeed(value):
    # Check if value is a list, and take the first element if so
    if isinstance(value, list):
        value = value[0]
    # If the value is a string, keep only its digits (e.g. "30 mph" -> "30");
    # note that units are dropped, not converted
    if isinstance(value, str):
        value = ''.join(filter(str.isdigit, value))
    # Convert to numeric; anything unparseable becomes NaN
    try:
        return float(value)
    except (ValueError, TypeError):
        return np.nan
# Apply standardization to maxspeed
# (edges is assumed to be the edge GeoDataFrame, e.g. from ox.graph_to_gdfs)
edges['maxspeed'] = edges['maxspeed'].apply(standardize_maxspeed)
# Standardize 'highway' values - use the first element if it's a list
edges['highway'] = edges['highway'].apply(lambda x: x[0] if isinstance(x, list) else x)
# Group by 'highway' and find the most common 'maxspeed' (NaN if a group has no values)
most_common_speeds = edges.groupby('highway')['maxspeed'].agg(
    lambda x: x.dropna().mode().iloc[0] if not x.dropna().empty else np.nan
)
# Convert to dictionary
highway_maxspeed_dict = most_common_speeds.to_dict()
highway_maxspeed_dict
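The snippet above stops at the lookup dictionary. One way the dictionary could then be used to fill the gaps is sketched below, on a small made-up table standing in for the real edges GeoDataFrame; the helper name most_common is hypothetical.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the edges GeoDataFrame (values invented for illustration).
edges = pd.DataFrame({
    "highway": ["residential", "residential", "residential", "primary", "primary"],
    "maxspeed": [30.0, 30.0, np.nan, 50.0, np.nan],
})


def most_common(series):
    """Most frequent non-null value, or NaN if the group has none."""
    modes = series.dropna().mode()
    return modes.iloc[0] if not modes.empty else np.nan


# Same idea as above: one most-common maxspeed per highway type.
highway_maxspeed_dict = edges.groupby("highway")["maxspeed"].agg(most_common).to_dict()

# Look up the per-highway default for every row, and use it only where
# maxspeed is missing; existing values are left untouched.
edges["maxspeed"] = edges["maxspeed"].fillna(edges["highway"].map(highway_maxspeed_dict))
print(edges)
```

Highway types with no known maxspeed at all map to NaN and therefore stay unfilled, which is the case the right-to-left fallback in the proposal is meant to handle.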