CoP: Data Science: Analyze correlations between metro locations and 311-data requests

Open ryanmswan opened this issue 5 years ago • 30 comments

Overview

Investigate whether there are meaningful trends associated with metro stops and metro lines with regards to requests tracked by 311-data in LA County.

Action Items

  • [x] Define requirements for 311 data (adding notes to the Resources section and discussing at the Data Science CoP meeting)
    • [x] Do you need a one-time or ongoing dump of the data?
    • [x] Do you need a subset of the data (e.g. certain years) or the entire data set (approx. 4 million rows or 11 GB)?
      • [x] If a subset is needed, please define subset characteristics (e.g. date range)
    • [x] Do you need online access via an API or a download of data?
    • [ ] Add dependency label and put in the icebox until 311 data is provided
  • [x] Find available data sources and add to Resources section below
  • [ ] Determine if this is a one-time or ongoing project (and assign appropriate label)
  • [x] Write one-sheet
    • [ ] Define stakeholder
    • [ ] Summarize project including value add
    • [ ] Define project 6 month roadmap
    • [ ] Detail history (if any)
  • [x] Define tools to be used to visualize combined data
  • [ ] Create issues for the following
    • [x] EDA (Exploratory Data Analysis) of metro data
    • [ ] Identify correlations between distance from metro stop and request type
    • [ ] Determine if correlations observed are solely due to metro stop or are more broadly associated with population density or other factors
    • [ ] Combine geolocation data for metro lines with district types
    • [ ] Compare correlations/trends between different districts within each type
    • [ ] Compare LA county data with other California counties, compare with district types within county. (Post MVP)
    • [ ] Compare with statewide trends and within district types. (Post MVP)

Resources

  • Information about 311 Data here
  • Access 311 data here
  • http://geohub.lacity.org/datasets/metro-rail-lines-stops
  • https://developer.metro.net/docs/gis-data/overview/
  • District types issue: https://github.com/hackforla/data-science/issues/118

Use 2019 data for: 311, streetlights, crime, metro stops

Tools: Google Colab, sklearn, pandas

Work in progress

ryanmswan avatar Apr 22 '20 02:04 ryanmswan

This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in X days.

github-actions[bot] avatar Mar 18 '21 02:03 github-actions[bot]

@priyakalyan please document the following update to this issue in the comments here

  1. Progress: "What is the current status of your project? What have you completed and what is left to do?"
  2. Blockers: "Difficulties or errors encountered."
  3. Availability: "How much time will you have this week to work on this issue?"
  4. ETA: "When do you expect this issue to be completed?"
  5. Pictures (if necessary): "Add any pictures that will help illustrate what you are working on."

akhaleghi avatar Apr 08 '22 01:04 akhaleghi

Progress: On 3-31-2022 I added the file "Progress summary 311 Data Project" and data dictionaries for the 311 data ("Value Column 311 data") and for the Metro rail and bus lines ("Value column Metro- Bus and Rail line"). So far I have downloaded the 311 data (not cleaned yet) and looked at the request type count/relative frequency over the years 2015-2022 (through March 27th). I also looked at the request type count for different APCs.

Since then I have been out of town until today, so I have no further update for the last week.

Plan for the upcoming week:

  • Install Docker on my Windows PC and set up a local 311 data server;
  • Look at the metro bus/rail line data; learn about geospatial data analysis (wiki)

Availability: 6 hours this week.

ETA- Totally new to geospatial data analysis, so maybe 1 to 2 weeks.

priyakalyan avatar Apr 08 '22 19:04 priyakalyan

Progress: Was successful in installing Docker but could not set up a local 311 data server (tried many times; the last step in "Step 3: Build and seed your local database" failed). Any suggestions/pointers? For now, I have stopped working on it. Downloaded data from this website.

Loaded the metro rail line shapefile, the metro bus line shapefile and the neighborhood council shapefile.

Currently working on spatially joining the 311 data and the NC data (looking at one region at a time; 12 in all). Next, overlay the metro rail and bus lines, plot the different request types, and do a qualitative study exploring the request type counts geographically.

Availability: 6 hours this week.

ETA- 1 to 2 weeks.

priyakalyan avatar Apr 29 '22 04:04 priyakalyan

Progress: Finally figured out how to use paginated APIs with Python to fetch all rows of data from the 311 server for the year 2021. I have saved it as a CSV file, clean_311_data_2021. I will fetch the clean data for the rest of the years (2015-2020, 2022).

Have spatially joined the 311 data+ NC data + metro bus + metro rail line displaying the specific request types over 12 regions of NC.

Adding sample pics here- this is for the region 4- South East Valley- NC's: 'SHERMAN OAKS NC', 'NORTH HOLLYWOOD NORTH EAST NC', 'VAN NUYS NC', 'GREATER VALLEY GLEN', 'NOHO NC', 'NOHO WEST NC', 'STUDIO CITY NC', 'NC VALLEY VILLAGE', 'GREATER TOLUCA LAKE NC'.

[Images: Part1, Part2, Part3, Part4, Reg4]

Availability: 6 hours this week.

priyakalyan avatar May 12 '22 18:05 priyakalyan

Progress:

  • Created a heat map using folium for the request types Single Streetlight Issue (SSI) and Multiple Streetlight Issue over reg 6 for the year 2021;
  • Was able to add all the years of data (2015-2021) for reg 6 and request type SSI as layers on the same map, with a layer control to toggle from one layer to the next.
  • Successful in setting up geofencing (a block in radius) around each metro rail marker in reg 6. The extent of the geofencing can be changed depending upon the requirement.

Plan for the upcoming week:

  • Extract the bounds around each marker using geopandas manipulation- buffer and intersection...
  • Then analyze the number of requests of each type within these buffer zones and compare it with the number outside.
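The inside/outside comparison planned above can be sketched without geopandas: a circular buffer around a point is the set of locations within a fixed distance of it, so a plain distance test gives the same split. This is a dependency-free stand-in, not the geopandas buffer/intersection workflow itself. The coordinates are made-up projected x/y values in feet, the request type labels are samples, and `BUFFER_FT` is an arbitrary radius.

```python
import math
from collections import Counter

# Hypothetical projected coordinates (feet); not real stop or request locations.
stops = [(0.0, 0.0), (5000.0, 0.0)]
requests = [
    ("Single Streetlight Issue", 100.0, 50.0),
    ("Single Streetlight Issue", 2500.0, 0.0),
    ("Graffiti Removal", 5100.0, -200.0),
    ("Graffiti Removal", 9000.0, 3000.0),
]
BUFFER_FT = 500.0  # assumed buffer radius


def in_any_buffer(x, y):
    """True if (x, y) lies within BUFFER_FT of at least one stop."""
    return any(math.hypot(x - sx, y - sy) <= BUFFER_FT for sx, sy in stops)


inside, outside = Counter(), Counter()
for request_type, x, y in requests:
    target = inside if in_any_buffer(x, y) else outside
    target[request_type] += 1

print("inside :", dict(inside))
print("outside:", dict(outside))
```

Raw counts like these still need normalizing (for example by buffer area or population) before inside and outside are comparable.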

Availability: 6 hours this week.

ETA- 1 week

priyakalyan avatar May 19 '22 21:05 priyakalyan

The team discussed this last Thursday, so I'll leave some notes for the record:

I think it would be useful to have a histogram where the x axis is "distance from nearest bus stop/metro rail marker/etc." and the y axis is "number of requests". This will allow us to very clearly see whether there is some correlation between nearness to bus stops and 311 requests.
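As a rough sketch of that histogram, the distances can be binned into fixed-width buckets and counted. The distances and the 500 m bin width below are invented for illustration; in practice the list would hold one distance per 311 request.

```python
from collections import Counter

# Hypothetical distances (metres) from each request to its nearest stop.
distances_m = [40, 120, 480, 510, 950, 1020, 1499, 2300]
BIN_WIDTH_M = 500  # assumed bin width

# Map each distance to the lower edge of its bin, then count per bin.
counts = Counter((int(d) // BIN_WIDTH_M) * BIN_WIDTH_M for d in distances_m)

# Text rendering: x axis is the distance bin, bar length is the request count.
for lower in sorted(counts):
    print(f"{lower:>5}-{lower + BIN_WIDTH_M:<5} m  {'#' * counts[lower]}")
```

A plotting library would normally draw this; the counting step is the same either way.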

nichhk avatar May 23 '22 20:05 nichhk

Used the haversine formula (great-circle distance) to calculate the distance between each request's lat/long and each metro rail stop, and from that found each request's distance to the nearest metro rail marker. All this was done for reg 6, year 2021, and request type Single Streetlight Issue.
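For reference, a minimal sketch of the haversine calculation and the nearest-stop lookup. This is not the notebook's code; the function names are illustrative, the coordinates are made up, and the Earth is treated as a sphere of mean radius 6,371 km.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres (spherical approximation)


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def nearest_stop_distance_m(lat, lon, stops):
    """Distance in metres from (lat, lon) to the closest stop in `stops`."""
    return min(haversine_m(lat, lon, s_lat, s_lon) for s_lat, s_lon in stops)


# Hypothetical coordinates, for illustration only.
stops = [(34.050, -118.250), (34.100, -118.300)]
request = (34.051, -118.250)
distance = nearest_stop_distance_m(*request, stops)
print(round(distance, 1))
```

The brute-force `min` over all stops is fine for a few hundred stops; for millions of requests a spatial index would be the usual next step.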

As discussed in the last 311 team meeting, here is the histogram plot:

Histogram_reg6_ssi_2021_1

priyakalyan avatar May 27 '22 03:05 priyakalyan

Thanks Anupriya! Sorry for the delay. What do you make of this graph? To me, it seems to suggest that there is not a strong association between distance to nearest metro stop and request frequency--I'd expect to see a (basically) monotonically decreasing histogram, implying that there are a lot of requests close to metro stops but just a few far from metro stops. But maybe a request type like graffiti would be more illuminating.

Another bit that might help us understand this better: what is the density of metro stops? If the density of metro stops is very low, e.g., they are 10km apart from each other, then the median distance from the nearest metro stop of ~500m would be quite close. But if metro stops are 1km apart from each other, then ~500m is pretty far.
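One way to put a number on this, sketched under stated assumptions: take each stop's distance to its nearest neighbouring stop, use the median as the typical spacing, and compare the ~500 m median request distance with half that spacing. The stop positions below are invented planar coordinates in km, and the ratio is an ad hoc yardstick rather than an established metric.

```python
import math
from statistics import median

# Hypothetical stop positions on a planar grid, in km.
stops_km = [(0.0, 0.0), (1.0, 0.0), (2.1, 0.0), (3.0, 0.2)]
median_request_distance_km = 0.5  # the ~500 m figure discussed above


def nearest_neighbour_km(i):
    """Distance from stop i to the closest other stop."""
    x, y = stops_km[i]
    return min(
        math.hypot(x - ox, y - oy)
        for j, (ox, oy) in enumerate(stops_km)
        if j != i
    )


spacing_km = median(nearest_neighbour_km(i) for i in range(len(stops_km)))

# Half the spacing is the farthest a point on the line between two stops can be
# from a stop, so a ratio near or above 1 means "about as far as it gets".
relative = median_request_distance_km / (spacing_km / 2)
print(round(spacing_km, 3), round(relative, 2))
```

With stops roughly 1 km apart, as in this toy data, a 500 m median lands at a ratio of about 1; with stops 10 km apart the same 500 m would give a ratio of about 0.1.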

With this foundation, I think we can start controlling for factors like population density, bus ridership density, and metro stop density. Does that sound feasible?

nichhk avatar Jun 02 '22 05:06 nichhk

Have been trying to figure out how to get the population of each neighborhood council so that we can figure out the population density and so on. As @piotrsan mentioned in another issue (https://github.com/hackforla/311-data/issues/1229), here is the census data for the NC.

I also found this: Demographics of Neighborhood Councils. In both these files there are only 97 records- 97 NCs.

The NC boundary was updated in 2018 with 2 new NCs added- here is the link. The council names missing from those files are NORTH WESTWOOD NC and ARTS DISTRICT LITTLE TOKYO NC.

Next step is to figure out how to take census block/tract data and aggregate it to the NC level. This link gives the mapping process to start from block data and reconcile at NC boundary level.

After today's meeting- it looks like starting at census tract will be the easiest way to go. Take the NC shape file and merge it with the census tract and get the geocodes and move on to demographics from there.

priyakalyan avatar Jun 17 '22 03:06 priyakalyan

Have calculated the population of each neighborhood council using the 2020 census tracts (TIGER/Line shapefile 2020), the updated NC shapefile (99 councils) and the ACS 2020 demographics data at the tract level. No approximation was made in the geometry this time. For tracts intersecting multiple NCs, found the percentage of tract area (and hence population) falling in each NC, then used those shares to calculate each NC's population.
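The apportionment step can be sketched as area weighting: each tract contributes population to an NC in proportion to the share of the tract's area that falls inside it. This assumes population is spread evenly within a tract, which is an assumption of the sketch and may differ from the notebook. All tract IDs, areas and populations below are invented.

```python
# Hypothetical inputs; in practice the overlap areas come from intersecting
# the tract polygons with the NC polygons.
tract_population = {"T1": 4000, "T2": 3000}
tract_area = {"T1": 2.0, "T2": 1.5}  # same area unit as overlap_area
overlap_area = {                      # (tract, NC) -> intersected area
    ("T1", "NC A"): 2.0,              # T1 lies entirely inside NC A
    ("T2", "NC A"): 0.5,              # T2 is split between NC A and NC B
    ("T2", "NC B"): 1.0,
}

nc_population = {}
for (tract, nc), area in overlap_area.items():
    share = area / tract_area[tract]  # fraction of the tract inside this NC
    nc_population[nc] = nc_population.get(nc, 0.0) + tract_population[tract] * share

for nc, pop in sorted(nc_population.items()):
    print(nc, round(pop))
```

A useful sanity check is that the NC totals sum to the total tract population whenever every tract is fully covered by NCs.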

priyakalyan avatar Jul 14 '22 21:07 priyakalyan

Worked on this notebook- to find the updated population of the LA city neighborhood councils using geospatial analysis. Next- add a notebook- comparing the updated NC population obtained by geospatial analysis and arcGIS analysis.

priyakalyan avatar Jul 26 '22 02:07 priyakalyan

Have updated the notebook. The total population of LA city NCs is very close to the 2021 Census Bureau value. Have also been working on this PR- API pagination using python- to fetch all rows of data from 311 data pipeline for a given year.

priyakalyan avatar Aug 12 '22 01:08 priyakalyan

Hi @priyakalyan, are there any recent updates to this issue?

akhaleghi avatar Aug 31 '22 17:08 akhaleghi

  • I have been working on this PR- https://github.com/hackforla/311-data/pull/1257.

priyakalyan avatar Aug 31 '22 23:08 priyakalyan

A summary of this should be added to the wiki

ExperimentsInHonesty avatar Jun 18 '24 21:06 ExperimentsInHonesty

Thanks for all the prior work! I’ve reviewed the data pipeline, spatial joins, and exploratory work done by @priyakalyan and others. I’ll be building on this by focusing on a clean MVP that explores correlations between 311 request frequency and distance to metro stops, while accounting for population density.

Next up: defining the causal question more formally, recreating/validating the spatial joins, and exploring a few model-based comparisons. I’ll share a roadmap soon.

ssejal avatar Jul 09 '25 19:07 ssejal

MVP Progress Update + Charter + Framing Insight

I've completed the initial MVP setup notebook and project charter for the 311 + Metro correlation analysis.

Notebook: metro_311_project_setup

  • Clean join of 2024 311 requests with Neighborhood Councils and population data
  • Distance to nearest metro rail stop (in feet)
  • Walkability proximity flags (500 ft, 0.25 mile, 0.5 mile)
  • Visualizations:
    • Request counts per NC
    • Per capita request rates
    • Temporal heatmap by request type
    • Metro buffer overlays
    • Histogram of distance to metro

Project Charter: MVP Charter

Key Insight So Far

Only 12.7% of 311 requests fall within 0.5 miles of a metro rail stop. While metro-only infrastructure impacts a limited geographic footprint, I’ll proceed with exploring localized spatial trends (e.g., request types or density near rail). These zones may still reveal meaningful differences, especially once normalized by population or compared to non-adjacent areas.

Optional Scope Expansion (Not in MVP Yet)

I can layer the LA Metro bus stop shapefile later. Bus data may provide:

  • Broader geographic reach
  • Better representativeness across LA
  • A basis for comparing metro vs. bus-adjacent 311 activity

Would love feedback on whether the group prefers to stick with metro-only for MVP, or bring in bus stops soon after.

ssejal avatar Jul 12 '25 02:07 ssejal

Update on Metro–311 Analysis MVP

I've completed the initial analysis exploring the relationship between 311 request rates and proximity to metro stops.

Highlights:

  • 311 requests were normalized per capita using NC-level population data.
  • Distance bands were created from each metro stop: <500ft, 500-1320ft, 1320-2640ft, >2640ft.
  • Visualization shows that while raw volume is low near metro stops (since few NCs are directly served), per capita rates vary meaningfully.
  • Some request types (e.g. graffiti, bulky items) show distinct spatial patterns based on proximity to metro stops.
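A minimal sketch of the banding and per capita steps listed above, not the notebook's code. The band edges come from the list; the records and populations are invented, and a distance exactly on an edge is placed in the farther band, which is an assumption since the thread does not say how ties are handled.

```python
from bisect import bisect_right
from collections import Counter

BAND_EDGES_FT = [500, 1320, 2640]
BAND_LABELS = ["<500ft", "500-1320ft", "1320-2640ft", ">2640ft"]


def distance_band(distance_ft):
    """Label for the band containing distance_ft (edge values go to the farther band)."""
    return BAND_LABELS[bisect_right(BAND_EDGES_FT, distance_ft)]


# Hypothetical records: (neighborhood council, distance to nearest stop in feet).
requests = [("NC A", 120), ("NC A", 500), ("NC A", 3000), ("NC B", 1400), ("NC B", 2000)]
population = {"NC A": 20000, "NC B": 10000}

counts = Counter((nc, distance_band(d)) for nc, d in requests)

# Requests per 1,000 residents, using the whole NC population as denominator.
per_1000 = {key: 1000 * n / population[key[0]] for key, n in counts.items()}

for (nc, band), rate in sorted(per_1000.items()):
    print(nc, band, rate)
```

Dividing by the whole NC population is the simplest denominator; dividing by the population living inside each band would be a stricter comparison but needs finer-grained population data.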

Next steps:

  • Replicate this analysis using bus stops as a negative control or comparative lens. This will help test whether metro-specific effects persist or simply reflect higher urban density.

Notebook is ready for review: metro_311_project_setup.ipynb. A follow-up notebook will focus on bus proximity analysis.

ssejal avatar Jul 17 '25 03:07 ssejal

@ssejal Please do a PR for the work you have done so far. If you need guidance on that, please add a comment to this issue and come to the Data Science Community of Practice.

Also, I think it would be worth creating a small presentation about your work, which you could then add additional work to as you progress (if you want to keep working on this).

ExperimentsInHonesty avatar Aug 11 '25 20:08 ExperimentsInHonesty

contributing file from the hackforla/website repo https://github.com/hackforla/website/blob/gh-pages/CONTRIBUTING.md

ExperimentsInHonesty avatar Aug 12 '25 02:08 ExperimentsInHonesty

@ssejal Please provide update

Instructions
  1. Progress: "What is the current status of your project? What have you completed and what is left to do?"
  2. Blockers: "Difficulties or errors encountered."
  3. Availability: "How much time will you have this week to work on this issue?"
  4. ETA: "When do you expect this issue to be completed?"
  5. Pictures (if necessary): "Add any pictures that will help illustrate what you are working on."

You can use this template

1. Progress: 
2. Blockers: 
3. Availability:
4. ETA:
5. Pictures (if necessary): 

ExperimentsInHonesty avatar Aug 12 '25 02:08 ExperimentsInHonesty

example of presentation about personal work

hziegel on Jun 20 Member Update: I have created a PowerPoint presentation here: https://docs.google.com/presentation/d/1R-2u7vkqvcVslsIG1bF5ygjusDECHBdR

ExperimentsInHonesty avatar Aug 12 '25 02:08 ExperimentsInHonesty

@ssejal Please provide update

ExperimentsInHonesty avatar Aug 18 '25 21:08 ExperimentsInHonesty

1. Progress:

  • Completed a clean MVP notebook (metro_311_project_setup.ipynb) analyzing 2024 LA 311 requests in relation to metro stops.
  • Joined 311 data (1.44M records) with updated Neighborhood Council shapefiles and population data.
  • Computed distance to nearest metro rail stop and created walkability buffers (500 ft, 0.25 mi, 0.5 mi).
  • Normalized request counts per capita at the NC level.
  • Visualizations include:
    • Temporal heatmaps of request types
    • Spatial overlays with metro buffers
    • Histograms of request frequency by distance bands
  • Early insight: ~12.7% of 311 requests fall within 0.5 miles of a metro stop. While coverage is limited, per capita rates show variation in certain request types (graffiti, bulky items) near metro stations.

2. Blockers:

  • Still validating whether observed differences are metro-specific or simply proxies for higher-density areas.
  • Need to incorporate bus stop data as a comparative control before moving forward with causal framing.

3. Availability:

  • 2 hours this week.

4. ETA:

  • PR with the current notebook will be opened this week.
  • Bus stop comparison analysis notebook expected within ~2 weeks.

5. Pictures: Image Image

ssejal avatar Aug 19 '25 00:08 ssejal

@ssejal Please provide update

ExperimentsInHonesty avatar Sep 08 '25 23:09 ExperimentsInHonesty

1. Progress:

2. Blockers:

  • Need to clarify the most useful framing question (descriptive vs causal) to ensure findings are actionable

3. Availability:

  • Continuing work part-time this week
  • Will attend the next Data Science CoP meeting for discussion

4. ETA:

  • Create a PR for the project: before next CoP meeting
  • Direct comparison of bus vs. rail results + draft causal framing: 2 weeks
  • Multi-year extension (2020–2024): by end of October

5. Pictures: Image Image

ssejal avatar Sep 10 '25 23:09 ssejal

ChatGPT conversation in answer to the question "how might information about correlations between 311 requests and metro or bus stops help Los Angeles Neighborhood Councils"

https://chatgpt.com/share/68c8cde5-683c-8008-84b4-7aeb660dc0a5

ExperimentsInHonesty avatar Sep 16 '25 02:09 ExperimentsInHonesty

Please provide update @ssejal

chinaexpert1 avatar Oct 14 '25 01:10 chinaexpert1

Please provide update @ssejal

chinaexpert1 avatar Oct 21 '25 03:10 chinaexpert1