311-data
Adjust data cleaning script to prune data points outside of LA Neighborhood Districts
Dependency
- [ ] obtain 2024 or most recent Boundaries JSON file
Overview
We need to remove 311 service requests from our map that do not fall within the boundaries of any Neighborhood Council, since they are inaccessible and mostly confuse users.
Action Items
- [ ] write a Python script that takes as input a CSV containing a subset of the 311 dataset (see the example dataset below) and identifies the 311 requests that do not fall within any Neighborhood Council boundary (provide a proof of concept)
- [ ] report on this ticket how many 2024 service requests fall outside NC boundaries
- [ ] incorporate the functionality into our daily Hugging Face cron job @ updateHfDataset.py
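
As a starting point for the proof of concept, here is a minimal sketch of the point-in-boundary check using only the Python standard library (ray casting). It assumes the boundary file is a GeoJSON FeatureCollection of Polygon/MultiPolygon features with `[longitude, latitude]` coordinates, and that the CSV has `Latitude` and `Longitude` columns; both assumptions should be checked against the real files. The function names are hypothetical, and rows with missing or unparseable coordinates are counted as outside, which may or may not be what we want.

```python
import csv
import json


def point_in_ring(lon, lat, ring):
    """Ray-casting test: is (lon, lat) inside a single closed ring?"""
    inside = False
    j = len(ring) - 1
    for i in range(len(ring)):
        xi, yi = ring[i][0], ring[i][1]
        xj, yj = ring[j][0], ring[j][1]
        # Only edges that straddle the point's latitude can be crossed.
        if (yi > lat) != (yj > lat):
            x_cross = (xj - xi) * (lat - yi) / (yj - yi) + xi
            if lon < x_cross:
                inside = not inside
        j = i
    return inside


def point_in_polygon(lon, lat, rings):
    """rings[0] is the outer ring; any further rings are holes."""
    if not rings or not point_in_ring(lon, lat, rings[0]):
        return False
    return not any(point_in_ring(lon, lat, hole) for hole in rings[1:])


def polygons_from_geojson(geojson):
    """Flatten Polygon and MultiPolygon features into a list of polygons."""
    polygons = []
    for feature in geojson.get("features", []):
        geom = feature.get("geometry") or {}
        if geom.get("type") == "Polygon":
            polygons.append(geom["coordinates"])
        elif geom.get("type") == "MultiPolygon":
            polygons.extend(geom["coordinates"])
    return polygons


def requests_outside(rows, polygons, lat_col="Latitude", lon_col="Longitude"):
    """Return the rows whose coordinates fall inside none of the polygons.

    Column names are assumptions; rows without usable coordinates
    are treated as outside.
    """
    outside = []
    for row in rows:
        try:
            lat = float(row[lat_col])
            lon = float(row[lon_col])
        except (KeyError, TypeError, ValueError):
            outside.append(row)
            continue
        if not any(point_in_polygon(lon, lat, p) for p in polygons):
            outside.append(row)
    return outside


def requests_outside_from_files(csv_path, boundary_path):
    """Convenience wrapper: read both files and run the check."""
    with open(boundary_path, encoding="utf-8") as f:
        polygons = polygons_from_geojson(json.load(f))
    with open(csv_path, newline="", encoding="utf-8") as f:
        return requests_outside(csv.DictReader(f), polygons)
```

Calling `len(requests_outside_from_files(csv_path, boundary_path))` on a 2024 extract would give the count asked for in the second action item. This brute-force loop checks every request against every polygon, so it is only meant to validate the logic on a subset; the DuckDB spatial extension linked below is likely the better fit for the cron job.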
Resources/Instructions
Screenshot of requests outside NC boundaries
Useful Links
- DBeaver: https://dbeaver.io/
- DBeaver + DuckDB: https://duckdb.org/docs/guides/sql_editors/dbeaver.html
- Example dataset (2022 311 data): https://data.lacity.org/City-Infrastructure-Service-Requests/MyLA311-Service-Request-Data-2022/i5ke-k6by/about_data
- JSON of Neighborhood Council boundaries (from our repo): https://raw.githubusercontent.com/hackforla/311-data/main/data/nc-boundary-2019-modified.json
- DuckDB + Python: https://duckdb.org/docs/api/python/overview.html
- DuckDB Spatial Extension: https://duckdb.org/docs/extensions/spatial
Adjusting this ticket. Most likely @mru-hub will pick this up once there are enough instructions to get started.
Note: I'm realizing that we may need to do work to prune old data. Adding a check into our cleaning logic will simply stop new data (e.g. requests falling outside NC boundaries) from being added -- we'll still need to handle old data that has the same problem.
Follow up ticket: Make sure we are cleaning 2023 and prior data with the same logic
Note for Ryan: provide an example resource for checking whether a Lat/Long is within a provided boundary in DuckDB
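
One possible shape for that check, as a rough sketch rather than a tested recipe: the DuckDB spatial extension provides `ST_Read`, `ST_Point`, and `ST_Contains`, which can be combined to list the requests that fall inside no boundary polygon. The helper names, the `Latitude`/`Longitude` column names, and the assumption that `ST_Read` can open our boundary JSON directly (exposing the geometry as `geom`) are all unverified and need to be confirmed against the real files.

```python
def outside_nc_query(csv_path, boundary_path,
                     lat_col="Latitude", lon_col="Longitude"):
    """Build SQL selecting requests that fall inside no NC boundary.

    Hypothetical helper; column names are assumptions about the CSV.
    """
    def quote(path):
        # Escape single quotes so the path is a valid SQL string literal.
        return "'" + path.replace("'", "''") + "'"

    return f"""
        SELECT r.*
        FROM read_csv_auto({quote(csv_path)}) AS r
        WHERE NOT EXISTS (
            SELECT 1
            FROM ST_Read({quote(boundary_path)}) AS b
            -- ST_Point takes (x, y), i.e. (longitude, latitude)
            WHERE ST_Contains(b.geom, ST_Point(r."{lon_col}", r."{lat_col}"))
        )
    """


def find_outside(csv_path, boundary_path):
    """Run the query; requires the duckdb package and the spatial extension."""
    import duckdb  # imported lazily so the query builder works without it

    con = duckdb.connect()
    con.execute("INSTALL spatial")
    con.execute("LOAD spatial")
    return con.execute(outside_nc_query(csv_path, boundary_path)).fetchall()
```

Rows with a NULL latitude or longitude never satisfy `ST_Contains`, so this query reports them as outside; add a filter if those should be handled separately.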
This ticket is ready to be picked up
@mru-hub's update from this past week is on the PR: https://github.com/hackforla/311-data/pull/1736#issuecomment-2164259492
Latest PR: https://github.com/hackforla/311-data/pull/1744
Update: added a dependency on obtaining the 2024 NC boundary JSON file