Adjust data cleaning script to prune data points outside of LA Neighborhood Districts

Open ryanfchase opened this issue 10 months ago • 7 comments

Dependency

  • [ ] obtain the 2024 (or most recent) NC boundaries JSON file

Overview

We need to remove 311 service requests from our map that do not fall within the boundaries of any Neighborhood Council (NC), since they are inaccessible and mostly confuse users.

Action Items

  • [ ] write a Python script that takes as input a CSV containing a subset of the 311 dataset (see the example dataset below) and identifies the 311 requests that do not fall within any Neighborhood Council boundary (as a proof of concept)
  • [ ] report on this ticket how many 2024 service requests fall outside NC boundaries
  • [ ] incorporate the functionality into our daily Hugging Face cron job @ updateHfDataset.py
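
A minimal sketch of the check described in the first action item, offered as a starting point rather than the final implementation. It assumes the CSV has `Latitude` and `Longitude` columns and that the boundaries file is a GeoJSON FeatureCollection of Polygon/MultiPolygon features; neither assumption has been verified against the real files. All function names are hypothetical. It uses a plain ray-casting point-in-polygon test from the standard library only, not the DuckDB spatial extension listed under Useful Links, so the real script may look quite different.

```python
import csv
import io
import json


def point_in_ring(lon, lat, ring):
    """Ray-casting test: True if (lon, lat) lies inside the closed ring."""
    inside = False
    j = len(ring) - 1
    for i in range(len(ring)):
        xi, yi = ring[i][0], ring[i][1]
        xj, yj = ring[j][0], ring[j][1]
        # Toggle on each edge crossed by a horizontal ray going east from the point.
        if (yi > lat) != (yj > lat):
            x_cross = (xj - xi) * (lat - yi) / (yj - yi) + xi
            if lon < x_cross:
                inside = not inside
        j = i
    return inside


def point_in_polygon(lon, lat, polygon):
    """GeoJSON polygon: the first ring is the outline, any further rings are holes."""
    if not point_in_ring(lon, lat, polygon[0]):
        return False
    return not any(point_in_ring(lon, lat, hole) for hole in polygon[1:])


def load_polygons(geojson):
    """Flatten Polygon and MultiPolygon features into one list of polygons."""
    polygons = []
    for feature in geojson["features"]:
        geom = feature["geometry"]
        if geom["type"] == "Polygon":
            polygons.append(geom["coordinates"])
        elif geom["type"] == "MultiPolygon":
            polygons.extend(geom["coordinates"])
    return polygons


def find_requests_outside(csv_file, polygons, lat_col="Latitude", lon_col="Longitude"):
    """Return rows whose coordinates are missing, unparseable, or outside every polygon."""
    outside = []
    for row in csv.DictReader(csv_file):
        try:
            lat, lon = float(row[lat_col]), float(row[lon_col])
        except (KeyError, TypeError, ValueError):
            outside.append(row)  # assumption: rows without usable coordinates are flagged too
            continue
        if not any(point_in_polygon(lon, lat, p) for p in polygons):
            outside.append(row)
    return outside


# Toy data for illustration only: one square "council" and two made-up requests.
demo_boundaries = json.loads("""
{"type": "FeatureCollection", "features": [{
  "type": "Feature", "properties": {"name": "Demo NC"},
  "geometry": {"type": "Polygon", "coordinates": [[
    [-118.5, 34.0], [-118.0, 34.0], [-118.0, 34.5], [-118.5, 34.5], [-118.5, 34.0]
  ]]}}]}
""")
demo_csv = io.StringIO(
    "SRNumber,Latitude,Longitude\n"
    "A-1,34.25,-118.25\n"
    "A-2,35.00,-118.25\n"
)
outside_rows = find_requests_outside(demo_csv, load_polygons(demo_boundaries))
print([row["SRNumber"] for row in outside_rows])
```

With real inputs, `csv_file` would be an open handle on the downloaded CSV and the GeoJSON would come from `json.load` on the boundaries file; `len(outside_rows)` then gives the count requested in the second action item. Checking every request against every polygon is slow on a full year of data, so a bounding-box pre-filter or a spatial index would likely be needed before this goes into the cron job.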

Resources/Instructions

Screenshot of requests outside NC boundaries

Useful Links

  • DBeaver: https://dbeaver.io/
  • DBeaver + DuckDB: https://duckdb.org/docs/guides/sql_editors/dbeaver.html
  • Example dataset (2022 311 data): https://data.lacity.org/City-Infrastructure-Service-Requests/MyLA311-Service-Request-Data-2022/i5ke-k6by/about_data
  • JSON of Neighborhood Council boundaries (from our repo): https://raw.githubusercontent.com/hackforla/311-data/main/data/nc-boundary-2019-modified.json
  • DuckDB + Python: https://duckdb.org/docs/api/python/overview.html
  • DuckDB Spatial Extension: https://duckdb.org/docs/extensions/spatial

ryanfchase avatar Apr 12 '24 04:04 ryanfchase

Adjusting this ticket. Most likely @mru-hub will pick this up once there are enough instructions to get started.

ryanfchase avatar Apr 12 '24 04:04 ryanfchase

Note: I'm realizing that we may need to do work to prune old data. Adding a check into our cleaning logic will simply stop new data (e.g. requests falling outside NC boundaries) from being added -- we'll still need to handle old data that has the same problem.

Follow up ticket: Make sure we are cleaning 2023 and prior data with the same logic

ryanfchase avatar May 08 '24 22:05 ryanfchase

Note for Ryan: provide an example resource for checking whether a lat/long point is within a provided boundary in DuckDB

ryanfchase avatar May 08 '24 22:05 ryanfchase

This ticket is ready to be picked up

ryanfchase avatar May 08 '24 22:05 ryanfchase

@mru-hub's update from this past week is on the PR: https://github.com/hackforla/311-data/pull/1736#issuecomment-2164259492

ryanfchase avatar Jun 13 '24 23:06 ryanfchase

Latest PR: https://github.com/hackforla/311-data/pull/1744

ryanfchase avatar Jun 27 '24 03:06 ryanfchase

Update: added a dependency on obtaining the 2024 NC boundary JSON file

ryanfchase avatar Sep 06 '24 22:09 ryanfchase