
CoP: Data Science: Create district types reusable tool (API, single dataset, etc.)

Open ExperimentsInHonesty opened this issue 3 years ago • 15 comments

Overview

We need to create a tool so that each project at H4LA that renders points on a map can use District Files to help people analyze or view the data.

Action Items

  • [x] Identify large groups/districts
  • [x] Identify links for groups/districts
  • [x] Locate and obtain shape files for these districts #124
  • [x] Determine which file types we will make these available in (shp, npm, and/or GeoJSON)
  • [ ] Put files in GitHub repository so they are available to use in the organization.
  • [x] Research how we will create a dataset out of this info that will be self-updating (meaning, are there APIs for these groups?)
  • [ ] ...

Resources

Example Neighborhood Council Shape File

Initial Identification of Large Groups/Districts

ExperimentsInHonesty avatar Sep 28 '21 18:09 ExperimentsInHonesty

Create an npm package for delivering the data. We need to get a backend person involved, and we need to make a new package each time the boundaries change, e.g., la-shape-files-2021, la-shape-files-2022.
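A per-year data package, as suggested above, might have a package.json along these lines (every field here is an illustrative guess, not an actual published package):

```json
{
  "name": "la-shape-files-2021",
  "version": "1.0.0",
  "description": "Los Angeles district boundary files (2021 snapshot)",
  "files": [
    "geojson/",
    "shp/"
  ]
}
```

The `files` field limits the published tarball to the boundary data itself, so consumers install only the GeoJSON/shapefile payload.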

ExperimentsInHonesty avatar Dec 07 '21 19:12 ExperimentsInHonesty

Next steps are talking to the 311 team, TDM team, Food Oasis, and Lucky Parking.

ExperimentsInHonesty avatar Dec 07 '21 20:12 ExperimentsInHonesty

Feedback from Mike Morgan on 12/9: Since the shape files for the various districts are small enough (less than 50MB, see here), they can be stored in a repository. We should also consider making these available as npm and GeoJSON.

akhaleghi avatar Dec 10 '21 19:12 akhaleghi

Notes from 3/11 meeting with Abe, Bonnie, John (Food Oasis) and Mike:

Food Oasis uses Postgres's own geometry data type (via the PostGIS extension) to run scripts, and then converts to GeoJSON to send to the client.

  • Can take a lat/lon and return the corresponding NC
  • Can render a neighborhood on a map

Postgres can also consume GeoJSON and convert it to its native geometry data type.
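The lat/lon-to-NC lookup described above can be approximated without a database. A minimal pure-Python sketch using the even-odd (ray-casting) rule, with a made-up square standing in for a real Neighborhood Council boundary:

```python
def point_in_polygon(lon, lat, ring):
    """Return True if (lon, lat) falls inside the polygon ring
    (a list of (lon, lat) vertices), using the even-odd rule."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        if (y1 > lat) != (y2 > lat):
            # x-coordinate where this edge crosses the horizontal ray at `lat`
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def lookup_district(lon, lat, districts):
    """districts: mapping of name -> polygon ring (GeoJSON-style lon/lat)."""
    for name, ring in districts.items():
        if point_in_polygon(lon, lat, ring):
            return name
    return None


# Hypothetical district for illustration -- not a real NC boundary
districts = {
    "Example NC": [(-118.5, 34.0), (-118.4, 34.0), (-118.4, 34.1), (-118.5, 34.1)],
}
print(lookup_district(-118.45, 34.05, districts))  # prints "Example NC"
```

In the Food Oasis setup this same containment test happens inside Postgres (e.g., via PostGIS spatial predicates), which scales much better across many districts; the sketch only shows the logic.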

The recording of the meeting

akhaleghi avatar Mar 18 '22 17:03 akhaleghi

This issue will have to get rewritten to check whether the shape files are out of date. But the programming that uses the shape files should be built first, given that up-to-date shape files with no programming are useless.

ExperimentsInHonesty avatar Feb 13 '24 03:02 ExperimentsInHonesty

Next steps: Create a script that can be run to automate downloading the shape files for the various district types listed above. We will want to note the date each file was last updated and the date the file was downloaded.
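A minimal sketch of that script, assuming hypothetical endpoint URLs (the real list of district endpoints would come from the spreadsheet of sources):

```python
import json
import urllib.request
from datetime import datetime, timezone

# Hypothetical district shapefile endpoints -- placeholders, not real URLs
ENDPOINTS = {
    "neighborhood_councils": "https://example.com/nc.zip",
}


def make_record(name, url, last_updated=None):
    """Build the metadata row noting when the source says the file was
    last updated and when we downloaded it."""
    return {
        "district_type": name,
        "source_url": url,
        "source_last_updated": last_updated,  # from source metadata, if available
        "downloaded_at": datetime.now(timezone.utc).isoformat(),
    }


def download_all(endpoints=ENDPOINTS):
    """Download each file and write a JSON log of the dates."""
    records = []
    for name, url in endpoints.items():
        urllib.request.urlretrieve(url, f"{name}.zip")  # network call
        records.append(make_record(name, url))
    with open("download_log.json", "w") as fh:
        json.dump(records, fh, indent=2)


# download_all()  # uncomment to run against real endpoints
```

Keeping the log as a sidecar JSON file makes it easy to later compare `source_last_updated` against the previous run and skip unchanged files.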

akhaleghi avatar Feb 26 '24 20:02 akhaleghi

Update on issue #118, district types reusable tool:

  • Familiarization: I conducted a review of each target site to understand its layout, available data, and challenges in data extraction.

  • APIs: Looked for available APIs to simplify the extraction process.

  • Created a spreadsheet to keep tabs on each site.

  • Initiated a Jupyter Notebook to document coding and data collection/automation.

parcheesime avatar Mar 26 '24 01:03 parcheesime

Using the GeoHub L.A. website, I programmatically created shape files:

  • Data Acquisition: Utilizing the GeoHub LA website, I identified and accessed the URL endpoints for the API calls corresponding to our project's requirements.

  • Data Extraction: Through programmatic queries, I fetched JSON data from the different district API endpoints, capturing geographical information such as boundaries, points of interest, and administrative divisions.

  • Shapefile Creation: From the gathered JSON data, I made shapefiles, the geospatial data format compatible with various GIS software and tools.

  • Compression Exploration: To optimize storage and handling of the shapefiles, I'm trying out compressing the data using TruncatedSVD.
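GeoHub LA layers are served through ArcGIS REST FeatureServer endpoints, so the acquisition step can be sketched as a query-URL builder plus a fetch. The layer URL below is a hypothetical placeholder, not one of the project's actual endpoints:

```python
import urllib.parse
import urllib.request


def build_query_url(layer_url, where="1=1", fmt="geojson"):
    """Build an ArcGIS REST 'query' URL returning all features as GeoJSON."""
    params = urllib.parse.urlencode({"where": where, "outFields": "*", "f": fmt})
    return f"{layer_url}/query?{params}"


def fetch_geojson(layer_url):
    """Fetch the raw GeoJSON bytes for a layer (network call)."""
    with urllib.request.urlopen(build_query_url(layer_url)) as resp:
        return resp.read()


# Hypothetical neighborhood-council layer
url = build_query_url("https://services.arcgis.com/example/FeatureServer/0")
print(url)
```

Requesting `f=geojson` avoids a separate conversion step: the response can be loaded directly by geopandas or written straight to disk before shapefile creation.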

parcheesime avatar Apr 23 '24 02:04 parcheesime

Update: Data Acquisition, Extraction, Shapefile, and compression exploration can be accessed in my repo, HERE

This week I will look into how we can run the data collection script on a quarterly basis and have it deposit the files in Google Drive and/or GitHub, or whatever is best for the team.

parcheesime avatar Apr 23 '24 04:04 parcheesime

Here's an update on data acquisition and extraction of district shape files:

Update on the Shape File Automation Project

  • Implemented Google Drive functions to add files directly to Google Drive.
  • Updated the main function to create shape files with new functionalities.
  • Explored automation options using Google Cloud Functions for continuous data collection of district shape files.

Consideration:

  • Google Cloud Functions seems to be a viable solution for automating the data collection process. However, it requires setting up billing with a credit card. I will investigate whether Hack for LA has an account or could provide a credit card for this purpose.

Next Steps:

  • Confirm the availability of a credit card or an existing Google Cloud account through Hack for LA.
  • If available, proceed with setting up the Google Cloud Function.
  • Test the entire automation workflow to ensure everything is functioning as expected.
  • Or investigate other automation avenues.

I've also pushed all recent updates to the repository, and you can check the latest commits for detailed changes.

parcheesime avatar Apr 30 '24 00:04 parcheesime

Project Update:

  • A GitHub workflow has successfully been integrated to automatically update files in my Google Drive.
  • Adjustments made in main script to ensure compatibility with the GitHub workflow.
  • Secrets have been configured for Google API JSON file and Google Drive Folder ID.
  • I will update the ID to our HFLA Google Drive Folder.
  • Automation is set for every other month on the first of the month.
  • Current updates to the repository

I can adjust the code to update a GitHub folder. We can do both Google Drive and GitHub, if need be.
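The schedule and secrets described above might look roughly like the following workflow file. This is an illustrative sketch only; the script name, file path, and secret names are assumptions, not the repo's actual ones:

```yaml
# .github/workflows/update-shapefiles.yml (hypothetical path and names)
name: Update district shape files
on:
  schedule:
    - cron: "0 0 1 */2 *"   # 00:00 UTC on the 1st of every other month
  workflow_dispatch:          # allow manual runs
jobs:
  update:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - run: pip install -r requirements.txt
      - run: python main.py
        env:
          GOOGLE_API_JSON: ${{ secrets.GOOGLE_API_JSON }}
          GOOGLE_DRIVE_FOLDER_ID: ${{ secrets.GOOGLE_DRIVE_FOLDER_ID }}
```

The `workflow_dispatch` trigger is worth keeping alongside the cron schedule so the pipeline can be exercised on demand when testing.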

parcheesime avatar May 01 '24 00:05 parcheesime

This week I refined the setup of environment variables to enhance both local development and CI/CD workflows in GitHub Actions. By leveraging os.getenv() for securely accessing environment variables, I've streamlined the development process significantly. This ensures that the application runs with the necessary configuration without hardcoding sensitive information.
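The os.getenv() pattern described here can be sketched as follows; the variable name is illustrative, not necessarily the one the project uses:

```python
import os


def get_drive_folder_id():
    """Read the target Google Drive folder ID from the environment.

    Locally this comes from the shell or a .env file; in GitHub Actions
    it is injected from a repository secret. Failing fast when the
    variable is missing beats silently uploading to the wrong place.
    """
    folder_id = os.getenv("GOOGLE_DRIVE_FOLDER_ID")
    if folder_id is None:
        raise RuntimeError("GOOGLE_DRIVE_FOLDER_ID is not set")
    return folder_id
```

Because the same name is read in both environments, the script itself needs no branching between local and CI runs.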

Additionally, I've discussed updating the top-level Google folder structure with our project manager. This change aims to improve the automation process for storing shape files.

District Data Collection Repo

parcheesime avatar May 20 '24 17:05 parcheesime

I gathered all the information for transferring my current repo, which contains the District Shape File pipeline, into a new repo established in the Hack for LA account for housing the shape data. Below are the steps involved. The transfer will be completed within the week. In the meantime, the shape file data is in the Hack for LA Google Drive.

Steps for Repository Transfer

The following steps have been determined for transferring the repository associated with the district data collection:

  1. Prepare New Repository

    • A new empty repository has been established to house the district data collection.
  2. ETL Process Completion

    • The ETL process has been completed in my current repository.
  3. Code Transfer Process

    • Clone the new repository locally.
    • Add the new repository as a remote to the existing project.
    • Pull the latest code from the current (old) repository.
    • Push the code to the new repository.
  4. Transfer Automation Components

    • Transfer GitHub Actions and secrets necessary for pipeline automation.
  5. Update Documentation

    • The README file will be updated to reflect changes and provide guidance for the new repository setup.

parcheesime avatar Jun 25 '24 00:06 parcheesime

@parcheesime Is there still work to be done on this issue or is it complete?

akhaleghi avatar Sep 23 '24 23:09 akhaleghi

@akhaleghi I've successfully tested adding the Los Angeles district shape data in my own repository, complete with a README and automated scripts running on schedule. How can we integrate this into the Hack for L.A. repository? Should we create a dedicated directory like LA_District_ShapeFiles for the data?

parcheesime avatar Sep 24 '24 00:09 parcheesime

Follow-up: @akhaleghi I have the data updating on my personal repository. I will need assistance in adding my project to our data science repo. @salice may have made one, but it was a while ago, before the repository updates.

parcheesime avatar Oct 15 '24 01:10 parcheesime