CoP: Data Science: Create district types reusable tool (API, single dataset, etc.)
Overview
We need to create a tool so that each project at H4LA that renders points on a map can use District Files to help people analyze or view the data.
Action Items
- [x] Identify large groups/districts
- [x] Identify links for groups/districts
- [x] Locate and obtain shape files for these districts #124
- [x] Determine which file types we will make available (shp, npm, and/or GeoJSON)
- [ ] Put files in GitHub repository so they are available to use in the organization.
- [x] Research how we will create a self-updating dataset out of this info (i.e., whether these groups provide APIs)
- [ ] ...
Resources
Example Neighborhood Council Shape File
Initial Identification of Large Groups/Districts
Create an npm package for delivering the data. We need to get a backend person involved, and we will need to publish a new package each time the boundaries change, e.g., la-shape-files-2021, la-shape-files-2022.
Next steps are talking to the 311 team, the TDM team, Food Oasis, and Lucky Parking.
Feedback from Mike Morgan on 12/9: Since the shape files for the various districts are small enough (less than 50MB, see here), they can be stored in a repository. We should also consider making these available as an npm package and as GeoJSON.
Notes from 3/11 meeting with Abe, Bonnie, John (Food Oasis) and Mike:
Food Oasis uses Postgres's own geometry data type to run scripts, then converts to GeoJSON to send to the client.
- Can take a lat/lon and return the corresponding Neighborhood Council (NC)
- Can render a neighborhood on a map
Postgres can also consume GeoJSON and convert it to its native geometry data type.
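For illustration, that round trip might look like the sketch below, assuming psycopg2 as the client and a hypothetical neighborhood_councils table; the connection string and schema are placeholders, not Food Oasis's actual setup.

```python
# Sketch of the PostGIS geometry <-> GeoJSON round trip; the DSN, table,
# and column names are hypothetical placeholders.
import json
import psycopg2

conn = psycopg2.connect("dbname=fooddb user=postgres")  # placeholder DSN
cur = conn.cursor()

# Geometry -> GeoJSON: find the Neighborhood Council containing a point.
cur.execute(
    """
    SELECT name, ST_AsGeoJSON(geom)
    FROM neighborhood_councils
    WHERE ST_Contains(geom, ST_SetSRID(ST_MakePoint(%s, %s), 4326))
    """,
    (-118.2437, 34.0522),  # lon, lat for downtown LA
)
name, boundary_geojson = cur.fetchone()

# GeoJSON -> geometry: insert a boundary received as GeoJSON.
triangle = {"type": "Polygon", "coordinates": [[[0, 0], [0, 1], [1, 1], [0, 0]]]}
cur.execute(
    "INSERT INTO neighborhood_councils (name, geom) "
    "VALUES (%s, ST_SetSRID(ST_GeomFromGeoJSON(%s), 4326))",
    ("Example NC", json.dumps(triangle)),
)
conn.commit()
```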
This issue will have to be rewritten to cover checking whether the shape files are out of date. But the programming that uses the shape files should be built first, since up-to-date shape files with no programming are useless.
Next steps: Create a script that can be run to automate downloading the shape files for the various district types listed above. We will want to note the date each file was last updated and the date it was downloaded.
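A minimal sketch of what such a script could look like; the endpoint URLs are placeholders for the real district sources, and requests is assumed as the HTTP client.

```python
# Sketch of the download script: fetch each source and record when it was
# downloaded and (where the server reports it) last modified.
import datetime
import json
import pathlib
import requests

# Placeholder endpoints; the real list would cover each district type above.
SOURCES = {
    "neighborhood_councils": "https://example.com/nc_boundaries.zip",
    "city_council_districts": "https://example.com/ccd_boundaries.zip",
}

def download_all(out_dir: str = "shapefiles") -> None:
    """Download each source and log the fetch date and last-modified date."""
    out = pathlib.Path(out_dir)
    out.mkdir(exist_ok=True)
    log = {}
    for name, url in SOURCES.items():
        resp = requests.get(url, timeout=60)
        resp.raise_for_status()
        (out / f"{name}.zip").write_bytes(resp.content)
        log[name] = {
            "downloaded": datetime.date.today().isoformat(),
            "last_modified": resp.headers.get("Last-Modified", "unknown"),
        }
    (out / "download_log.json").write_text(json.dumps(log, indent=2))

if __name__ == "__main__":
    download_all()
```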
Update on issue #118, district types reusable tool:
- Familiarization: Reviewed each target site to understand its layout, available data, and data-extraction challenges.
- APIs: Looked for available APIs to simplify the extraction process.
- Created a spreadsheet to keep tabs on each site.
- Initiated a Jupyter Notebook to document coding and data collection/automation.
Using the GeoHub L.A. website, I programmatically created shape files:
- Data Acquisition: Using the GeoHub LA website, I identified and accessed the URL endpoints for the API calls corresponding to our project's requirements.
- Data Extraction: Through programmatic queries, I fetched JSON data from the different district API endpoints, capturing geographical information such as boundaries, points of interest, and administrative divisions (a sketch of this query pattern follows the list).
- Shapefile Creation: From the gathered JSON data, I created shapefiles, a geospatial data format compatible with various GIS software and tools.
- Compression Exploration: To optimize storage and handling of the shapefiles, I'm experimenting with compressing the data using TruncatedSVD.
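For illustration, the query-and-convert pattern might look like the following sketch using geopandas; the endpoint URL is a made-up example of the ArcGIS REST query format, not the exact GeoHub layer used here.

```python
# Sketch: fetch GeoJSON from an ArcGIS FeatureServer layer and write a shapefile.
import geopandas as gpd
import requests

# Placeholder URL following the GeoHub/ArcGIS REST query pattern.
url = (
    "https://services.arcgis.com/EXAMPLE/arcgis/rest/services/"
    "Neighborhood_Councils/FeatureServer/0/query"
    "?where=1%3D1&outFields=*&f=geojson"
)

resp = requests.get(url, timeout=60)
resp.raise_for_status()

# Build a GeoDataFrame from the GeoJSON features, then save as a shapefile.
gdf = gpd.GeoDataFrame.from_features(resp.json()["features"], crs="EPSG:4326")
gdf.to_file("neighborhood_councils.shp")
print(f"Wrote {len(gdf)} district boundaries")
```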
Update: The data acquisition, extraction, shapefile creation, and compression exploration work can be accessed in my repo, HERE
This week I will look into how we can run the data collection script on a quarterly basis and have it deposit files in Google Drive and/or GitHub, or whatever is best for the team.
Here's an update on data acquisition and extraction of district shape files:
Update on the Shape File Automation Project
- Implemented Google Drive functions to add files directly to Google Drive (see the upload sketch after this list).
- Updated the main function to create shape files with new functionality.
- Explored automation options using Google Cloud Functions for continuous collection of district shape files.
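A rough sketch of the upload step, assuming the official google-api-python-client and a service account; the file names and folder ID below are placeholders, not the project's actual values.

```python
# Sketch: upload one file into a Drive folder using the Drive v3 API.
import pathlib
from google.oauth2 import service_account
from googleapiclient.discovery import build
from googleapiclient.http import MediaFileUpload

SCOPES = ["https://www.googleapis.com/auth/drive.file"]

def upload_to_drive(local_path: str, folder_id: str, creds_file: str) -> str:
    """Upload a file into the given Drive folder and return its file ID."""
    creds = service_account.Credentials.from_service_account_file(
        creds_file, scopes=SCOPES
    )
    service = build("drive", "v3", credentials=creds)
    metadata = {"name": pathlib.Path(local_path).name, "parents": [folder_id]}
    media = MediaFileUpload(local_path, resumable=True)
    created = (
        service.files()
        .create(body=metadata, media_body=media, fields="id")
        .execute()
    )
    return created["id"]

# Example call (placeholder values):
# upload_to_drive("shapefiles/neighborhood_councils.zip", "FOLDER_ID", "sa.json")
```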
Consideration:
- Google Cloud Functions seems to be a viable solution for automating the data collection process. However, it requires a credit card to set up. I will investigate whether Hack for LA has an account or could provide a credit card for this purpose.
Next Steps:
- Confirm the availability of a credit card or an existing Google Cloud account through Hack for LA.
- If available, proceed with setting up the Google Cloud Function.
- Test the entire automation workflow to ensure everything is functioning as expected.
- Or investigate other automation avenues.
I've also pushed all recent updates to the repository, and you can check the latest commits for detailed changes.
Project Update:
- A GitHub workflow has been successfully integrated to automatically update files in my Google Drive.
- Adjustments were made to the main script to ensure compatibility with the GitHub workflow.
- Secrets have been configured for the Google API JSON file and the Google Drive folder ID.
- I will update the ID to point to our HFLA Google Drive folder.
- Automation is set for every other month, on the first of the month (a workflow sketch follows this list).
- Current updates to the repository
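For reference, a scheduled workflow along these lines might look like the sketch below; the workflow name, script name, and secret names are my assumptions, not necessarily the repo's actual configuration.

```yaml
# Sketch of the bimonthly workflow; names and secrets are placeholders.
name: Update district shape files
on:
  schedule:
    - cron: "0 0 1 */2 *"   # 00:00 UTC on the 1st of every other month
  workflow_dispatch:         # allow manual runs for testing
jobs:
  collect:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - run: pip install -r requirements.txt
      - run: python main.py
        env:
          GOOGLE_API_CREDENTIALS: ${{ secrets.GOOGLE_API_CREDENTIALS }}
          GDRIVE_FOLDER_ID: ${{ secrets.GDRIVE_FOLDER_ID }}
```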
I can adjust the code to update a GitHub folder as well. We can do both Google Drive and GitHub, if need be.
This week I refined the setup of environment variables to improve both local development and CI/CD workflows in GitHub Actions. By leveraging os.getenv() to securely access environment variables, I've streamlined the development process significantly. This ensures that the application runs with the necessary configuration without hardcoding sensitive information.
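As a minimal sketch of that pattern (the variable names are assumptions, not the project's actual ones; locally they might come from a .env file, while GitHub Actions injects them from secrets):

```python
# Sketch: read required settings from the environment, failing fast if absent.
import json
import os

def load_config() -> dict:
    """Return the Drive credentials and folder ID from environment variables."""
    creds_json = os.getenv("GOOGLE_API_CREDENTIALS")
    folder_id = os.getenv("GDRIVE_FOLDER_ID")
    if not creds_json or not folder_id:
        raise RuntimeError(
            "Set GOOGLE_API_CREDENTIALS and GDRIVE_FOLDER_ID before running."
        )
    return {"credentials": json.loads(creds_json), "folder_id": folder_id}

if __name__ == "__main__":
    config = load_config()
    print("Loaded config for Drive folder", config["folder_id"])
```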
Additionally, I've discussed with our project manager about updating the top-level Google folder structure. This change aims to improve the automation process for storing shape files.
I gathered all the information for transferring my current repo, which holds the District Shape File pipeline, into a new repo established in the Hack for LA account for housing the shape data. Below are the steps involved. The transfer will be completed within the week. In the meantime, the shape file data is in the Hack for LA Google Drive.
Steps for Repository Transfer
The following steps have been determined for transferring the repository associated with the district data collection:
- Prepare New Repository
  - A new empty repository has been established to house the district data collection.
- ETL Process Completion
  - The ETL process has been completed in my current repository.
- Code Transfer Process (see the git sketch after this list)
  - Clone the new repository locally.
  - Add the new repository as a remote to the existing project.
  - Pull the latest code from the current (old) repository.
  - Push the code to the new repository.
- Transfer Automation Components
  - Transfer the GitHub Actions and secrets necessary for pipeline automation.
- Update Documentation
  - The README file will be updated to reflect the changes and provide guidance for the new repository setup.
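As a rough illustration of the Code Transfer Process above (the directory name and repository URL are placeholders, not the actual repo names):

```sh
# Sketch of the transfer; paths and URLs below are placeholders.
cd district-shape-files                  # existing local clone of the old repo
git pull origin main                     # make sure the local copy is current
git remote add hackforla https://github.com/hackforla/NEW-REPO.git
git push hackforla main                  # push the code to the new repository
```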
@parcheesime Is there still work to be done on this issue or is it complete?
@akhaleghi I've successfully tested adding the Los Angeles district shape data in my own repository, complete with a README and automated scripts running on schedule. How can we integrate this into the Hack for L.A. repository? Should we create a dedicated directory like LA_District_ShapeFiles for the data?
Follow-up: @akhaleghi I have the data updating on my personal repository. I will need assistance adding my project to our data science repo. @salice may have made one, but that was a while ago, before the repository updates.