311-data
311-data copied to clipboard
Make 311 Data's request data easily accessible for data scientists
Overview
Currently, we have no way of providing our clean request data to our data scientists. @priyakalyan is working on PR #1257, but this script takes a long time to execute to get all the data.
I see two solutions to this:
- Make our SQL DB accessible to data scientists + create Python methods that read from the DB and produce dataframes.
- During our nightly Prefect runs, add a step that also writes the request data out as a csv file to S3.
I'm leaning towards (1) right now, since it will use less disk space and has fewer moving parts. (1) will take longer to implement, however, since we currently don't have access to make infrastructure changes (waiting for SSH keys or Terraform configs from @mattyweb).
Action Items
- [ ] Get SSH key passphrase for DB access
- [ ] Make our SQL DB accessible to data scientists + create Python methods that read from the DB and produce dataframes
@joshuayhwu , who might be interested in working on this.
I was taking a look at the lacity API config and attempting to connect to postgres locally. I was wondering what's the passcode for the database, if any? (the code said it defaults to none but psycopg2 said db refused connection)
Please take a look at this section of the Terraform README. Basically, we cannot directly access our DB right now because it's in a virtual private network (or cloud?). It can only be access through the bastion server, but we don't have the SSH key from Matt yet :(
Interesting. What are your thoughts on directly accessing the Socrata API and building a separate db?
Hmm I think that would help us get a solution faster, but we'd have to maintain another component in an already too-complicated system.
If we think that this task is highest priority (i.e., we want it completed in the coming weeks), I would go with option 2 listed in the first comment.
Sounds good. We can discuss more tonight.
Hey @joshuayhwu Do you have an update for us on this issue?
Please update:
- Progress:
- Blockers:
- Availability:
- ETA:
Thanks!