project-ideas
Bot to notify us when there's a new dataset on Socrata
What problem are we trying to solve?
There is a lot going on in the City of Austin's data portal (data.austintexas.gov). But to know what's going on, you have to:
- remember to check the data portal
- see what's changed since the last time you visited
What if instead:
- A bot posts to Slack anytime a new dataset is created
- A bot posts to Twitter anytime a new dataset is created (like https://twitter.com/OpenDataChicago)
- Send a weekly/monthly email with links and descriptions of new datasets
This bot should also be built so that it works for any Socrata portal (e.g. data.texas.gov) and is easy to fork. Other Code for America brigades may find this useful.
Who will benefit (directly and indirectly) from this project?
People interested in finding cool datasets to use in their next project.
Links to any research/data available/ articles
Some initial googling doesn't turn up anything like this that already exists.
What are the next steps (validation, research, coding, design)?
Research.
What help is needed at this time?
- Find out if there is something like this that exists already
- Figure out what endpoints to use in the Socrata API (https://dev.socrata.com/)
Socrata has an email notification feature: https://support.socrata.com/hc/en-us/articles/202949758-Subscribe-to-notifications-when-dataset-is-made-public
Found out from @technickle on cfa.slack.com
What about a twitter bot in addition to Slack bot and email?
:+1:
This would be great for all cities (and the federal government)!
Chicago created a Twitterbot w/ yahoo pipes back in the day (RIP yahoo pipes): https://twitter.com/OpenDataChicago cc @derekeder
It'd be great to get email alerts based on keywords like Scout too: https://scout.sunlightfoundation.com/ cc @konklone
The @openaddresses project has a bot that alerts of any new possibly relevant/updated spatial datasets from the ESRI ArcGIS Open Data Aggregator
@rebeccawilliams https://twitter.com/OpenDataChicago :+1: Looks great, added twitter to the issue.
@riordan Have a link to the @openaddresses bot or what its output looks like?
I'd suggest a simple thing that polls the API endpoint for "list all datasets." If the result is different from the most recent poll (cached somewhere), it triggers an alert based on the diff and caches the new version.
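A minimal sketch of that poll-and-diff idea, assuming each poll has already been reduced to a dict mapping dataset ID to dataset name. The fetch step is left out on purpose, and the function names and the JSON cache file are hypothetical choices, not anything from an existing project.

```python
import json
from pathlib import Path


def diff_catalog(previous, current):
    """Return (added, removed) dataset IDs between two polls.

    Both arguments map dataset ID -> dataset name.
    """
    added = sorted(set(current) - set(previous))
    removed = sorted(set(previous) - set(current))
    return added, removed


def check_for_changes(current, cache_path):
    """Diff `current` against the cached poll, then overwrite the cache."""
    cache = Path(cache_path)
    # An absent cache file means this is the first poll.
    previous = json.loads(cache.read_text()) if cache.exists() else {}
    added, removed = diff_catalog(previous, current)
    cache.write_text(json.dumps(current))
    return added, removed
```

On the very first run the cache is empty, so every dataset shows up as added; a real bot would probably want to skip alerting on that run.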
Might even be feasible to do this with Zapier in some form.
Looks like there's an RSS feed with a title of "Newly created and updated datasets for data.austintexas.gov": https://data.austintexas.gov/catalog.rss
An RSS reader bot should be able to handle that!
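A rough sketch of what such an RSS reader could do, using only the Python standard library. It assumes the feed is plain RSS 2.0 with `<item>` elements carrying `<title>`, `<link>` and optionally `<guid>`; I haven't checked the exact shape of catalog.rss, and the function names are made up for illustration.

```python
import xml.etree.ElementTree as ET


def parse_feed_items(rss_xml):
    """Extract title, link and guid from each <item> of an RSS 2.0 document."""
    root = ET.fromstring(rss_xml)
    items = []
    for item in root.iter("item"):
        items.append({
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            "guid": item.findtext("guid", default=""),
        })
    return items


def new_items(items, seen_keys):
    """Return items whose guid (or link, when guid is empty) is not in seen_keys."""
    return [i for i in items if (i["guid"] or i["link"]) not in seen_keys]
```

The bot would keep `seen_keys` somewhere persistent between runs and post only what `new_items` returns.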
I'm interested in hacking on this tomorrow at the OpenHack. My initial thought is to use feedparser to dump the catalog feed into a Postgres database for the input side. Output plugins (slack, twitter, etc) could then use LISTEN to get notifications of new items and do whatever is appropriate.
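Here is one way the Postgres side of that could look. This is only a sketch: the table, trigger and channel names are hypothetical, psycopg2 is my assumption for the driver, and `ON CONFLICT` needs PostgreSQL 9.5 or later. The `(link, title)` pairs would come from the feedparser entries; that part is not shown.

```python
import select

# Run once against the database. The trigger fires only for rows that are
# actually inserted, so re-inserting a known link sends no notification.
SCHEMA_SQL = """
CREATE TABLE IF NOT EXISTS catalog_items (
    link  TEXT PRIMARY KEY,
    title TEXT NOT NULL
);
CREATE OR REPLACE FUNCTION notify_new_dataset() RETURNS trigger AS $$
BEGIN
    PERFORM pg_notify('new_dataset', NEW.link);
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;
DROP TRIGGER IF EXISTS catalog_items_notify ON catalog_items;
CREATE TRIGGER catalog_items_notify
    AFTER INSERT ON catalog_items
    FOR EACH ROW EXECUTE PROCEDURE notify_new_dataset();
"""

INSERT_SQL = """
INSERT INTO catalog_items (link, title) VALUES (%s, %s)
ON CONFLICT (link) DO NOTHING;
"""


def store_entries(conn, entries):
    """Input side: insert (link, title) pairs; known links are skipped."""
    with conn.cursor() as cur:
        for link, title in entries:
            cur.execute(INSERT_SQL, (link, title))
    conn.commit()


def listen_for_new_datasets(dsn, handle, timeout=5.0):
    """Output side: call handle(link) per notification. Runs until interrupted."""
    import psycopg2
    import psycopg2.extensions

    conn = psycopg2.connect(dsn)
    conn.set_isolation_level(psycopg2.extensions.ISOLATION_LEVEL_AUTOCOMMIT)
    with conn.cursor() as cur:
        cur.execute("LISTEN new_dataset;")
    while True:
        # Wait until the connection's socket is readable or the timeout passes.
        if select.select([conn], [], [], timeout) == ([], [], []):
            continue
        conn.poll()
        while conn.notifies:
            handle(conn.notifies.pop(0).payload)
```

Each output plugin (Slack, Twitter, etc.) would run its own `listen_for_new_datasets` loop with a different `handle` callback.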
Howdy! I run the @OpenDataChicago twitter. It was powered by the Socrata RSS feed + Yahoo Pipes (RIP indeed @rebeccawilliams)
A major challenge we encountered was with how Socrata publishes to their RSS feed. Chicago folks create the dataset first, then vet it, then publish it. The RSS feed never picked up on this, and after a long series of customer service tickets it was still never resolved (@tomschenkjr probably has more details).
I stubbed out some ideas for a new version based on the Socrata API, but never got to implementing anything.
@daguar this would be feasible through the data.json available for portals, e.g., data.cityofchicago.org/data.json (this powers ls.socrata(), a function in the RSocrata R package).
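As a sketch of how a data.json file could be used for this: the snippet below assumes the file follows the Project Open Data v1.1 layout, i.e. a top-level `dataset` array whose entries have `identifier`, `title` and `issued` fields. Whether a given portal fills in `issued` reliably is something to verify, and the function name is made up.

```python
import json


def datasets_issued_since(data_json_text, since):
    """List (identifier, title) for datasets issued on or after `since`.

    `since` is a YYYY-MM-DD string.
    """
    catalog = json.loads(data_json_text)
    found = []
    for entry in catalog.get("dataset", []):
        issued = entry.get("issued", "")
        # ISO 8601 dates sort correctly as strings once cut to YYYY-MM-DD.
        # Entries without an `issued` value are skipped.
        if issued[:10] >= since:
            found.append((entry.get("identifier", ""), entry.get("title", "")))
    return found
```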
@derekeder - yes, from what I understand, the publishing workflow when turning a data set from private to public misses the "publish to RSS feed" step. So this impacts the catalog.rss feed that @hampelm mentioned.
The data portal analysis project could easily be extended to do this; it uses an internal database to keep track of every data resource available, so this issue (publishing of new datasets) corresponds to program logic that already exists: a new dataset was published if a new record is stored in the database.
The portal analyzer should work with any Socrata API endpoint, and it gets the info by querying SODA directly so that should bypass any problems with RSS feeds being out of date etc.
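To illustrate the "new record means new dataset" logic, here is a small stand-alone sketch. It uses SQLite purely as a stand-in; the table layout and function names are hypothetical and are not taken from the portal analysis project's actual code.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) the tracking database."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS resources ("
        "portal TEXT NOT NULL, dataset_id TEXT NOT NULL, name TEXT, "
        "PRIMARY KEY (portal, dataset_id))"
    )
    return conn


def record_resource(conn, portal, dataset_id, name):
    """Store a resource; return True only if it was not already recorded."""
    cur = conn.execute(
        "INSERT OR IGNORE INTO resources (portal, dataset_id, name) "
        "VALUES (?, ?, ?)",
        (portal, dataset_id, name),
    )
    conn.commit()
    # rowcount is 1 when the row was inserted and 0 when it was ignored.
    return cur.rowcount == 1
```

Keying on `(portal, dataset_id)` lets one database track several portals, so a `True` return is the point where a bot would send its alert.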
@luqmaan What do you think about moving the portal analysis project in this direction?
@mtb33 :+1: for moving the portal analysis project in that direction. I think the most important thing to keep in mind is making sure it's easy to set up for other cities.
@mtb33 hi! trying to get a status update on this project and update the tags. looks like you've done a ton of work on the data portal analysis. do you know if anyone's started to create a bot based on this work?
I know this is an old issue, but now that Slack has a native RSS app you can connect, I thought I'd mention it. We could just have it fetch from https://data.austintexas.gov/catalog.rss to start out with; I'd imagine there's something similar for Socrata.
https://get.slack.help/hc/en-us/articles/218688467-Add-RSS-feeds-to-Slack
I just created a Slack channel called #austin-data-portal with this RSS integration. If it works I'll close this issue.