project-ideas

Bot to notify us when there's a new dataset on Socrata

Status: Open · luqmaan opened this issue 8 years ago · 15 comments

What problem are we trying to solve?

There is a lot going on in the City of Austin's data portal (data.austintexas.gov). But to know what's going on, you have to:

  • remember to check the data portal
  • see what's changed since the last time you visited

What if instead:

  • A bot posts to Slack anytime a new dataset is created
  • A bot posts to Twitter anytime a new dataset is created (like https://twitter.com/OpenDataChicago)
  • Send a weekly/monthly email with links and descriptions of new datasets
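For the Slack option, one low-effort route is a Slack incoming webhook, which accepts an HTTP POST whose JSON body has a `text` field. The sketch below uses only the Python standard library. The webhook URL is a placeholder, and the helper names and message format are hypothetical choices, not something settled in this issue.

```python
import json
import urllib.request

# Placeholder: a real incoming-webhook URL is generated in Slack's app settings.
SLACK_WEBHOOK_URL = "https://hooks.slack.com/services/T000/B000/XXXX"


def format_dataset_message(title: str, url: str, description: str = "") -> str:
    """Build the text of one 'new dataset' announcement (hypothetical format)."""
    message = f"New dataset: {title}\n{url}"
    if description:
        message += f"\n{description}"
    return message


def post_to_slack(webhook_url: str, text: str) -> int:
    """POST {"text": ...} as JSON to a Slack incoming webhook; return the HTTP status."""
    body = json.dumps({"text": text}).encode("utf-8")
    request = urllib.request.Request(
        webhook_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status


if __name__ == "__main__":
    text = format_dataset_message("Example dataset", "https://example.com/dataset")
    print(text)
    # post_to_slack(SLACK_WEBHOOK_URL, text)  # uncomment once a real webhook URL is set
```

The same `format_dataset_message` output could feed the Twitter and email channels, so only the delivery function changes per channel.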

This bot should also be created in a way that it can work for any Socrata portal (e.g. data.texas.gov) and be easy to fork. Other Code for America brigades may find this useful.

Who will benefit (directly and indirectly) from this project?

People interested in finding cool datasets to use in their next project.

Links to any research/data available/ articles

Some initial googling didn't turn up anything like this that already exists.

What are the next steps (validation, research, coding, design)?

Research.

What help is needed at this time?

  • Find out if there is something like this that exists already
  • Figure out what endpoints to use in the Socrata API (https://dev.socrata.com/)

luqmaan, Mar 18 '16

Socrata has an email notification feature: https://support.socrata.com/hc/en-us/articles/202949758-Subscribe-to-notifications-when-dataset-is-made-public

Found out from @technickle on cfa.slack.com. [Screenshot: 2016-03-18, 11:07 AM]

mateoclarke, Mar 18 '16

What about a Twitter bot in addition to the Slack bot and email?

mateoclarke, Mar 18 '16

:+1:

This would be great for all cities (and the federal government)!

Chicago created a Twitter bot with Yahoo Pipes back in the day (RIP Yahoo Pipes): https://twitter.com/OpenDataChicago cc @derekeder

It'd also be great to get keyword-based email alerts, like Scout offers: https://scout.sunlightfoundation.com/ cc @konklone
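Keyword alerts of this kind reduce to matching each new dataset's text against each subscriber's keywords. A minimal sketch, assuming datasets arrive as dicts with `title` and `description` keys and that subscriptions are a simple email-to-keywords mapping; both shapes and all function names are hypothetical.

```python
from typing import Iterable


def matches_keywords(title: str, description: str, keywords: Iterable[str]) -> list[str]:
    """Return the keywords that appear (case-insensitively) in the title or description."""
    haystack = f"{title} {description}".lower()
    return [kw for kw in keywords if kw.lower() in haystack]


def route_alerts(datasets: list[dict], subscriptions: dict[str, list[str]]) -> dict[str, list[dict]]:
    """Map each subscriber email to the new datasets matching any of their keywords.

    datasets: dicts with "title" and optional "description" (assumed shape).
    subscriptions: email -> keywords (hypothetical subscription store).
    Subscribers with no matches are left out of the result.
    """
    alerts = {}
    for email, keywords in subscriptions.items():
        hits = [
            d for d in datasets
            if matches_keywords(d["title"], d.get("description", ""), keywords)
        ]
        if hits:
            alerts[email] = hits
    return alerts
```

Plain substring matching is deliberately crude ("bike" also matches "bikeshare"); a real service might want word-boundary or stemmed matching.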

rebeccawilliams, Mar 18 '16

The @openaddresses project has a bot that alerts on any new or updated, possibly relevant spatial datasets from the ESRI ArcGIS Open Data Aggregator.

riordan, Mar 18 '16

@rebeccawilliams https://twitter.com/OpenDataChicago :+1: Looks great; I added Twitter to the issue.

@riordan Do you have a link to the @openaddresses bot, or an example of its output?

luqmaan, Mar 18 '16

I'd suggest something simple that polls the API endpoint for "list all datasets." If the result differs from the most recent poll (cached somewhere), it triggers an alert based on the diff and caches the new version.

Might even be feasible to do this with Zapier in some form.
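A minimal sketch of the poll-and-diff idea, assuming the catalog has already been fetched into a `{dataset_id: title}` dict (which endpoint to use is left open) and that a local JSON file is an acceptable cache. `alert` is any callable, such as a function that posts to Slack; all names here are hypothetical.

```python
import json
from pathlib import Path


def load_cached_ids(cache_path: Path) -> set[str]:
    """Return the dataset IDs seen on the previous poll (empty if there is no cache yet)."""
    if not cache_path.exists():
        return set()
    return set(json.loads(cache_path.read_text()))


def save_ids(cache_path: Path, ids: set[str]) -> None:
    """Overwrite the cache with the IDs from the current poll."""
    cache_path.write_text(json.dumps(sorted(ids)))


def diff_catalog(previous: set[str], current: dict[str, str]) -> tuple[list[str], list[str]]:
    """Compare the previous ID set with the current {id: title} catalog.

    Returns (added_ids, removed_ids), each sorted for stable output.
    """
    current_ids = set(current)
    return sorted(current_ids - previous), sorted(previous - current_ids)


def poll_once(cache_path: Path, current: dict[str, str], alert) -> None:
    """One polling cycle: diff against the cache, alert on additions, then update the cache."""
    first_run = not cache_path.exists()
    previous = load_cached_ids(cache_path)
    added, _removed = diff_catalog(previous, current)
    # On the very first poll every dataset looks "new", so just seed the cache.
    if not first_run:
        for dataset_id in added:
            alert(f"New dataset: {current[dataset_id]} ({dataset_id})")
    save_ids(cache_path, set(current))
```

Run `poll_once` from cron or a scheduler such as Zapier's. Keeping only IDs in the cache keeps the diff cheap; removed IDs are computed but ignored here.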

daguar, Mar 18 '16

Looks like there's an RSS feed with a title of "Newly created and updated datasets for data.austintexas.gov": https://data.austintexas.gov/catalog.rss

An RSS reader bot should be able to handle that!
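A minimal sketch of reading `catalog.rss`, assuming it is ordinary RSS 2.0 whose `<item>` elements carry `<title>`, `<link>` and, optionally, `<guid>`. A feed library such as feedparser would also work; this version sticks to the standard library's ElementTree so it has no dependencies. The function names are hypothetical.

```python
import urllib.request
import xml.etree.ElementTree as ET

CATALOG_RSS = "https://data.austintexas.gov/catalog.rss"


def parse_rss_items(xml_data: bytes) -> list[dict[str, str]]:
    """Extract title, link, and guid from each <item> of an RSS 2.0 document."""
    root = ET.fromstring(xml_data)
    items = []
    for item in root.iter("item"):
        link = item.findtext("link", default="")
        items.append({
            "title": item.findtext("title", default=""),
            "link": link,
            # Fall back to the link when an item has no <guid> element.
            "guid": item.findtext("guid", default=link),
        })
    return items


def fetch_feed(url: str = CATALOG_RSS) -> list[dict[str, str]]:
    """Download the feed and parse its items."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return parse_rss_items(response.read())
```

The `guid` values give the bot a stable key for remembering which items it has already announced.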

hampelm, Mar 18 '16

I'm interested in hacking on this tomorrow at the OpenHack. My initial thought is to use feedparser to dump the catalog feed into a Postgres database for the input side. Output plugins (Slack, Twitter, etc.) could then use LISTEN to get notifications of new items and act on them as appropriate.
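A rough sketch of the Postgres side of this design, assuming psycopg2 as the driver and PostgreSQL 9.5+ (for `ON CONFLICT`). The table, function, trigger, and channel names (`catalog_items`, `notify_new_item`, `new_dataset`) are hypothetical. An `AFTER INSERT` trigger calls `pg_notify`, so any process that has run `LISTEN new_dataset` receives each newly stored item. The feedparser input side would run `INSERT_SQL` once per feed entry.

```python
import json
import select

# Hypothetical schema: feed items keyed by guid, plus a trigger that NOTIFYs on insert.
SCHEMA_SQL = """
CREATE TABLE IF NOT EXISTS catalog_items (
    guid        text PRIMARY KEY,
    title       text NOT NULL,
    link        text NOT NULL,
    first_seen  timestamptz NOT NULL DEFAULT now()
);

CREATE OR REPLACE FUNCTION notify_new_item() RETURNS trigger AS $$
BEGIN
    PERFORM pg_notify('new_dataset', json_build_object(
        'guid', NEW.guid, 'title', NEW.title, 'link', NEW.link)::text);
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

DROP TRIGGER IF EXISTS catalog_items_notify ON catalog_items;
CREATE TRIGGER catalog_items_notify AFTER INSERT ON catalog_items
    FOR EACH ROW EXECUTE PROCEDURE notify_new_item();
"""

# Items already stored hit the conflict clause, insert nothing, and so fire no NOTIFY.
INSERT_SQL = """
INSERT INTO catalog_items (guid, title, link) VALUES (%s, %s, %s)
ON CONFLICT (guid) DO NOTHING;
"""


def dispatch(payload: str, plugins) -> None:
    """Decode one NOTIFY payload and hand it to every output plugin."""
    item = json.loads(payload)
    for plugin in plugins:
        plugin(item)


def listen_forever(dsn: str, plugins) -> None:
    """Block on LISTEN new_dataset and dispatch each notification (requires psycopg2)."""
    import psycopg2  # imported lazily so the helpers above work without it
    import psycopg2.extensions

    conn = psycopg2.connect(dsn)
    conn.set_isolation_level(psycopg2.extensions.ISOLATION_LEVEL_AUTOCOMMIT)
    with conn.cursor() as cur:
        cur.execute("LISTEN new_dataset;")
    while True:
        # Wait up to 60 s for the connection's socket to become readable.
        if select.select([conn], [], [], 60) == ([], [], []):
            continue
        conn.poll()
        while conn.notifies:
            notification = conn.notifies.pop(0)
            dispatch(notification.payload, plugins)
```

Each output plugin is just a callable that takes the decoded item dict. Adding Slack or Twitter then means registering another function, with no change to the input side.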

decibel, Mar 18 '16

Howdy! I run the @OpenDataChicago Twitter account. It was powered by the Socrata RSS feed + Yahoo Pipes (RIP indeed, @rebeccawilliams).

A major challenge we encountered was how Socrata publishes to its RSS feed. Chicago folks create a dataset first, then vet it, then publish it. The RSS feed never picked up datasets published this way, and even after a long series of customer service tickets the problem was never resolved (@tomschenkjr probably has more details).

I stubbed out some ideas for a new version based on the Socrata API, but never got to implementing anything.

derekeder, Mar 18 '16

@daguar This would be feasible using the data.json available on portals, e.g., data.cityofchicago.org/data.json (this powers the ls.socrata() function in the RSocrata R package).

@derekeder Yes, from what I understand, the publishing workflow that turns a dataset from private to public skips the "publish to RSS feed" step. This affects the catalog.rss feed that @hampelm mentioned.
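A minimal sketch of building a `{identifier: title}` catalog from a portal's data.json, assuming it follows the Project Open Data (DCAT-US) layout, in which each dataset entry has an `identifier` and a `title`. Older files are a bare list and newer ones wrap the list in a `dataset` key, so both shapes are handled. Which shape a given Socrata portal returns is an assumption to check, and the function names are hypothetical.

```python
import json
import urllib.request


def fetch_data_json(domain: str):
    """Download https://<domain>/data.json, e.g. domain="data.cityofchicago.org"."""
    with urllib.request.urlopen(f"https://{domain}/data.json", timeout=60) as response:
        return json.load(response)


def catalog_from_data_json(payload) -> dict[str, str]:
    """Reduce a DCAT-style data.json document to {identifier: title}.

    Accepts either a bare list of datasets or an object with a "dataset" list;
    entries without an "identifier" are skipped.
    """
    datasets = payload if isinstance(payload, list) else payload.get("dataset", [])
    return {d["identifier"]: d.get("title", "") for d in datasets if "identifier" in d}
```

If data.json lists only public datasets (worth verifying per portal), diffing successive snapshots would catch the private-to-public transitions that catalog.rss misses, whenever they happen. Passing the domain as a parameter also keeps the approach portable to other Socrata portals.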

tomschenkjr, Mar 18 '16

The data portal analysis project could easily be extended to do this. It uses an internal database to keep track of every available data resource, so detecting newly published datasets maps onto logic that already exists: if a new record is stored in the database, a new dataset has been published.

The portal analyzer should work with any Socrata API endpoint, and because it queries SODA directly, it should bypass any problems with out-of-date RSS feeds.
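The portal analysis project's actual schema isn't shown in this thread, so here is a hypothetical stand-in using SQLite. Each resource is keyed by its Socrata ID, and a resource counts as newly published exactly when a new row gets stored. The table and function names are assumptions.

```python
import sqlite3


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the resource table; ":memory:" is handy for testing."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS resources (id TEXT PRIMARY KEY, name TEXT NOT NULL)"
    )
    return conn


def record_resources(conn: sqlite3.Connection, resources: list[tuple[str, str]]) -> list[str]:
    """Store (id, name) pairs and return the ids that were not already in the table."""
    new_ids = []
    with conn:  # commit the whole batch at once
        for resource_id, name in resources:
            seen = conn.execute(
                "SELECT 1 FROM resources WHERE id = ?", (resource_id,)
            ).fetchone()
            if seen is None:
                # A new record means a newly published dataset: store it and report it.
                conn.execute(
                    "INSERT INTO resources (id, name) VALUES (?, ?)", (resource_id, name)
                )
                new_ids.append(resource_id)
    return new_ids
```

The list returned by `record_resources` is exactly what an alerting step would announce.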

@luqmaan What do you think about moving the portal analysis project in this direction?

mtb33, Mar 18 '16

"What do you think about moving the portal analysis project in this direction?"

@mtb33 :+1: for moving the portal analysis project in that direction. I think the most important thing to keep in mind is making sure it's easy to set up for other cities.

luqmaan, Mar 19 '16

@mtb33 Hi! I'm trying to get a status update on this project and update the tags. It looks like you've done a ton of work on the data portal analysis. Do you know if anyone has started building a bot based on this work?

amaliebarras, Mar 03 '17

I know this is an old issue, but now that Slack has a native RSS app you can connect, I thought I'd mention it. We could start by having it fetch https://data.austintexas.gov/catalog.rss; I'd imagine there's something similar for Socrata.

https://get.slack.help/hc/en-us/articles/218688467-Add-RSS-feeds-to-Slack

nickhammond, Jan 18 '18

I just created a Slack channel called #austin-data-portal with this RSS integration. If it works I'll close this issue.

mscarey, Apr 07 '19