
Need a scraper for Beacon Schneider

Open iandees opened this issue 8 years ago • 14 comments

There are several sources (in Minnesota, Missouri, and Wisconsin) that use Beacon Schneider as their GIS platform. Beacon Schneider doesn't have a great API like Esri does, but it should be possible to scrape them.

Their web app allows you to select data from layers, and some of the maps have an addresses layer. When you use the select tool, it makes an XHR that looks like this:

curl 'https://beacon.schneidercorp.com/api/beaconCore/GetVectorLayer?QPS=zg0f6QnaoIz-ILjoPSh5H9YcV9JnKIDzNg285gFzVrL28shpIWvHaC2_O7y-NcScTgL7ErXoqqD8xqmgIfsdD9VDBBpSSIgIMAaNlF08ZgthtQ2kCj3No9xlIoUzQXfVPwgbY0ViQudzzopRtOquRz-LqgkdUxoYP7O186lM2Oj5Rm777KDRTm-zlQ5DqhwsA7S1bni1WLpjbqXOGLYT2s5Z--ylbHaL2sRioEUw2RZAzPfqGw4N8aE4cd9K8twiz92WT5l3bR2gWtc2icpRcA2' \
    -H 'Content-Type: application/json' \
    --data-binary '{"layerId":13494,"useSelection":false,"ext":{"minx":1409946.25,"miny":292761.4375,"maxx":1411905.625,"maxy":294047.0625},"wkt":null,"spatialRelation":1,"featureLimit":0}'

This returns JSON-wrapped HTML with details for the records enclosed in the bounding box.
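For reference, the same request can be prepared in Python. This is a minimal sketch using only the standard library. The function names `build_payload` and `build_request` are made up for illustration, the QPS value has to be copied from a live browser session, and nothing here is confirmed to be how Beacon expects to be called beyond what the curl example shows.

```python
import json
import urllib.request
from urllib.parse import urlencode

# Endpoint copied from the curl example above.
API_URL = 'https://beacon.schneidercorp.com/api/beaconCore/GetVectorLayer'

def build_payload(layer_id, minx, miny, maxx, maxy):
    ''' Build the JSON body for a bounding-box query, mirroring the curl example.
    '''
    return {
        'layerId': layer_id,
        'useSelection': False,
        'ext': {'minx': minx, 'miny': miny, 'maxx': maxx, 'maxy': maxy},
        'wkt': None,
        'spatialRelation': 1,
        'featureLimit': 0,
    }

def build_request(qps, payload):
    ''' Prepare (but do not send) a POST request carrying the payload.

        qps is the opaque QPS value observed in the browser (hypothetical
        argument name; its meaning is still an open question).
    '''
    url = API_URL + '?' + urlencode({'QPS': qps})
    body = json.dumps(payload).encode('utf-8')
    return urllib.request.Request(
        url, data=body,
        headers={'Content-Type': 'application/json'},
        method='POST')
```

Passing the result to `urllib.request.urlopen()` would send it. The response would then need `json.loads()` before the embedded HTML can be parsed.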

Unknowns:

  • What is the "QPS" parameter above?
  • What projection is the bounding box in? (presumably the layer's native projection)
  • Is there a length limit on the response? (probably: we'll have to use a tiling mechanism to make sure we get all the data)

iandees avatar Mar 15 '17 15:03 iandees

"bs2geojson"

migurski avatar Mar 15 '17 16:03 migurski

We're starting to see more of these Beacon Schneider systems in the wild.

nvkelso avatar Apr 10 '17 16:04 nvkelso

The sample curl invocation above no longer works. What’s a current county with data published in this format?

migurski avatar Apr 10 '17 18:04 migurski

Based on that, I imagine the QPS arg is a session token or something...

Here's how I got to this URL:

  1. Go to a Beacon client page like https://beacon.schneidercorp.com/Application.aspx?AppID=26&LayerID=155&PageTypeID=2&PageID=277 (from https://github.com/openaddresses/openaddresses/issues/2661)
  2. Click "Map"
  3. Enable the "Addresses" layer with the checkbox, zoom in close enough to see addresses
  4. Open your browser's developer tools and switch to the network tab
  5. Click "Selection tools", switch to "Select click/rectangle" mode, then click the "Addresses" layer name so the "i"/info icon is next to it (this is the layer you're selecting)
  6. Drag a box around some addresses and observe the network request

iandees avatar Apr 10 '17 18:04 iandees

Thanks, makes sense! It’s a bit of a hassle, but it seems possible to do a recursive geographic descent. I fiddled with some of the inputs, and this instance appears to have a 500-item response limit for this posted JSON:

{
   "layerId":13494,
   "useSelection":false,
   "ext": { 
      "minx":0,
      "miny":0,
      "maxx":40000000,
      "maxy":40000000
   },
   "wkt":null,
   "spatialRelation":1,
   "featureLimit":0
}
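A recursive geographic descent along those lines could look roughly like the sketch below. It assumes a hypothetical `fetch(bbox)` callable that posts the JSON above for a given extent and returns a list of record dicts, each with a `Key` field, truncated by the server at the limit. The name `fetch_all`, the `min_size` cutoff, and the quadrant split are illustrative choices, not necessarily what pybeacondump does.

```python
def fetch_all(fetch, bbox, limit=500, min_size=1.0):
    ''' Recursively collect records for bbox = (minx, miny, maxx, maxy).

        fetch(bbox) is a hypothetical callable returning a list of record
        dicts, cut off by the server at `limit` items.
    '''
    records = fetch(bbox)
    minx, miny, maxx, maxy = bbox
    too_small = (maxx - minx) <= min_size or (maxy - miny) <= min_size

    # Fewer records than the limit means nothing was cut off, so stop here.
    # Also stop when the box is too small to split any further.
    if len(records) < limit or too_small:
        return records

    # The response was probably truncated: split into quadrants and recurse.
    midx, midy = (minx + maxx) / 2, (miny + maxy) / 2
    quadrants = [
        (minx, miny, midx, midy), (midx, miny, maxx, midy),
        (minx, midy, midx, maxy), (midx, midy, maxx, maxy),
    ]

    seen = {}
    for quad in quadrants:
        for record in fetch_all(fetch, quad, limit, min_size):
            # Records on a shared edge can come back twice; keep one per Key.
            seen[record['Key']] = record

    return list(seen.values())
```

Starting from the oversized 0 to 40000000 extent works, but starting from the layer's real extent would save a few levels of recursion.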

migurski avatar Apr 10 '17 18:04 migurski

Started a thing in https://github.com/openaddresses/pybeacondump.

migurski avatar Apr 10 '17 20:04 migurski

You guys are geniuses.

justinelliotmeyers avatar Apr 11 '17 00:04 justinelliotmeyers

Okay so the code above works great on Taney, MO: https://github.com/openaddresses/openaddresses/pull/2844

The Beacon API is a bit half-assed, and uses mystery projected coordinates for bounding boxes and so forth. I'm not bothering to try to convert them because Lat and Long columns exist in the data source. For another county, this might have to change. What’s the next place to try this?

migurski avatar Apr 11 '17 01:04 migurski

There's a bunch in https://docs.google.com/spreadsheets/d/1HFm0YbFDC5YKkHKFt89EKdhpyaCCIo_CiDFjBNq4PoI/edit#gid=0 (search for "beacon").

Try Page County, IA? https://beacon.schneidercorp.com/Application.aspx?AppID=220&LayerID=3039&PageTypeID=1&PageID=2857

iandees avatar Apr 11 '17 01:04 iandees

Okay, harder. This one has very little data in the "result HTML" (right pane) though there does seem to be a way to get more by clicking on a point (bottom pane):

[screenshot: screen shot 2017-04-10 at 7 08 21 pm]

There is no "Lat" or "Long" in the properties, which suggests we’ll need to figure out the projection used in order to take advantage of the WKT values.

migurski avatar Apr 11 '17 02:04 migurski

I tried Iowa state plane projections and http://spatialreference.org/ref/epsg/3418/ looks close. Seems necessary to add client-side projection here.

migurski avatar Apr 11 '17 02:04 migurski

Result geometries look good, but the properties are all blank because the HTML here is formatted differently. I have a feeling these are each the result of a bespoke consulting arrangement with a local tax assessor.

[image: iowa]

migurski avatar Apr 11 '17 02:04 migurski

Hey @migurski did you figure this out?

I've been playing with their API as well. The projection they're using is State Plane Coordinates, so it differs depending on which state you're in. If you open the JavaScript console in your dev tools, there's a global called mapConfig that gets passed down in a script tag. It has the SRID in it. You'll find your QPS parameter in there too, which does indeed seem to be a session token of some kind.

Example: [image]

kirkedev avatar Jun 29 '17 21:06 kirkedev

Using https://github.com/larsbutler/geomet to load the WKT, the make_feature function becomes:

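from geomet import wkt

def make_feature(record):
    ''' Get a complete GeoJSON feature object for a record.
    '''
    return dict(
        type='Feature',
        id=record.get('Key'),
        # geomet parses the WKT string into a GeoJSON geometry dict,
        # so points, lines, and polygons are all handled.
        geometry=wkt.loads(record.get('WktGeometry')),
        # extract_properties() is assumed to be the helper already defined in the script.
        properties=extract_properties(record)
        )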

That made it handle the geometries for my counties without issue. Looks like you just had it set to create points prior to that.

psyon avatar Apr 22 '18 14:04 psyon