interpolation icon indicating copy to clipboard operation
interpolation copied to clipboard

add support for libpostal over http service, adapter pattern et al.

Open missinglink opened this issue 4 years ago • 6 comments

this code is branched off https://github.com/pelias/interpolation/pull/244, please merge that PR first! ▶️ see only the diff ◀️

this draft PR is all the best bits of https://github.com/pelias/interpolation/pull/146 & https://github.com/pelias/interpolation/pull/240 rebased on top of https://github.com/pelias/interpolation/pull/244

See https://github.com/pelias/interpolation/pull/146 for more info on the functionality and how to configure the service.

closes https://github.com/pelias/interpolation/pull/146 closes https://github.com/pelias/interpolation/pull/240

missinglink avatar Mar 17 '20 16:03 missinglink

Okay, I noticed that the demo didn't work with this PR, so I tried to figure out why. It was pretty difficult and actually exposed an issue in the error handling for the server.

I was using Portland Metro data, and was using the demo to explore, which generated the following query:

http://localhost:4300/extract/geojson?lat=45.537858113762816&lon=-122.62214183807374&names=ne%20tillamook%20saint

This URL returns a 400 error, but also an empty JSON response, as shown below:

julian@rockaway ~/repos/pelias/interpolation $ curl -I "http://localhost:4300/extract/geojson?lat=45.537858113762816&lon=-122.62214183807374&names=ne%20tillamook%20saint"
HTTP/1.1 400 Bad Request
X-Powered-By: Express
Content-Type: application/json; charset=utf-8
Content-Length: 2
ETag: W/"2-vyGp6PvFo4RvsFtPoIWeCReyIC8"
Date: Thu, 19 Mar 2020 19:49:27 GMT
Connection: keep-alive

julian@rockaway ~/repos/pelias/interpolation $ curl "http://localhost:4300/extract/geojson?lat=45.537858113762816&lon=-122.62214183807374&names=ne%20tillamook%20saint"
{}

Diving deper, I took a look at the relevant express code: https://github.com/pelias/interpolation/blob/3fa614e873b291fa19d637a36c9db128daf05336/cmd/server.js#L99-L104

This code doesn't properly print the error to the response body, as JSON.stringify(err) ends up with an empty object.

Printing the error to stdout reveals the following:

interpolation_1  | SqliteError: near "/": syntax error
interpolation_1  |     at DynamicQueryCache.getStatement (/code/pelias/interpolation/query/DynamicQueryCache.js:70:28)
interpolation_1  |     at Object.module.exports [as extract] (/code/pelias/interpolation/query/extract.js:36:22)            
interpolation_1  |     at Object.q [as query] (/code/pelias/interpolation/api/extract.js:35:26)
interpolation_1  |     at processTicksAndRejections (internal/process/task_queues.js:94:5)

So there are two issues here:

  1. Higher priority: fix the error messaging for all endpoints.
  2. Improve this PR so it doesn't cause errors.

orangejulius avatar Mar 19 '20 19:03 orangejulius

Demo issues fixed in https://github.com/pelias/interpolation/pull/244/commits/6a5e0840a00d9d129720286a7f83cb1ca6b48120 and rebased to all 3 branches.

missinglink avatar Mar 20 '20 15:03 missinglink

Screenshot 2020-03-20 at 15 53 55

missinglink avatar Mar 20 '20 15:03 missinglink

I just did a quick rebase of this now that #244 is merged. We can continue to test this out and hopefully merge it soon!

orangejulius avatar Mar 30 '20 15:03 orangejulius

Well its been a while for this one but after some of the cleanup and maintenance work we've been doing in the interpolation repository, I took another look at this.

With all the better-sqlite3 stuff long since merged, this PR is a lot simpler now. I've rebased it on top of the latest changes from the master branch and it was all straightforward.

The development workflow for updating the mock libpostal responses went smoothly (set up libpostal and configure the service entry in pelias.json, run SEED_MOCK_LIBPOSTAL=true npm run ci, tests shoudl pass after that), and everything else seems to still work.

Maybe we take a look at getting this merged, since it's been quite helpful for avoiding the extra memory usage of having libpostal data loaded twice.

My current questions are:

  • Will an interpolation build in the docker projects still work? Will it run a lot slower using a remote libpostal service?
  • What should we do around messaging that a new configuration section is now recommended? Maybe add some deprecation notices to pelias/config? We'd want the Pelias API to start preferring the new services section too.
  • Should we add a pelias libpostal wait command or something similar to the docker CLI, and update the documentation on when to use it? Without something like that, running pelias compose up libpostal; pelias prepare interpolation might fail as the libpostal service takes a few seconds to start (though nothing like elasticsearch)

orangejulius avatar Nov 05 '21 18:11 orangejulius

I agree it would be great to merge this functionality, since it provides a lot of flexibility going-forward.

Regarding which adapter is appropriate for default => (HTTP | NPM) is yet to be decided.

One of the complexities here is that this PR couples the adapter pattern code with a change to the default adapter to change it from NPM Module => HTTP.

So.. while I'm not totally against that, the HTTP and NPM Module adapters both have pros and cons, neither is a clearly superior method to the other, but I just wanted to note that its two changes in one, which is making this harder to merge.

Coupled with the unknown topic of performance (and consequently) build times it might be better to merge the code changes without changing the default adapter and then measuring the performance of different methods?

Alternatively we can use the existing code to measure the performance of a full planet build using HTTP and compare it to NPM Module which should give us some more confidence that changing the default adapter is the right choice, despite the additional complexity of network service discovery and I/O.

It might be worth running that build twice, since changes to how the NPM method is used in this PR vs. prior may also affect the performance of build times using the NPM method.

missinglink avatar Nov 08 '21 11:11 missinglink