
Allow spinning up a web server and JSON API?

Open karlicoss opened this issue 4 years ago • 10 comments

This is more of a fun demonstration of what it's capable of, but it could also help with integrating with other programming languages.

Should be fairly easy, with almost no boilerplate, because NamedTuples/dataclasses map nicely onto JSON.

karlicoss, Apr 12 '20

It would also open up building SPAs and mobile apps on top of it.

Joshfindit, May 08 '20

Could possibly be done with something similar to how hpi doctor tries to import the stats function from each module; perhaps that code could even be reused here.

seanbreckenridge, Sep 18 '20

Yep! For example, it could discover data providers by their type annotations (e.g. pandas.DataFrame), or by their being marked with some decorator.
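A minimal sketch of that discovery idea, assuming a hypothetical @provider marker decorator. It treats any function whose return annotation is an iterator, iterable, sequence, list, or a class named DataFrame as a data provider. The example functions are stand-ins, not real HPI modules.

```python
import collections.abc
from typing import Any, Callable, Iterable, Iterator, List, NamedTuple, get_origin, get_type_hints

# Return-type origins treated as "this function provides data" (an assumption).
ITERABLE_ORIGINS = {collections.abc.Iterator, collections.abc.Iterable, collections.abc.Sequence, list}


def provider(func: Callable[..., Any]) -> Callable[..., Any]:
    """Hypothetical marker decorator flagging a function as a data provider."""
    setattr(func, "_hpi_provider", True)
    return func


def is_provider(func: Callable[..., Any]) -> bool:
    if getattr(func, "_hpi_provider", False):
        return True
    ret = get_type_hints(func).get("return")
    # Match pandas.DataFrame by name so pandas need not be installed.
    if getattr(ret, "__name__", None) == "DataFrame":
        return True
    # e.g. Iterator[Visit] -> collections.abc.Iterator, List[Visit] -> list
    return get_origin(ret) in ITERABLE_ORIGINS


def find_providers(funcs: Iterable[Callable[..., Any]]) -> List[str]:
    return [f.__name__ for f in funcs if is_provider(f)]


# Example functions standing in for an HPI module's contents.
class Visit(NamedTuple):
    url: str
    count: int


@provider
def stats():
    # Marked explicitly; there is no return annotation to inspect.
    return {"count": 1}


def visits() -> Iterator[Visit]:
    # Discovered through its Iterator[...] return annotation.
    yield Visit("https://example.com", 1)


def helper(x: int) -> int:
    # Neither marked nor iterable-returning, so it is ignored.
    return x + 1
```

Matching DataFrame by name avoids importing pandas just for the check, at the cost of also matching any unrelated class with that name.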

karlicoss, Sep 18 '20

Been playing around with FastAPI and pydantic a bit; I feel like they might integrate nicely here, especially if the NamedTuples/dataclasses can be converted into pydantic models by inspecting their type hints. That would mean all the routes could be generated dynamically.
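As an illustration of the "inspect the type hints" step, here's a stdlib-only sketch that reads a NamedTuple's annotations with typing.get_type_hints and maps them to JSON type names. The mapping table and the Command type (modeled on the zsh history fields in the curl output later in the thread) are assumptions; with pydantic installed, the same hints could instead be fed to pydantic's create_model to build a model class.

```python
from datetime import datetime
from typing import Any, Dict, NamedTuple, get_type_hints

# Assumed mapping from Python types to JSON-schema type names.
JSON_TYPE_NAMES: Dict[Any, str] = {
    str: "string",
    int: "integer",
    float: "number",
    bool: "boolean",
    datetime: "string",  # serialized as a date string
}


def field_schema(cls: type) -> Dict[str, str]:
    """Map each annotated field of a NamedTuple/dataclass to a JSON type name."""
    return {name: JSON_TYPE_NAMES.get(tp, "object") for name, tp in get_type_hints(cls).items()}


class Command(NamedTuple):
    command: str
    dt: datetime
    duration: int
```

Unknown field types fall back to "object" here; a real generator would need to recurse into nested NamedTuples and handle Optional/List hints.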

pydantic allows NamedTuple-like model definitions to be automatically serialized to and deserialized from JSON for APIs. FastAPI uses pydantic when creating routes, and it also generates an OpenAPI spec/dashboard for all your routes, which would give you easy testing/prototyping for free.

Just a thought, but if this is used, since deserialization is also handled, perhaps it could enable POST requests as well: e.g. if you want to host an HPI server behind authentication on your webserver to accept data from your phone or something, with the data synced back to the local data directory afterwards.
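For the POST idea, the reverse direction could look roughly like this: a sketch that parses a JSON body and rebuilds a NamedTuple from its type hints, parsing ISO 8601 strings for datetime fields and rejecting missing or mistyped fields. Note and from_json are hypothetical names, and only plain (non-generic) field types are handled.

```python
import json
from datetime import datetime
from typing import Any, Dict, NamedTuple, Type, TypeVar, get_type_hints

T = TypeVar("T")


class Note(NamedTuple):
    """Hypothetical record type a phone might POST to the server."""
    text: str
    dt: datetime


def from_json(cls: Type[T], payload: str) -> T:
    """Rebuild a NamedTuple/dataclass from a JSON object using its type hints."""
    data: Dict[str, Any] = json.loads(payload)
    kwargs: Dict[str, Any] = {}
    for name, tp in get_type_hints(cls).items():
        if name not in data:
            raise ValueError(f"missing field: {name}")
        value = data[name]
        if tp is datetime:
            # Assumption: datetimes are sent as ISO 8601 strings.
            value = datetime.fromisoformat(value)
        elif not isinstance(value, tp):
            raise TypeError(f"{name}: expected {tp.__name__}, got {type(value).__name__}")
        kwargs[name] = value
    return cls(**kwargs)  # type: ignore[call-arg]
```

Extra keys in the payload are silently ignored; a stricter server might reject them instead.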

Not high on my priority list, but will probably try and create a prototype soon.

seanbreckenridge, Oct 23 '20

Nice! I've had pydantic on my todo list as well, e.g. I like the idea of nicer exceptions: https://pydantic-docs.helpmanual.io/#example

My only worry from a quick glance was that it enforces inheriting from a BaseModel, and how it would play with mypy (there's something about a plugin?). But if it's easy to swap out BaseModel for NamedTuple (or anything else, as long as it's got the same interface, i.e. is duck-typing friendly), I guess it's OK.

karlicoss, Oct 25 '20

Have a pretty decent prototype up. Flask correctly serializes all of my data (though I haven't tried DataFrames and the like; this calls iter on the result, so most things seem to work), and all event-like functions accept limit and page GET params to paginate the data.
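The limit/page scheme can be sketched with itertools.islice, which skips and takes from any iterator without building a full list first. The response shape mirrors the {"items": ..., "limit": ..., "page": ...} output in the curl examples; the function name and the default limit are assumptions, not taken from HPI_API.

```python
from itertools import islice
from typing import Any, Dict, Iterable


def paginate(items: Iterable[Any], limit: int = 50, page: int = 1) -> Dict[str, Any]:
    """Return the 1-indexed `page` of size `limit` from any iterable."""
    if limit < 1 or page < 1:
        raise ValueError("limit and page must be positive")
    start = (page - 1) * limit
    # islice still has to step over the skipped items, but never stores them.
    chunk = list(islice(iter(items), start, start + limit))
    return {"items": chunk, "limit": limit, "page": page}
```

Because it only consumes the iterator up to the end of the requested page, early pages of a large generator stay cheap; deep pages still cost a scan from the start.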

https://github.com/seanbreckenridge/HPI_API

After installing with pip install git+https://github.com/seanbreckenridge/HPI_API, run:

$ hpi_api server

$ curl 'localhost:5050/my/zsh/history?limit=3'
{"items":[{"command":"z","dt":"Mon, 18 May 2020 08:23:22 GMT","duration":0},{"command":"en env_config.zsh","dt":"Mon, 18 May 2020 08:23:22 GMT","duration":0},{"command":"ls","dt":"Mon, 18 May 2020 08:23:22 GMT","duration":0}],"limit":3,"page":1}
$ curl 'localhost:5050/my/github/all/events?limit=1' | jq -r '.items | .[0]'
{
  "body": "Note: This is used for [gitopen](https://github.com/seanbreckenridge/dotfiles/commit/4c57fd97cbb2605e63d0cf5d2af37039fe6e6d35)",
  "dt": "Thu, 14 Feb 2019 21:05:40 GMT",
  "eid": "commoit_comment_https://github.com/seanbreckenridge/mac-dotfiles/commit/d4ac3c30dd3df1b626f92eb61f651a27852ff86f#commitcomment-32324943",
  "is_bot": false,
  "link": "https://github.com/seanbreckenridge/mac-dotfiles/commit/d4ac3c30dd3df1b626f92eb61f651a27852ff86f#commitcomment-32324943",
  "summary": "commented on https://github.com/seanbreckenridge/mac-dotfiles/commit/d4ac3c30dd3df1b626f92eb61f651a27852ff86f#commitcomment-32324943"
}
$ curl 'localhost:5050/my/mpv/stats'
{"value":{"history":{"count":5861}}}

If you end up testing it, let me know if you have any issues/suggestions.

Not sure if this should be merged into here/kept separate; linked to, I'm fine either way.

Sidenote:

I initially tried to use FastAPI, but since all of this is dynamic, it felt like I was fighting with its JSON encoder and its class/model-based approach. I would have had to do some serious metaprogramming/class generation with runtime type inspection, and it didn't seem worth it.

Its own encoders, applied through its jsonable_encoder function, run before any custom ones I define, so it's not possible to serialize custom types.

At that point I'm getting none of the benefits of the types that pydantic gives me, so it's not particularly worth using it.
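One way around encoder ordering is to convert results to plain JSON values yourself before calling json.dumps, so custom types are handled first. This is a stdlib-only sketch, not what HPI_API or FastAPI actually do. The pre-pass matters because json.JSONEncoder writes tuples (and therefore NamedTuples) as arrays without ever consulting default(). Treating naive datetimes as UTC is an assumption; the HTTP-date format matches the dt values in the curl output above.

```python
import json
from dataclasses import dataclass, fields, is_dataclass
from datetime import datetime, timezone
from email.utils import format_datetime
from typing import Any, NamedTuple


def to_jsonable(obj: Any) -> Any:
    """Recursively convert HPI-style results into JSON-compatible values."""
    if isinstance(obj, datetime):
        # Assumption: naive datetimes are UTC.
        if obj.tzinfo is None:
            obj = obj.replace(tzinfo=timezone.utc)
        # HTTP-date format, e.g. "Mon, 18 May 2020 08:23:22 GMT".
        return format_datetime(obj.astimezone(timezone.utc), usegmt=True)
    if hasattr(obj, "_asdict"):
        # NamedTuples must be handled before the generic tuple branch,
        # otherwise they would be flattened into plain lists.
        return {k: to_jsonable(v) for k, v in obj._asdict().items()}
    if is_dataclass(obj) and not isinstance(obj, type):
        return {f.name: to_jsonable(getattr(obj, f.name)) for f in fields(obj)}
    if isinstance(obj, dict):
        return {str(k): to_jsonable(v) for k, v in obj.items()}
    if isinstance(obj, (list, tuple, set, frozenset)):
        return [to_jsonable(v) for v in obj]
    return obj


# Example types modeled on the outputs shown in the thread.
class Command(NamedTuple):
    command: str
    dt: datetime
    duration: int


@dataclass
class Stats:
    count: int
```

Usage: json.dumps(to_jsonable(result)) can then be returned from any web framework without touching its encoder.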

seanbreckenridge, Dec 20 '20

Whoa, this is awesome, well done! Very cool that it seems pretty compact. I'll definitely try it out in a couple of days; a bit busy ATM.

> Not sure if this should be merged into here/kept separate; linked to, I'm fine either way.

Yeah... I guess for now it's OK to keep it in a separate repo and play around with it. I guess in principle would be cool to keep decoupled if possible, although if it ends up being compact and generic enough, it doesn't hurt to keep it in the 'main' package.

karlicoss, Dec 21 '20

> in principle would be cool to keep decoupled if possible

The only real 'dependencies' it has on HPI are in the discovery file: it uses core_config to check whether modules are disabled (though that's not really needed; it's just to avoid some import errors if you have modules disabled, and if core_config can't be imported it just assumes all modules are active), and it uses the modules function from my.core.util to iterate through modules.

Otherwise it just uses importlib and inspect to list/find functions in each HPI module. The rest of the code handles GET arguments and generates the basic Flask application.
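A rough sketch of that importlib/inspect discovery, assuming module names are supplied by the caller (in HPI_API they come from the modules function in my.core.util, whose exact interface isn't reproduced here). The to_route helper is hypothetical, but it produces paths in the same shape as the /my/zsh/history URLs above.

```python
import importlib
import inspect
from typing import Dict, Iterable, List


def public_functions(module_names: Iterable[str]) -> Dict[str, List[str]]:
    """Import each module and list the public functions it defines itself."""
    found: Dict[str, List[str]] = {}
    for name in module_names:
        try:
            mod = importlib.import_module(name)
        except Exception:
            # Unconfigured or broken modules are skipped rather than crashing discovery.
            continue
        found[name] = sorted(
            fname
            for fname, func in inspect.getmembers(mod, inspect.isfunction)
            # Skip private helpers and functions merely imported from elsewhere.
            if not fname.startswith("_") and func.__module__ == mod.__name__
        )
    return found


def to_route(module_name: str, func_name: str) -> str:
    """Turn e.g. ("my.zsh", "history") into "/my/zsh/history"."""
    return "/" + module_name.replace(".", "/") + "/" + func_name
```

Catching Exception (not just ImportError) is deliberate here, since an HPI module can fail at import time for configuration reasons too.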

seanbreckenridge, Dec 21 '20

@seanbreckenridge sorry for the delay, was busy with other stuff. Just properly checked it out and it's brilliant, worked flawlessly!

I wanted to do a proper demo in an Observable notebook connected to HPI on my computer (via a proxy): https://observablehq.com/@karlicoss/hpi_meets_http

However, it turned out that I can't export static HTML from Observable, and of course it can't request the data unless I keep the server running... So I just took a screenshot :shrug:

[Screenshot, 2021-02-11: "HPI meets HTTP" Observable notebook by karlicoss]

Some potential ideas for the future:

  • for iterable outputs might be cool to use streaming HTTP, i.e. a stream of JSON objects (don't remember exactly how to do this, but pretty sure it's possible?)
  • pagination should ideally be via cursors rather than pages? The tricky bit I'd imagine is that you need some uniform 'entity id' or timestamp for that. On the other hand it's another thing that would be very useful for other purposes :)
  • I guess eventually we need to think about caching. Perhaps the general approach to caching needs to be brainstormed; there are still some rough edges around cachew.
  • Had to disable CORS to experiment (have to admit I always struggle with it, guess I need to properly learn about it...)
     -        return jsonify(obj), 200
     +        response = jsonify(obj)
     +        response.headers.add("Access-Control-Allow-Origin", "*")
     +        return response, 200
    
    Maybe the cli needs --no-cors option? Although I don't know Flask, maybe it's already possible to control via some setting/env variable?

karlicoss, Feb 11 '21

CORS slipped my mind; added a flag to enable/disable it, and it's enabled by default.

> Maybe the cli needs --no-cors option? Although I don't know Flask, maybe it's already possible to control via some setting/env variable?

Typically people recommend just using flask_cors, but I just added a hook.
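For comparison with a Flask hook, here's a sketch of the same header injection done as plain WSGI middleware, which needs no Flask-specific API. The class name and the enabled switch (standing in for a CLI flag) are hypothetical.

```python
from typing import Any, Callable, Iterable, List, Optional, Tuple

StartResponse = Callable[..., Any]


class CORSMiddleware:
    """WSGI middleware that appends an Access-Control-Allow-Origin header."""

    def __init__(self, app: Callable[[dict, StartResponse], Iterable[bytes]],
                 allow_origin: str = "*", enabled: bool = True) -> None:
        self.app = app
        self.allow_origin = allow_origin
        self.enabled = enabled  # stands in for a hypothetical --cors/--no-cors flag

    def __call__(self, environ: dict, start_response: StartResponse) -> Iterable[bytes]:
        if not self.enabled:
            return self.app(environ, start_response)

        def start_with_cors(status: str, headers: List[Tuple[str, str]],
                            exc_info: Optional[Any] = None) -> Any:
            # Add the header to whatever the wrapped app already set.
            headers = list(headers) + [("Access-Control-Allow-Origin", self.allow_origin)]
            return start_response(status, headers, exc_info)

        return self.app(environ, start_with_cors)
```

With Flask it would be applied as app.wsgi_app = CORSMiddleware(app.wsgi_app), the usual pattern for wrapping a Flask app in middleware. It only adds the header; it doesn't answer OPTIONS preflight requests specially, which flask_cors does handle.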

> for iterable outputs might be cool to use streaming HTTP

Haven't done this before; will have to look into it.
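A minimal sketch of streaming, assuming newline-delimited JSON (one object per line, commonly served as application/x-ndjson, which is a convention rather than a registered media type): a generator that encodes items lazily. In Flask, a generator like this can be passed to a Response object to stream it; the function name is hypothetical.

```python
import json
from typing import Any, Callable, Iterable, Iterator


def ndjson_stream(items: Iterable[Any], default: Callable[[Any], Any] = str) -> Iterator[bytes]:
    """Yield one UTF-8 encoded JSON document per item, each ending in a newline."""
    for item in items:
        # `default` handles values json can't encode natively (here: fall back to str).
        yield (json.dumps(item, default=default) + "\n").encode("utf-8")
```

Since nothing is encoded until the client reads it, memory stays flat even for long histories. NamedTuples should be converted to dicts first (e.g. with _asdict()), or they will come out as JSON arrays.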


Regarding pagination, I initially just went with limit/page since it seemed to be the easiest to implement for unknown data. All I'm really doing right now is calling iter on it.

> The tricky bit I'd imagine is that you need some uniform 'entity id' or timestamp for that

I suppose the only issue with using the default hash function is that there's nothing that says a function in HPI can't return duplicate items, which would make it ambiguous where the cursor should be placed.

Not saying it can't be done; I'm just at a loss for anything other than... storing a large sequence of hash values leading up to the cursor in memory and searching for that? Seems quite finicky, unsure.

Perhaps the user has to specify, as a GET arg, an attribute on the object that acts as the key?
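A sketch of that suggestion: the client names a key attribute (hypothetically via GET args such as key and after), and each page returns items whose key is strictly greater than the cursor. It assumes items arrive sorted ascending by that key and that keys are unique, which is exactly the duplicate-item problem raised above. It avoids storing hashes by rescanning from the start on each request.

```python
from typing import Any, Dict, Iterable, List, Optional


def cursor_page(items: Iterable[Any], key: str, limit: int = 50,
                after: Optional[Any] = None) -> Dict[str, Any]:
    """Return up to `limit` items whose `key` attribute is greater than `after`.

    Assumes `items` is sorted ascending by `key` and that key values are unique.
    """
    if limit < 1:
        raise ValueError("limit must be positive")
    page: List[Any] = []
    for item in items:
        value = getattr(item, key)
        if after is not None and value <= after:
            continue  # still before the cursor
        page.append(item)
        if len(page) == limit:
            break
    # A full page suggests more may follow; its last key becomes the next cursor.
    next_cursor = getattr(page[-1], key) if len(page) == limit else None
    return {"items": page, "next": next_cursor}
```

If two items share a key value and land on either side of a page boundary, the second one is skipped, so a timestamp key only works if timestamps are effectively unique.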

Caching

I don't think this has to interface with cachew; that, and any other major caching/processing, should be handled in HPI.

It would be nice if 'n most recent' sort/order_by results were cached, as one would probably be requesting the same route with different offsets/cursors. It would be possible to save some sorted results in memory.
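A minimal sketch of in-memory caching for sorted results using functools.lru_cache: the provider runs once per (route, order_by) pair, and later requests with different offsets reuse the sorted tuple. The PROVIDERS registry and function names are hypothetical, and order_by is assumed to name a comparable attribute such as a timestamp.

```python
from functools import lru_cache
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical registry from route name to the HPI function behind it.
PROVIDERS: Dict[str, Callable[[], Any]] = {}


@lru_cache(maxsize=32)
def sorted_results(route: str, order_by: str) -> Tuple[Any, ...]:
    """Run the provider once and keep its results sorted descending in memory."""
    items = PROVIDERS[route]()
    return tuple(sorted(items, key=lambda item: getattr(item, order_by), reverse=True))


def most_recent(route: str, order_by: str, limit: int, offset: int = 0) -> List[Any]:
    """Serve one window of the cached, sorted results."""
    return list(sorted_results(route, order_by)[offset:offset + limit])
```

The cache never notices new data; sorted_results.cache_clear() would have to be called (e.g. on a timer or when the underlying files change) to refresh it.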

None of these feel like great solutions though - it's difficult to draw a line in the sand and reason about what should be done for data returned from arbitrary HPI functions.

seanbreckenridge, Feb 15 '21