armature icon indicating copy to clipboard operation
armature copied to clipboard

Iron out Session extensibility

Open jgaskins opened this issue 1 year ago • 2 comments

Sessions are currently difficult to extend to allow for different storage backends. I'm not sure yet what needs to be done to make that easier (or I would've done it), but it should definitely be easier. A couple things that I am sure I don't like:

  • I don't like Armature::Session::Store::Session as a constant path. Session being named twice just feels weird in a way I can't articulate.
  • The Session should not have to set specific instance variables and should ideally lean on methods for the Session::Store to call
  • The set of abstract defs for session implementations to define is almost certainly wrong
    • We should add #[]? to it because you may need to account for session keys that haven't been set yet
    • I'm not sure we should even have #[] at all, tbh — I haven't used it in years
    • Some of the methods defined in the Redis session implementation should probably be in the superclass
  • The Redis session's implementation storing an entire JSON object is probably unnecessary
    • It was originally implemented that way to batch updates to multiple session values (write once), but I'm doubtful that that's as useful as I thought it would be when I wrote it
    • We could just store each session value separately
      • The main upside here is that each session[key] = value call would immediately store that and we wouldn't have to worry about things like dirty tracking or any cleanup at all, really
      • The main downside is that different session values could expire at different times depending on the storage adapter — it would be fine with Redis since we could store all values for the same session in the same Redis key via HSET but a storage adapter like NATS KV would store each value into a different NATS KV key because it just stores strings and not more complex data structures.
    • If we stick with storing We're already using MessagePack for caching, so maybe that would've been a better idea for session storage since it's faster and uses less memory

The only concrete things I'm really concerned about are:

  • the Store is important to keep because it's used as the HTTP::Server middleware
  • the Store should be the only thing directly interacting with the backing store (Redis, NATS, Postgres, the filesystem, an in-memory hash, whatever it happens to be)
  • the Session should be simple
  • we shouldn't rely on sessions being loaded/saved for every request
    • if a wave of bots hammers your app, you don't want to be required to store sessions for all of them
    • if the request doesn't provide a cookie specifying the session and the session isn't modified in any way (for example, a form will store a csrf value), we don't store anything about the session

jgaskins avatar Mar 30 '24 16:03 jgaskins

Well, the first thing to consider is: how to deal with concurrency? Reading the session on the start of the request and writing it before sending the response will work as long as the handler is pure Crystal, but quickly gets messy if Crystal calls another service and multiple fibers gets the chance to run.

Is session locking something to consider, or is it just up to the user to deal with? Storing values individually would avoid the problem somewhat, but not totally.

Real atomic sessions probably requires rethinking session from "here's a bag of values" to a service of some kind.

But that's overkill for most real-world usage. Real-time update of session derived variables is only going to complicate most pages without any real gain. In practise, "this is the session data as it looked at the start of the request" is quite sufficient in all but the oddest of cases.

Which means the real problem is writing. Some data is fine just to overwrite, while other is not. A flash message bag comes to mind, if the user loads two pages at the same time, you don't want to have the message from one overwrite the other, even if you don't care which window actually displays them both.

I checked up on PHP to see if they did anything smart, but apparently the session handling code just locks the session for the duration of the request, per default. User code can then close the session early if they have no need for writing. And the redis extension has a long issue because it didn't support locking in the old days and people ran into all sorts of problems.

So I guess that's probably the best bet.

Regarding your wish for typed session objects, I had a thought (and it's probably not thought through):

Imagine Session::Handler and Session::Store as generics that takes a session class (anything json-able). And a Session::register macro that defines HTTP::Server::Context#session to be of a given session class and Armature::Request#session as a way to get at it from a route handler?

Session::Store (well, it's concrete subclasses) just loads and saves the session class, locks/unlocks it, and have a #gc for optional housecleaning. Session::Handler handles the cookie stuff, passes data from/to the Context and the store.

The session class could have an abstract superclass that adds #close, #save, #delete and what else user code might need (#regenerate?) which is forwarded to the handler.

xendk avatar May 13 '24 20:05 xendk

Cookie-based sessions, where the session payload is serialized entirely inside the cookie, have the same issue with concurrent session mutations. That sort of session storage is the default in Rails and it looks like it's what is still in use on GitHub.

Storing values individually would avoid the problem somewhat, but not totally.

Depending on the storage backend, we could store them together but query/update them separately.

I've experimented with storing them separately — using a NATS KV, the user_id and csrf token were stored in two separate keys. This meant I didn't need to batch query/update like the Redis store currently does, but then the expiration timestamps are different for keys in the same session. This can be useful if you're storing a short-lived value on that key (redirect_uri after a successful login, one-time password, flash message, etc) but also surprising in some awkward ways.

In Redis, at least, we have several options and I've been considering providing different session backends to allow someone to choose which scenario they want to optimize for:

the current "fetch everything, mutate, then store everything" approach

  • pros
    • easy to grok
    • you can see everything inside a session all at once
  • cons
    • checking any key in the session loads all the keys in the session, regardless of how many there are or how large the values are
    • high-concurrency use cases can lead to inconsistency — it's like developing via SFTP where the last save wins

store session keys in separate Redis keys

  • pros
    • each key is updated without stepping on other keys in the same session
    • you don't have to load every session key on every request
    • session keys expire independently of each other
  • cons
    • session keys expire independently of each other

store data in Redis hashes

  • pros
    • stored together, but queryable separately
    • high-concurrency-friendly
  • cons
    • you need more than a String => String mapping in session data

RedisJSON-based adapter

  • pros
    • lets you store arbitrary objects and query them even when nested (like session["a/b-test.experiments.foo"]? for A/B testing)
  • cons
    • RedisJSON is not supported by the baseline Redis server so not all providers will support it, but there are still some options
    • JSON traversal inside the Redis server puts additional load on it, which can be problematic on a single-threaded server
      • this reduces the throughput ceiling for session fetching on a single-threaded, non-clustered implementation
      • in my benchmarks, fetching a UUID user_id property in a 1M-query pipeline took ~3x as long with RedisJSON as it did when stored in a hash

Another thing to consider is that differences in capability between implementations may require differences in APIs between them. For most cases, it's probably fine because the Crystal type checker can automatically downcast variables from abstract types to concrete types if there's only one concrete subclass and there isn't really benefit to loading multiple session storage implementations. I saw this in my NATS KV-based session store experiment, which only mapped strings to strings, where session["user_id"]? returned a String? at compile time — I didn't have to unwrap a JSON::Any. However, for testing, it may be a good idea to have an in-memory implementation for performance and to avoid throwing a bunch of test data into a persistent data store, so the APIs there do need to match.

jgaskins avatar May 15 '24 19:05 jgaskins