create cashay-server for realtime updates

Open mattkrick opened this issue 7 years ago • 2 comments

The problem is that not everyone uses RethinkDB for reactivity, and even those who do are more or less limited to single-table subscriptions. This makes sense, since it can get pretty expensive to simulate a join & subscribe to it. Apollo offers something that's in the experimental phase, but it's, uh, not robust. Here are the blueprints for how to make something that can match (or exceed) RethinkDB performance while allowing for cross-table subs.

Problem:

  • client calls the getTop5Posts(userId: 'user123') query. This returns documents of type Post with ids A,B,C,D,E
  • a second client calls upvote(postId: 'F') mutation. How do we know who to send this update to? Some folks care about just that document. Other folks don't care about that document yet, but supposing that the upvote gives F more votes than E, we should replace E with F. Doing this naively results in n db hits, where n is the number of channels that include at least 1 post.

Solution:

  • we have a topic lookup table keyed by query. Each query lists the mutations that can affect it, and each mutation holds a factory function for what I call "bump functions".
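const topicLookupTable = {
  getTop5Posts: {
    // Factory: closes over the channel's current minimum and returns the bump function
    upvote: (minVotes, minVoteId) => (mutatedDoc) => {
      // Cheap check first; only hit the db when the doc makes the cut
      if (mutatedDoc.votes > minVotes) {
        // Return the promise so the caller actually receives the result
        return r.table('Post').get(mutatedDoc.id).run().then(post => {
          return {
            removeDocId: minVoteId,
            newDocId: post,
            bumpFnVars: [post.votes, post.id]
          }
        });
      }
    }
  }
};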
  • At the end of the resolve method, before returning the array of docs, that socketId does a few things:
    • sees if the channel getTop5Posts/user123 exists. If not, it creates the bump function for it: topicLookupTable[query][mutation](5, 'E'), where 5 is the vote count of the lowest-ranked post and 'E' is its id. It stores this bump function on the channel getTop5Posts/user123.
    • subscribes to getTop5Posts/user123

The magic of the bump function is that it contains really inexpensive logic (in this case, mutatedDoc.votes > minVotes). Without it, we'd have to re-run each original database query to determine if F replaced E. This is critical because every time upvote gets called, we're gonna have to run through every channel with the getTop5Posts topic. A single Float64 comparison should be cheap enough that JS will work at scale. SocketCluster already contains a message bus, but to keep a bump function on each channel, we'll have to use a key/value store like Redis to save that channel's bumpFnVars.
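To make the fan-out concrete, here is a minimal sketch of what happens when upvote fires. Everything in it is an assumption for illustration: the in-memory `channels` Map stands in for the key/value store, the `published` array stands in for the message bus, `subscribe` and `dispatch` are hypothetical helper names, and the bump function is a synchronous variant that skips the db fetch.

```javascript
// In-memory stand-ins (assumptions): `channels` plays the key/value store,
// `published` plays the message bus.
const channels = new Map(); // channelName -> { query, bumpFnVars }
const published = [];       // [channelName, payload] pairs we "sent"

// Synchronous variant of the bump-function factory (no db fetch here).
const topicLookupTable = {
  getTop5Posts: {
    upvote: (minVotes, minVoteId) => (mutatedDoc) => {
      if (mutatedDoc.votes > minVotes) {
        return {
          removeDocId: minVoteId,
          newDocId: mutatedDoc.id,
          // mirrors the shape used in the issue: the next factory arguments
          bumpFnVars: [mutatedDoc.votes, mutatedDoc.id]
        };
      }
    }
  }
};

// Hypothetical helper: called at the end of a query's resolve method.
function subscribe(query, arg, bumpFnVars) {
  const channelName = `${query}/${arg}`;
  if (!channels.has(channelName)) {
    channels.set(channelName, { query, bumpFnVars });
  }
  return channelName;
}

// Hypothetical helper: called after a mutation has been written to the db.
function dispatch(query, mutation, mutatedDoc) {
  for (const [channelName, channel] of channels) {
    if (channel.query !== query) continue;
    // Rebuild the bump function from the stored vars, then run the cheap check.
    const bumpFn = topicLookupTable[query][mutation](...channel.bumpFnVars);
    const result = bumpFn(mutatedDoc);
    if (!result) continue; // comparison failed, nothing to send
    channel.bumpFnVars = result.bumpFnVars;
    published.push([channelName, result]);
  }
}

subscribe('getTop5Posts', 'user123', [5, 'E']);
subscribe('getTop5Posts', 'user456', [9, 'Z']);
dispatch('getTop5Posts', 'upvote', { id: 'F', votes: 6 });
```

Only the channel whose minimum was beaten gets a message; the other one costs a single number comparison. Storing just the vars (rather than the function itself) is what makes a plain key/value store sufficient.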

For the next example, let's try a form of CmRDT. Say we have hell world and we want to correct it. We send updateContent(changes: {id: 'A', pos: 4, val: 'o'}) to make it hello world. Since it's a CmRDT, we'll never have the full state, just a transform. That means our mutation will have to adjust the db with just this info. Then, we forward the operational transform on to the client & trust that the client knows how to apply it. Since the updateContent mutation can never change the docs that are returned by getTop5Posts, our bumpFn is easy:

(idArr) => (mutatedDoc) => {
  if (idArr.includes(mutatedDoc.id)) {
    return {
      transform: mutatedDoc
    }
  }
} 
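As a rough sketch of the client side of that exchange: the bump function forwards the transform untouched, and the client applies it to its local copy. `applyInsert` and `localDocs` are hypothetical names, and the sketch assumes the transform always means "insert val at index pos".

```javascript
// Hypothetical client-side handler: insert `val` at index `pos`.
function applyInsert(content, { pos, val }) {
  return content.slice(0, pos) + val + content.slice(pos);
}

// The bump function from above, created for a channel holding docs A..E.
const bumpFn = ((idArr) => (mutatedDoc) => {
  if (idArr.includes(mutatedDoc.id)) {
    return { transform: mutatedDoc };
  }
})(['A', 'B', 'C', 'D', 'E']);

// Client-side cache of document content, keyed by id (assumption).
const localDocs = { A: 'hell world' };

// The server runs the bump function with the mutation's changes...
const message = bumpFn({ id: 'A', pos: 4, val: 'o' });

// ...and the client applies whatever comes back.
if (message) {
  const { id } = message.transform;
  localDocs[id] = applyInsert(localDocs[id], message.transform);
}

// A doc outside the channel's id list produces no message at all.
const ignored = bumpFn({ id: 'Q', pos: 0, val: 'x' });
```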

For super fine-grained performance tweaking, we could consider establishing a discrete channel just for that field: content/content123, but that would be very application-specific & could result in a net performance loss.

A fringe benefit of all of these things is that it means we don't always necessarily need to use a websocket between the client and the server. For example, I can take the return values of the bump functions and store them away in a key/value store under the JWT. Then, when the client long-polls for updates, I just send the array of changes. That means in 1 network request, they get a whole bunch of fresh new info without having to request it from each individual query.
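A minimal sketch of that long-polling idea, assuming an in-memory Map in place of the key/value store. `enqueueChange` and `drainChanges` are hypothetical names, and the token string is a placeholder rather than a real JWT.

```javascript
// Stand-in for the key/value store: JWT string -> queued bump results.
const pendingByToken = new Map();

// Called wherever a bump function returns something for this client.
function enqueueChange(token, change) {
  const queue = pendingByToken.get(token) || [];
  queue.push(change);
  pendingByToken.set(token, queue);
}

// Called when the client long-polls: hand over everything and reset.
function drainChanges(token) {
  const queue = pendingByToken.get(token) || [];
  pendingByToken.delete(token);
  return queue;
}

const token = 'placeholder.jwt.value';
enqueueChange(token, { removeDocId: 'E', newDocId: 'F' });
enqueueChange(token, { transform: { id: 'A', pos: 4, val: 'o' } });

const firstPoll = drainChanges(token);  // both changes in one response
const secondPoll = drainChanges(token); // nothing new since the last poll
```

The changes from different queries travel together, which is where the single-request saving comes from.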

mattkrick • Dec 18 '16 04:12

additional thought: suppose each query can take in 2 additional args:

  • ids: The list of IDs that we currently have on the client
  • lastUpdatedAt: The max of all updatedAt in the list of IDs

With these 2 things, we can greatly reduce the network payload. For example, I subscribe to team members. Then I unsubscribe, then I subscribe again like: teamMembers(teamId: 'team123', ids: ['A', 'B', 'C'], lastUpdatedAt: Yesterday). Now, I run the query. When it resolves from the DB, I get something like this:

const teamMembers = [
  {
    id: 'A',
    updatedAt: 'last week',
    name: 'matt'
  },
  {
    id: 'B',
    updatedAt: 'today',
    name: 'jordan'
  },
  {
    id: 'D',
    updatedAt: 'last week'
  }
]

First, we intersect the result with the ids. On the left side (only in the result), we have D. On the right side (only in the ids), we have C. In the intersection, we have A and B. Within that intersection, we see that A hasn't been updated for a week, so we exclude it. B has been updated since we last saw it, so we need to include it. So, we return a result like:

return {
  removeDocId: 'C',
  addDoc: {
    id: 'D',
    updatedAt: 'last week'
  },
  updateDoc: {
    id: 'B',
    updatedAt: 'today',
    name: 'jordan'
  },
};
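The intersect-and-filter step could be sketched like this. `diffAgainstClient` is a hypothetical name, the timestamps are plain numbers (days) instead of the 'last week' / 'today' strings so they can be compared, and the result uses arrays (removeDocIds, addDocs, updateDocs) since more than one doc can land in each bucket.

```javascript
// Hypothetical helper: compare the db result with what the client already has.
function diffAgainstClient(docs, ids, lastUpdatedAt) {
  const known = new Set(ids);
  const returned = new Set(docs.map((doc) => doc.id));
  return {
    // on the client, but no longer in the result
    removeDocIds: ids.filter((id) => !returned.has(id)),
    // in the result, but new to the client
    addDocs: docs.filter((doc) => !known.has(doc.id)),
    // in both, and changed since the client last saw it
    updateDocs: docs.filter(
      (doc) => known.has(doc.id) && doc.updatedAt > lastUpdatedAt
    )
  };
}

// Day numbers as stand-ins: last week = 3, yesterday = 9, today = 10.
const teamMembers = [
  { id: 'A', updatedAt: 3, name: 'matt' },
  { id: 'B', updatedAt: 10, name: 'jordan' },
  { id: 'D', updatedAt: 3 }
];

const diff = diffAgainstClient(teamMembers, ['A', 'B', 'C'], 9);

// Same query again once the client holds A, B, D and has seen day 10.
const refreshDiff = diffAgainstClient(teamMembers, ['A', 'B', 'D'], 10);
```

In the second call all three buckets come back empty, so there is nothing to put on the wire.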

Now, let's assume we cache this locally & then they refresh the page. If nothing has changed in the meantime, the diff is empty, so the server doesn't even need to reply!

mattkrick • Dec 18 '16 05:12

This looks freakin awesome. I love the diffs.

dustinfarris • Jan 12 '17 05:01