Adds caching for services
There have always been two hard problems in computer science. After this PR, there is only one.
Yet another Rails-inspired feature! This one lets you easily cache the result of a function in your services.
The intended use is as an LRU cache: don't worry about manually expiring things, just put things in the cache as needed and the cache software itself will eventually evict the least used/oldest ones when memory fills up, or by whatever other metric it follows internally. That means there's never a need to manually delete something from the cache: leave old entries in there forever, who cares. Everything hinges on the key you use, and it all boils down to two statements:
- If it's already cached, give it to me
- If it's not cached, cache it
THAT IS ALL.
This cache only uses two operations: `get` and `set`. You either get some data back from the cache matching your key with `get`, or you save data to the cache using `set`. This is pretty close to the Memcached interface, but miles away from the 500 functions that Redis exposes. I'm trying to keep things ultra simple, and most of this functionality (I'm looking at you, Redis) falls under YAGNI.
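To make those two statements concrete, here's a rough sketch of the get-or-set flow (not the actual implementation in this PR), assuming a client with async `get()`/`set()` that stores stringified JSON:

```js
// Rough sketch of the get-or-set flow, not the real implementation.
// `client` is assumed to expose async get(key) and set(key, value).
async function getOrSet(client, key, fn) {
  const cached = await client.get(key)
  if (cached != null) {
    // If it's already cached, give it to me
    return JSON.parse(cached)
  }

  // If it's not cached, cache it
  const result = await fn()
  await client.set(key, JSON.stringify(result))
  return result
}
```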
Why
There are going to be lots of times where you have a query for a large dataset from the database that you want to avoid making if possible, but that doesn't change too frequently. Obviously if the data is changing every second, it may not be a great candidate for caching. Unless you're getting 1,000 requests per second for that data, then maybe it is!
But a cache can also be used to avoid large processing tasks, third-party API calls, and lots more. Caching just takes the result of those tasks and sticks it in a fast store like Memcached or Redis, which is probably orders of magnitude quicker to read from than the time you'd spend doing the original task again. So caching is great, but usually hard to implement well. But not for Redwood users!
Clients
This implementation includes two clients out of the box, Memcached and Redis. It's really easy to add your own implementation using a different library or a completely different service: you just create an adapter class that exposes `get()` and `set()` functions that do whatever they need to do to save a JavaScript object to your cache and retrieve it. The two included clients will `JSON.stringify()` your resulting data and then `JSON.parse()` it back out of the cache, so your data needs to be able to survive that process intact (which means no caching of functions, for example).
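As a rough sketch (not part of this PR), a custom adapter might look something like this, assuming the only contract is async `get()` and `set()` and that the values handed to it are already strings; exactly where the JSON (de)serialization happens is up to the implementation:

```js
// Hypothetical adapter sketch: any store works as long as it can
// round-trip a string through get() and set().
export class MapClient {
  constructor() {
    this.store = new Map()
  }

  async get(key) {
    // Return whatever was stored, or undefined on a cache miss
    return this.store.get(key)
  }

  async set(key, value, options = {}) {
    // `options` could carry things like `expires`; this sketch ignores it
    this.store.set(key, value)
    return true
  }
}
```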
Setup
There's a setup command (in progress) which creates a file `api/src/lib/cache.js`. This follows the convention of setting up stuff like logging and auth. You set up your client and call `createCache()`, from which you destructure and export two functions:
import { createCache, MemcachedClient } from '@redwoodjs/api/cache'
const client = new MemcachedClient()
export const { cache, cacheFindMany } = createCache(client)
Usage
`cache()` is the simplest form, where you just give it a key, a function to execute returning the data you want to cache, and an optional object of options (currently only `expires`):
import { cache, cacheFindMany } from 'src/lib/cache'
import { db } from 'src/lib/db'
export const post = ({ id }) => {
  return cache(`post-${id}`, () =>
    db.post.findUnique({
      where: { id },
    })
  )
}

// or to expire after 30 seconds
export const post = ({ id }) => {
  return cache(`post-${id}`, () =>
    db.post.findUnique({
      where: { id },
    }),
    { expires: 30 }
  )
}
It's up to you to make sure that your key is unique enough to expire when the data in your object changes and you want to cache the new result instead. Following Rails conventions means that for DB data you include the `updatedAt` timestamp in the key, since it will only change if some other data in the record changes. In the first example above the cache entry would never expire, since the `id` of the record is never going to change. It may get evicted at some point if it's not frequently accessed, however.
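For example (just a sketch, not code from this PR), one way to get a Rails-style key for a single record is to look up only the `updatedAt` timestamp first and fold it into the key, so a new copy gets cached whenever the record changes:

```js
// Sketch: include updatedAt in the key so edits to the record produce a
// new cache entry. The extra findUnique() only selects updatedAt, so it's
// a cheap query. (Ignores the missing-record case for brevity.)
export const post = async ({ id }) => {
  const stamp = await db.post.findUnique({
    where: { id },
    select: { updatedAt: true },
  })

  return cache(`post-${id}-${stamp.updatedAt.getTime()}`, () =>
    db.post.findUnique({ where: { id } })
  )
}
```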
`cacheFindMany()` is where things get interesting. This assumes you're executing a `findMany()` Prisma query and want to cache the results of that query until something in the result set changes. It does this by creating a key that's a composite of the latest record's `id` and `updatedAt` time. If any one record in that result set changes, the key will change (because `updatedAt` will be different), which means a new copy is stored. Using `cacheFindMany()` requires that you have some unique `id` and an `updatedAt` field that is updated any time the data in the record changes. `id` and `updatedAt` are the default field names, but you can configure them in the `createCache()` call if your model has them named something different.
Let's say you wanted to cache the following function:
db.post.findMany({ where: { popular: true } })
You need to transform that function call into an object instead, then give it to `cacheFindMany()`:
export const posts = async () => {
  return await cacheFindMany(
    'posts',
    db.post,
    { conditions: { where: { popular: true } } }
  )
}
You need kind of a new syntax here (a `conditions` object containing the arguments you would normally send to `findMany()`) because I need to be able to make a `findFirst()` call based on the conditions you sent to `findMany()`, but only for a single record, sorted descending by `updatedAt` time. So you can't make the `findMany()` call like normal, because I wouldn't be able to pull it apart and get just the conditions you gave it.
Internally I'm doing this:
db.post.findFirst({
  where: { popular: true },
  orderBy: { updatedAt: 'desc' },
  select: { id: true, updatedAt: true }
})
That gets the absolute minimum amount of data needed to determine whether the recordset has newer data than was last cached, and then builds the key `posts-123-1653400043430` from the single record that's returned. If that key doesn't exist in the cache then it runs:
db.post.findMany({ where: { popular: true } })
That's the original query that was intended to be run, and the result gets cached.
Of course, you could do all of this yourself using `cache()`, but this is a nice abstraction that saves you the work. I wish you could just pass the standard `db.post.findMany()` call like you do with `cache()`, but alas that's an actual, executable function that returns results.
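For illustration, here's a sketch of roughly what `cacheFindMany()` automates, written by hand with `cache()` (not code from this PR, and it ignores the empty-table case):

```js
// Sketch: the manual equivalent of cacheFindMany() using cache().
export const posts = async () => {
  // Grab only the newest record's id/updatedAt to build the key
  const latest = await db.post.findFirst({
    where: { popular: true },
    orderBy: { updatedAt: 'desc' },
    select: { id: true, updatedAt: true },
  })

  // The key changes whenever any record in the result set is updated
  const key = `posts-${latest.id}-${latest.updatedAt.getTime()}`

  return cache(key, () => db.post.findMany({ where: { popular: true } }))
}
```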
Side note: in ActiveRecord for Rails, calling the equivalent of `findMany` doesn't actually execute anything right away, it just lets the instance know that you intend to call that at some point. You can pass that around and it doesn't actually execute until you need it for display, or give it to a method that expects actual data. So the above can be written, in Rails, as:

    posts = Post.all

    cache(posts) do
      posts.map do |post|
        # ...
      end
    end
And it can automatically look for the newest record first before actually executing `Post.all` and getting all records from the DB if it turns out the cache is empty.
Caveats
Right now there's some interesting behavior if the memcached or redis service goes offline after the api side has already connected to it:
~Memcached: the next request for the cache hangs and never returns...no error is thrown, nothing. I've tried adding a timeout around the code that makes the request to the memcached client, but it doesn't seem to help. This appears to be a known issue: https://github.com/memcachier/memjs/issues/162~ Fixed with our own local timeout!
Redis: an error is raised as soon as the Redis server goes away, which appears to crash the api server completely 😬 It seems to happen outside of the caching code so I'm not sure how to catch this:
api | SocketClosedUnexpectedlyError: Socket closed unexpectedly
api | at Socket.<anonymous> (/Users/rob/Sites/service-caching/node_modules/@redis/client/dist/lib/client/socket.js:182:118)
api | at Object.onceWrapper (node:events:640:26)
api | at Socket.emit (node:events:520:28)
api | at TCP.<anonymous> (node:net:687:12)
api | at TCP.callbackTrampoline (node:internal/async_hooks:130:17)
api | Emitted 'error' event on Commander instance at:
api | at RedisSocket.<anonymous> (/Users/rob/Sites/service-caching/node_modules/@redis/client/dist/lib/client/index.js:350:14)
api | at RedisSocket.emit (node:events:520:28)
api | at RedisSocket._RedisSocket_onSocketError (/Users/rob/Sites/service-caching/node_modules/@redis/client/dist/lib/client/socket.js:205:10)
api | at Socket.<anonymous> (/Users/rob/Sites/service-caching/node_modules/@redis/client/dist/lib/client/socket.js:182:107)
api | at Object.onceWrapper (node:events:640:26)
api | [... lines matching original stack trace ...]
api | at TCP.callbackTrampoline (node:internal/async_hooks:130:17)
web | <e> [webpack-dev-server] [HPM] Error occurred while proxying request localhost:8910/auth?method=getToken to http://[::1]:8911/ [ECONNREFUSED] (https://nodejs.org/api/errors.html#errors_common_system_errors)
web | <e> [webpack-dev-server] [HPM] Error occurred while proxying request localhost:8910/graphql to http://[::1]:8911/ [ECONNREFUSED] (https://nodejs.org/api/errors.html#errors_common_system_errors)
web | <e> [webpack-dev-server] [HPM] Error occurred while proxying request localhost:8910/graphql to http://[::1]:8911/ [ECONNREFUSED] (https://nodejs.org/api/errors.html#errors_common_system_errors)
web | <e> [webpack-dev-server] [HPM] Error occurred while proxying request localhost:8910/graphql to http://[::1]:8911/ [ECONNREFUSED] (https://nodejs.org/api/errors.html#errors_common_system_errors)
Any assistance in mitigating these would be greatly appreciated!
Release Notes
Coming soon
This is awesome! Looking forward to this feature. One thing Laravel does that I always appreciated was to use certain client implementations under the hood automatically based on environment variables. For example, instead of creating and passing in a "MemcachedClient", you set an environment variable such as "CACHE_DRIVER=memcache".
The nice part with this is that no code changes are necessary, and it becomes trivial to use "InmemoryClient" when doing local development (CACHE_DRIVER=local) and Redis in your staging and production environments. Have you thought about this approach? It can apply to more than caching too: logging, database, queues (if supported), etc.
Another small win is that developers can just use imports from "@redwood". There is no cognitive load of cache related files in their projects.
Cons are:
- More difficult configuration if customization is needed.
- Bundle might need to include all client implementations??? Not sure.
Just an idea! Keep up the awesome work.
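In Redwood terms the idea would be roughly this (purely hypothetical sketch: `CACHE_DRIVER`, the mapping, and the `RedisClient`/`REDIS_URL` names are assumptions, not part of this PR):

```js
import { createCache, MemcachedClient, RedisClient } from '@redwoodjs/api/cache'

// Hypothetical sketch: pick the client from an env var, Laravel-style.
// CACHE_DRIVER isn't part of the PR; MEMCACHED_URL/REDIS_URL are assumed
// env vars holding the server locations.
const clients = {
  memcache: () => new MemcachedClient(process.env.MEMCACHED_URL),
  redis: () => new RedisClient(process.env.REDIS_URL),
}

const client = clients[process.env.CACHE_DRIVER || 'memcache']()

export const { cache, cacheFindMany } = createCache(client)
```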
- More difficult configuration if customization is needed.
- Bundle might need to include all client implementations??? Not sure.
Yeah, these are what I was worried about. If someone didn't like the two libs I'm using (memjs and redis) then it makes it more difficult to add your own... do you just have a single custom
ENV var setting, and then that somehow lets you do your own? What if you had separate implementations for different environments?
Personally, I would use the same client everywhere (memcached) for parity, including in development. Then your server location(s) are the only ENV var needed:
import { createCache, MemcachedClient } from '@redwoodjs/api/cache'
const client = new MemcachedClient(process.env.MEMCACHED_URL)
export const { cache, cacheLatest } = createCache(client)
@cannikin Did you consider a mechanism to set the cache key with something that includes session/user info?
That way a service that queries “my posts” with a where clause on the author can be cached per user?
See: useResponseCache https://www.envelop.dev/plugins/use-response-cache#cache-based-on-sessionuser
Perhaps define a function that returns a sessionId and the developer can define that as needed?
Yeah, these are what I was worried about...
Makes sense! Abstracting the configuration behind env variables can always be a future enhancement if that is the direction it goes.
@cannikin Did you consider a mechanism to set the cache key with something that includes session/user info?
@dthyresson Right now the key is just a string so it's up to you to set that for uniquifying the content... `context.currentUser.id` would work great!
Is there another kind of session identifier we have access to in services besides just `context.currentUser`?
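As a sketch of that idea (assuming `context.currentUser.id` is available in the service; the service name and `authorId` field are made up for illustration):

```js
import { cache } from 'src/lib/cache'
import { db } from 'src/lib/db'

// Sketch: scope the key to the current user so "my posts" for one user is
// never served to another. `context` is assumed to be the per-request
// context Redwood exposes to services; `authorId` is a made-up field name.
export const myPosts = () => {
  const userId = context.currentUser.id

  return cache(`user-${userId}-posts`, () =>
    db.post.findMany({ where: { authorId: userId } })
  )
}
```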
Hello ! It might be a stupid question, but at least I might learn something today.
I don't really understand how it's possible to have a cache in a serverless environment, could you maybe explain to me ? 😅
Hello ! It might be a stupid question, but at least I might learn something today.
I don't really understand how it's possible to have a cache in a serverless environment, could you maybe explain to me ? 😅
@LotuxPunk There are many ways to cache data -- some are "in memory" like an LRU (least recently used) cache and others involve persistent stores like Memcached or Redis.
While a "serverful" caching solution can leverage both in-memory and persisted caches, "serverless" would need to use a separate persisted store to cache that data -- like Redis or Memcached.
This implementation supports those types at the moment, so if you deploy to a serverless provider, then you'd want to use a Memcached or Redis service to store your cache.
This will usually be faster to access than the database and reduce load -- given the cached entries don't invalidate often.
Note that the `useResponseCache` GraphQL plugin works similarly to cache GraphQL requests and also supports a Redis cache; see: https://www.envelop.dev/docs/guides/adding-a-graphql-response-cache#caching-query-operations
Alright, makes sense, thanks very much!
Can't wait to test it! ✨
Does anyone dare to review this?!
My only reservations, and I'm not sure how we go about fixing them:
- If you're using Redis, and the Redis server goes away, the entire API server crashes. Does anyone know if it's possible to somehow catch this in the api server code and ignore?
- If you're using Memcached, and the Memcached server goes away, you're okay. But if the server comes back it won't reconnect. You'll just keep skipping the cache and going to the DB forever after, until the api server is restarted. I may have a potential fix for this, adding a `reconnect()` function to the client that can be called if a request times out (rough sketch below).
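Here's the rough shape of that timeout/reconnect idea (purely a sketch; `reconnect()`, `safeGet()` and the timeout value are made up, not the code in this PR):

```js
// Sketch: wrap a cache read in a timeout so a dead Memcached server can't
// hang the request, and nudge the client to reconnect after a failure.
const DEFAULT_TIMEOUT = 500 // ms, made-up value

const withTimeout = (promise, ms) =>
  Promise.race([
    promise,
    new Promise((_resolve, reject) =>
      setTimeout(() => reject(new Error('cache timeout')), ms)
    ),
  ])

export const safeGet = async (client, key) => {
  try {
    return await withTimeout(client.get(key), DEFAULT_TIMEOUT)
  } catch (e) {
    // Hypothetical: ask the client to re-establish its connection so a
    // server that comes back online gets picked up again.
    await client.reconnect?.()
    return undefined // treat it as a cache miss and fall through to the DB
  }
}
```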
Looking great @cannikin! All of the code looks good to me, and love the tests - just minor comments.
I'll take it for a spin tomorrow for the final check of both docs and usability with STRICT MODE :)
I think there's another piece missing (maybe) - how do you test a service if you're using the cache stuff?
I'll have to try it to see, but it might be helpful to expose some mocking/helper methods!
how do you test a service if you're using the cache stuff?
Well, right now if you're running your cache server then it'll get cached during the test run, which may or may not be what you want to happen. If the cache server is not running then it would show an error message, but go ahead and return the result of the function you gave to the cache (as if it was running, but without any of the speed benefits). This will clutter up your test output with tons of warnings/errors though.
Do we properly set `NODE_ENV` to `test` when the test suite is running? We could just not even attempt to cache when in test mode. Another option is to configure the cache to use the `InMemoryStore` in `api/src/lib/cache.js` when in test mode, which you can then inspect and see if it contains the data that you think it does (like I do in the cache test suite itself). That's what this person did with Rails.
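That second option could look something like this in `api/src/lib/cache.js` (just a sketch; `InMemoryClient` is a placeholder name for whatever the in-memory/test store ends up being called and exported as):

```js
import { createCache, InMemoryClient, MemcachedClient } from '@redwoodjs/api/cache'

// Sketch: swap in an in-memory client under test so specs can inspect the
// cache contents directly. `InMemoryClient` is a placeholder name.
const client =
  process.env.NODE_ENV === 'test'
    ? new InMemoryClient()
    : new MemcachedClient(process.env.MEMCACHED_URL)

export const { cache, cacheFindMany } = createCache(client)
```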
Do we properly set NODE_ENV to test when the test suite is running?
Yes, I think we always have the environment set. We could also configure it in our jest setup (we already do something similar for AsyncStorage - which we use to isolate request contexts) https://github.com/redwoodjs/redwood/blob/main/packages/testing/config/jest/api/jest.setup.js#L223
We could just not even attempt to cache when in test mode.
Hmmm.... an option, but maybe for me, the inMemoryStore one feels more logical (and more realistic). Maybe we could expose a helper or two to check the in memory store too - if it makes sense.
Sure thing - I need to read up on it again too.
10 replays were recorded for 61a58d9e7bf79db7231a42ff130da6b16168405d.
0 Failed
10 Passed
- requireAuth graphql checks
- useAuth hook, auth redirects checks
- RBAC: Admin user should be able to delete contacts
- RBAC: Should not be able to delete contact as non-admin user
- Smoke test with dev server
- Smoke test with rw serve
- Loads Cell mocks when Cell is nested in another story
- Loads Cell Stories
- Loads MDX Stories
- Mocks current user, and updates UI while dev server is running
@cannikin good to go I think. Want to take one last look at the docs and merge?
🔔 @jtoar, @Tobbe—I couldn't cherry pick this one. If you want it in the next release, you'll have to cherry pick it manually.