workerd
workerd copied to clipboard
ActorCache Read Coalesce
I turned the 2nd commit into a fixup because I resolved the TODOs I had and the rest of it was very small.
Will get to the other comments that I haven't replied to tomorrow, I just had a bunch of code I had git stashed
that I wanted to try and get in first, but decided to deal with the separate read promise chain instead.
Sorry I waited so long to look at this, I assumed there was still more work to be done. And TBH I'm still not sure if this was waiting on me or if you were planning on doing more.
So to be clearer, what is the actual state of this and what can I do to help right now?
FYI @a-robinson I'm going to push up some new work as a third commit today, with the eventual goal of squashing the commits into 1 before we finally merge this.
Just want to keep this one separate for now since it's giving me some trouble... This third commit focuses on having reads intercept exceptions from RPC calls, and then either propagating them to the GetWaiters
, or allowing the exception trigger a retry (via the normal flush retry mechanism).
This fixes the test "ActorCache read hard fail"
test, but this test only checks that single key gets() receive the exception. I've added another (currently broken) test that extends "ActorCache read hard fail"
to see if we get the same behavior when we have multiple keys awaiting a response from storage. I'm not sure if the mock framework actually lets us test this or not though.
It seems this change has also broken 9 other tests as well, I haven't had time to look into why just yet because: tl;dr other tests were broken due to a very subtle issue with my initial implementation and I was chasing that bug the last day and a half :grimacing:. Anyways, I suspect these are all breaking for one or 2 reasons that shouldn't be too hard to clear up.
Edit: Ok it turns out it was very simple, we've changed the size of Entry
and the tests hardcoded the expected value, so we started evicting when we weren't before. My 1 test is still broken but I'll push up what I have once I fully understand the new Entry
size (I'd expect it to be 136 but it's 142 bytes).
The latest commit https://github.com/cloudflare/workerd/pull/1916/commits/00ba98560d35f0f7fe216b989ec17f4d07eec522 uncomments the "getMultiple hard fail" test and attempts to address the reason for the test failure, but for some reason the exception seems to be causing more problems in this case than in the single-key get() test.
We do manage to propagate it to the caller (the test in this case), but the exception is getting logged and the test is failing.
I thought it might have to do with how we are catching the exception over RPC, or maybe that we give copies to more than 1 promise fulfiller (which ends up in a single joined promise)?
https://github.com/cloudflare/workerd/compare/00ba98560d35f0f7fe216b989ec17f4d07eec522..7f5b18fac3bab8fc6901ba1345a82b4ad0df709d is a rebase near or on main
since this work might be paused for a bit.