Only use cookies for context
Frankly, I don't fully remember why we had the context store in Rails.cache - There are random uids and they point to data in Redis. We store the uid in the cookie and reference the data. But that data is only a "1" meaning it should stick to master. I assume we had bigger plans for it have more info, but it never happened.
Given that's all it is, it would be smart to be able to encode the "stick to master" bit in the Rails session or it's own cookie. Maybe a hash in case we need some other bit. This would remove the need for the backend store and be generally more manageable.
@bleonard What about API calls that do not use cookies?? https://github.com/taskrabbit/makara/issues/175
The old method requires cookies, too. It has to store the "session context" guid somewhere. One way or another, that contet has to be saved. In our iOS app (using AFNetworking), for example, there is a way to actually send the right cookies headers back and forth.
Hey @bleonard! This week I've been working on a CookieStore for Makara, since right now we are using it without stickiness because we don't set any kind of backend cache (we just use the NoopStore) and wanted to use cookies for this. While doing that, I realised that I didn't quite understand what the real intended behaviour for stickiness was, as it's implemented now.
Without looking much as the code, I understood that once you stick to master, you want that all operations happening within the set master_ttl for that config go to master as well. That's why you set that ttl larger than the replication lag you can tolerate. However, once I was deep in the code, what I saw is that only the first request after sticking (or the first two requests if the first one is a redirect) goes to the master as well. Let me explain:
These are the relevant lines, when deciding whether to use the master, and whether to stick or not for the next request https://github.com/taskrabbit/makara/blob/e95d4303d350c9ab9dfc41b1e5c2743a1849b9c4/lib/makara/proxy.rb#L201-L214
It's easier to see with an example. Suppose we start without a context, that is, without a cookie set or anything, and make 3 requests that take less than the TTL we have configured for sticking to the master. In each request we keep track of the previous context and the current context, and after the request we set the cookie with the current context.
Request 1, this one needs the master because it writes:
cookies = {}
previous_context = "P1" # Generated, since the cookie is not set
current_context = "C1" # Generated. If the previous request was a redirect, we'd set it to P1.
cache = {}
appropriate_pool = primary # Since it's a write operation
# Stick to master
cache["C1"] = true
# Response cookies: _mkra_ctxt=C1--201
Request 2, that doesn't need the primary because it only reads:
cookies = { "_mkra_ctxt" => "C1--201" }
previous_context = "C1" # Read from the cookie
current_context = "C2" # Generated, as the previous request was a 201
cache = {"C1" => true}
appropriate_pool = primary # It's not a write operation, but we are stuck to master: cache.read(previous_context) = true
# In this case we don't write to cache ⚠️
# Response cookies: _mkra_ctxt=C2--200
Request 3, that doesn't need the primary because it only reads, and it's still within the master TTL:
cookies = { "_mkra_ctxt" => "C2--200" }
previous_context = "C2" # Read from the cookie
current_context = "C3" # Generated, as the previous request was a 200
cache = {"C1" => true} # This hasn't expired yet.
appropriate_pool = replica # ⚠️ This is not a write operation, and cache.read(previous_context) = false, since we didn't write it before
Did I get this wrong? This means that we wouldn't be sticking to master for the duration of master TTL, only for the first request done during that TTL after sticking, or the first two if the write operation happens to redirect, which is usual in typical actions in Rails apps. I'm not sure if this was the initial idea, to stick only for one request, up to the TTL.
Still, even if that was the initial idea, there is another scenario in which this doesn't work: if you have several configurations at once (like having both Redis and MySQL using Makara). Cache keys account for this possibility, but _mkra_ctxt cookie is only one and you can only remember one context at a time. This means that the "stuck" context might get overwritten with a different context from a different config and server before you have the chance to "stick" a subsequent read request.
I hope I explained myself properly, sorry this was quite long! 😅
I decided to comment here in this issue because what you describe is the solution I had gone for, 2 days ago. Then I realised that if you have several configs at once, as it's our particular case, we can't rely only on the presence, as that'll stick all configs. Then, we'd need to ignore completely the context from the cookie, and use only the config_id, extracting it from the cache key. Then I just realised I'm not sure why you'd need the context at all, if the desired behaviour is to stick to the master for all requests received for the duration of master_ttl. All you'd need is to track whether a specific config_id is currently stuck or not. If you had only one config, then all you need is the presence of the data, as you said, to indicate that is stuck. If you have several configs, then you need to know what the current config is, and whether is stuck or not (you can do this with a cookie that stores config ids and timestamps for when they started being stuck), but you don't need to differentiate between contexts for the same config.
I'll be really happy to submit a pull request with my idea, using only cookies, and not tracking context, just config id, but first I wanted to check whether I was missing anything about the stickiness behaviour based on what I described above, and anything else you might have in mind!
That was a fairly impressive opening comment :-) Thank you.
If I'm reading it right, there are few things in there.
- Would it be better to stick to master for whole TTL even if that's a few requests?
- Should makara have multiple contexts or is one enough?
Number 1 seems right to me. We're not doing too many redirects and it's mostly API calls so have no obviously run into that before. In all cases, that seems more intuitive to me.
For number 2, I'd love to understand your config better and how Redis and makara are working. It seems possible that preserving the ability to have two contexts could be smart. It's not always the same data being written to each so Redis and MySQL could have different invalidation patterns.
This would influence the cookie storage. Maybe the cookie name has the ability to have a key, otherwise it defaults to having a single cookie. So, it could be "_mkra_ctxt_mysql" or something. Or we keep _mkra_ctxt and it has a hash with the keys and their status.
If you consider Number 1 while using cookies, the whole system actually gets much simpler because cookies already have TTLs. We could update the TTL when a write happens. Of course, maybe that's why you just want 1 context. Having both contexts in the same cookie actually moves TTL management to a timestamp in the cookie. So maybe the (optional) cookie naming ("_mkra_mysql") is the way to go. If the config has a cookie_key or something to that effect, load it from there, otherwise use the default ("_mkra_default") (probably a new default so we don't have parsing issues with the old _mkra_ctxt cookie).
What you store in the cookie is an in interesting question. It's tempting to store just a "1" or something, but we also want to think about what would happen if we want to store something else later. Does that mean do a hash now? Maybe or maybe not. I suppose would could still add some crazy delimiter later if there was new data: 1|futureproof? What you think?
If we do get this cookie thing going, I can't think of a decent reason to continue with the Redis context storage at all. It already needs cookies to make it work, so there's no win there. It's just a waste of network and storage. Of course, I didn't make that so maybe there's something brilliant I can't see. I have the feeling it just never expanded as predicted (as noted above).
Thanks for your reply and your notes @bleonard! You are correct, I was talking about two different things in my comment, and I didn't explain myself very well for the second! Number 1 is right, the behaviour of sticking to the master for all requests happening during master_ttl was what we personally understood from the README. The problem of sticking only for one request or two is that you might not get the behaviour you want if there are other requests happening on a page load (XHR requests to load other pieces separately), and they might be equally affected by replication lag.
If you consider Number 1 while using cookies, the whole system actually gets much simpler because cookies already have TTLs. We could update the TTL when a write happens.
Yes! This was what I had implemented, very simple, until I realised this wouldn't work if you had more than one config around! In our case it was MySQL + Redis, but it could have been two different databases, or two or more different Redis instances.
For number 2, I'd love to understand your config better and how Redis and makara are working. It seems possible that preserving the ability to have two contexts could be smart. It's not always the same data being written to each so Redis and MySQL could have different invalidation patterns.
Sorry! I didn't explain number 2 very well. We use Redis together with MySQL in a web context (we also use Redis and MySQL in the context of ActionCable, but that's not relevant in this case, as we wouldn't be using the middleware and reading and setting the cookies). This means that we might read/write from MySQL and Redis in the same request. A write from MySQL would result in setting the sticking state to true in our new cookie, and that would affect Redis as well. This is what I mean by "if you have several configs at once, as it's our particular case, we can't rely only on the presence, as that'll stick all configs". That's why the simple solution of relying on cookie presence and cookie max_age didn't work 😭
Notice that when using the backend cache this does work because the config id is part of the cache key. Moreover, if we stick to master for the whole duration of master_ttl, that's the only part of the key that matters! That's why I was very curious about needing the context at all, I don't think it's necessary if you stick for the whole master_ttl.
I didn't make that so maybe there's something brilliant I can't see. I have the feeling it just never expanded as predicted
Ah, yes! That sounds like a very plausible explanation.
What you store in the cookie is an in interesting question. It's tempting to store just a "1" or something, but we also want to think about what would happen if we want to store something else later. Does that mean do a hash now? Maybe or maybe not. I suppose would could still add some crazy delimiter later if there was new data: 1|futureproof? What you think?
Right! I've been thinking about what we'd need to store in the cookie so that this works with several configurations. I considered two alternatives:
- Add the config id to the cookie name. As you said, something like
"_mkra_ctxt_#{config_id}"or"_mkra_#{cookie_key}"where thecookie_keyis specified in the config. I didn't like this too much because parsing the cookie before starting to process the request in the middleware became too complex, since you have to parse all possible cookies, so you'd need to keep all these config ids somewhere. They could be_mkra_ctxt_mysqland_mkra_ctxt_redis, but that wouldn't allow for much flexibility in the future. - Store the config ids and a value in a hash, in a single cookie. This is what I wanted to try, and I didn't think you need keep anything else apart from the config id. That is, no
previous_contextorcurrent_contextanymore. About the value,1vs. something else, my idea was to store the timestamp until which the setting was valid. The reason is that If we store a hash in a single cookie, we can no longer rely onmax_ageto handle the expiration time for us. Each value can have potentially different expiration times.
Let me show how I thought this could work, with an example. Imagine we have a database.yml config for our main database, with id: mysql_main and master_ttl: 2, and another shared.yml config for the Redis instance we use in web, with id: redis_shared and master_ttl: 1. Imagine also that we get similar requests to what I used in my previous example: request 1 writes to MySQL and reads from Redis, request 2 reads from both MySQL and Redis, and request 3 reads from MySQL and writes to Redis.
Current time: 2018-02-05 07:51:09 UTC (1517817069.483005)
Request 1 (2018-02-05 07:51:09 UTC (1517817069.483005):
cookies = {}
(mysql) appropriate_pool = primary # Since it's a write operation
(redis) appropriate_pool = replica # Since it's a read operation
# Stick to master in mysql
# Response cookies: _mkra_ctxt=mysql_main:1517817071.483005
The value for mysql_main is basically current timestamp + 2 (master_ttl). That's when the stickiness will expire.
Request 2 (2018-02-05 07:51:10 UTC (1517817070.000000)
cookies = { _mkra_ctxt=mysql_main:1517817071.483005 }
(mysql) appropriate_pool = primary # value from mysql_main in the cookie is > than current time => We stick
(redis) appropriate_pool = replica # it's a read operation and we don't have the config id in the cookie
# Response cookies: {}
Request 3 (2018-02-05 07:51:11 UTC (1517817071.000000)
cookies = { _mkra_ctxt=mysql_main:1517817071.483005 }
(mysql) appropriate_pool = primary # value from mysql_main in the cookie is > than current time => We stick
(redis) appropriate_pool = primary # it's a write operation
# We must stick to master in Redis
# Response cookies: _mkra_ctxt=mysql_main:1517817071.483005|redis_shared:1517817072.000000
We keep mysql_main in the response cookies because it hasn't expired yet, and we add the new value for redis_shared, current time + 1 second (its master_ttl).
When we check stickiness and one of the keys in the cookie has expired, we can get rid of it. For the response cookies max_age we need to set one that allows all the timestamps stored to "survive".
What do you think?
Oh! A concern I had with the above solution was the possibility of exceeding the cookie size limit for browsers (assuming only modern browsers that support something like 4096 bytes per cookie, and > 20 cookies per domain, not 4096 bytes in total per domain). But if we used MD5 hashes in hex for config ids by default, and store the timestamp with the above precision, we'd have something like this per config: 0aa1ea9a5a04b78d4581dd6d17742627:1517818126.148078. That is 50 bytes, then one byte more for the separator "|". 80 keys and values like that would take 4080 bytes. I think this allows for plenty of room.
That's great @rosa - I'd love to see a PR and talk about more specific spots. It looks like you have something going already :-) so let's go from there. Very exciting.
I personally wouldn't worry too much about the config ids - can probably use the ones from the config file with a gsub(/weird stuff/, "_") or something. Of course maybe you have more experience with crazy configs :-)
It would be nice if they were readable in the cookie and likely be shorter than a MD5.
Thanks @bleonard! 😄
Of course maybe you have more experience with crazy configs :-)
Haha! Not really, we weren't defining any ids ourselves, just relying on the internal MD5 hash that is set by default https://github.com/taskrabbit/makara/blob/c6b158e847468379da8cc2eb69aee455bef386fb/lib/makara/config_parser.rb#L151-L156
It would be nice if they were readable in the cookie and likely be shorter than a MD5.
Oh, yes. Hmm... Perhaps we can truncate the MD5 and get a substring of size 16 (that'd be using 64 bits instead of 128), I think the probability of collisions is still super low, specially for this case in which we are talking about 2 or 3 different ids in most cases with different configurations. For specified ids with strange characters that wouldn't be readable in the cookie, I imagine we can use the MD5 of the id itself? In that way we'll always have the same format for all identifiers, but it'd lose the disadvantage of having an id easy to identify in the cookie, for Makara users that set their own ids. Maybe it's better to just use whatever id is set in the config in that case, and let users set reasonable ids?
Yes, that sounds good. I would allow them to set that one and use that.
If we use the above as a default, though, it's pretty still that it would change every deploy. This would have the effect of not correctly sticking to master across a deploy.
So we could have this method use the passed in one || 'default'
If they want two contexts (least common case) they can set ids. Or don't set any ids (or set them to the same) to share a highwater mark TTL in the same context like you want to do.
This does make me think that the cookie management code has to clean up both the "expired" things in it as well as "unknown" config ids.
Why would it change in every deploy? That is, assuming that the deploy doesn't include a change to database.yml, or redis.yml, or any other config that is using Makara? Not sure if I'm missing something that the ConfigParser does here.
I think that unknown config ids in the cookie might be fine as long as they expire in a short time 🤔 Well, all depends on the values set by the developers writing Makara config (that is, master_ttl).
Sorry, not every deploy. I read it wrong. But when the a config changes. I still think default is a nice idea and gives the expected behavior without a hash looking thing and wouldn't change even when adding a new read-only server.
Yes, unknown ones would likely be fine as long as the code looks at all keys and their expire times regardless of being known or unknown configs. That was just a note to write it that way. It's just as likely that we were to write it only looking at the known ones.
Ahh, got it, thanks!
I'll try to put together a pull request with this, hopefully I'll have time this week! 😄🤞
release candidate: gem 'makara', git: 'https://github.com/taskrabbit/makara.git', tag:'v0.4.0'
referencing #208 here.