operations
There's still a need to bump the memcache size
Hello. In https://github.com/openstreetmap/openstreetmap-website/issues/2457 I was told to open an issue here. But as it is getting a little over my head, I will just leave this here.
There is no evidence at all in the graphs that this is in fact an issue. I definitely see the issue that you are referring to, but I am unable to explain it, as all the evidence says it shouldn't be down to memcache.
Could these sessions disconnections be caused by server restarts or does the server never restart?
I don't see any server restart in the stats, at least for the last 6 months: https://prometheus.openstreetmap.org/d/l4zgNUdMz/memcached?orgId=1&refresh=1m&from=now-6M&to=now
Also, the OP didn't provide any details on how frequently they have to log in again. There might be external factors, like cookies being removed by the browser or some browser extension, etc.
I thought everybody else also has to log in again at least once every three or four days. Maybe it's because I use various browsers on various devices. But why, on the same device, do I need to log in again after three or four days? Anyway, you're welcome to check the logs to see why user jidanni has to log in again so often.
Which stat do you use to check if the server restarted?
Aren't these sudden drops in memory usage symptoms of a server restart?
Note that the dates are in the format month/day.
Ah, the link wasn't that helpful. There are about 11 memcached instances overall. However, for the 3 frontend servers, only 3 memcached instances (spike-06 ... spike-08) are relevant. Items in cache and memory usage are fairly stable for these three.
https://prometheus.openstreetmap.org/d/l4zgNUdMz/memcached?orgId=1&refresh=1m&from=now-6M&to=now&var-instance=spike-06&var-instance=spike-07&var-instance=spike-08
I think this should match the following config in chef: https://github.com/openstreetmap/chef/blob/45dc24b65b23a6c1dcc2f0ba2aa971563555c35e/roles/web.rb#L20
A restart would indeed lose all sessions but as @mmd-osm says it's only those three machines that we're talking about here and they last restarted in November last year:
At that time it took nearly two months for the caches to fill up which suggests that it should take about that long for things to get expired unless there has been a significant increase in the cache usage since.
The eviction rate has increased since November, but it hasn't consistently been more than double. Commands/second has remained the same.
I logged back in 5 days ago: 1 day later my session was still active, but today I'm logged out. We can also see a dip today from ~100 million items in cache to ~66 million.
I suggest storing the sessions in the DB and using memcache only to speed up session checks for frequently used sessions.
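A minimal sketch of that suggestion, assuming a read-through cache in front of a durable store. Both stores are plain dictionaries here, and the class and method names are hypothetical; Python is used only for illustration, since the site itself is a Rails app.

```python
class SessionStore:
    """Durable store as the source of truth, with a lossy cache in front of it."""

    def __init__(self):
        self.db = {}     # stand-in for a database table
        self.cache = {}  # stand-in for memcached; entries may vanish at any time

    def save(self, sid, data):
        self.db[sid] = data     # write to the durable store first
        self.cache[sid] = data  # then refresh the cache

    def load(self, sid):
        hit = self.cache.get(sid)
        if hit is not None:
            return hit              # fast path for frequently used sessions
        data = self.db.get(sid)     # cache miss: fall back to the database
        if data is not None:
            self.cache[sid] = data  # repopulate for the next request
        return data


store = SessionStore()
store.save("abc", {"user": 42})
store.cache.clear()           # simulate an eviction or a memcached restart
restored = store.load("abc")  # still found, via the database
```

With this layout an eviction only costs one extra database lookup instead of a lost login.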
One of the machines was rebooted yesterday while fighting the DDOS, so 1/3 of the cache entries were lost.
I'm wondering how many of these entries originate from CGImap (key prefix would be "cgimap:"). For some reason, these entries have the expiration value set to 0 (unlimited). This doesn't make a whole lot of sense for rate limiting requests, where the exact timestamp would be known upfront at which time these entries become irrelevant.
At least when testing locally, I've noticed that every anonymous user creates a rails session without expiry (that's the "0" in "1 0 73" below), whereas logged in users have an entry with 4-5 weeks expiration.
Anonymous user sessions:
/usr/share/memcached/scripts/memcached-tool localhost:11211 dump
Dumping memcache contents
add rails:session:2::2d28d018bdda81f05bae57ba42ee200a7a14af6df74134bb93ee82f99bf7baab 1 0 73
{I"_csrf_token:EFI"096xa2ms9DVncEF7CBUeBJ0wP9VYJrKO6lzxqDomep74;F
Logged in user:
Expires at 1723288155 = Sat Aug 10 13:09:15 CEST 2024
add rails:session:2::2d28d018bdda81f05bae57ba42ee200a7a14af6df74134bb93ee82f99bf7baab 1 1723288155 200
{ I"_csrf_token:EFI"096xa2ms9DVncEF7CBUeBJ0wP9VYJrKO6lzxqDomep74;FI" user;FiI"fingerprint;FI"E....
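To read the expiry field in such dump lines without converting by hand, something like the following could be used. It is a small sketch that assumes each line has the shape `add <key> <flags> <exptime> <bytes>`, as in the two examples above; the function name is made up for illustration.

```python
from datetime import datetime, timezone


def parse_add_line(line):
    """Split an 'add <key> <flags> <exptime> <bytes>' line from the dump."""
    command, key, flags, exptime, size = line.split()
    if command != "add":
        raise ValueError("not an 'add' line")
    exptime = int(exptime)
    # An expiry of 0 means the entry never expires on its own.
    expires_at = (
        datetime.fromtimestamp(exptime, tz=timezone.utc) if exptime else None
    )
    return key, expires_at, int(size)


KEY = "rails:session:2::2d28d018bdda81f05bae57ba42ee200a7a14af6df74134bb93ee82f99bf7baab"
anonymous = f"add {KEY} 1 0 73"
logged_in = f"add {KEY} 1 1723288155 200"

print(parse_add_line(anonymous)[1])  # None
print(parse_add_line(logged_in)[1])  # 2024-08-10 11:09:15+00:00
```

The second result is in UTC, which corresponds to the 13:09:15 CEST shown above.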
Expiry shouldn't really matter that much because anything that isn't used just moves down the LRU list and gets discarded eventually when we need space for a new entry.
Logged in sessions (with "remember me" checked) do get an expiry of 28 days which matches the cookie expiry while other sessions (not logged in and logged in without "remember me" checked) actually don't have an expiry but issue a session cookie that expires when the browser is closed.
First of all, I find it a bit difficult to reason about the logged in sessions based on Prometheus stats, in particular after how many days these entries would be discarded.
memcached has an LRU crawler which reclaims expired entries even before they reach the end of the LRU list. With a non-zero TTL, we might get rid of many "non-logged in user" entries early on, before they can evict "logged in user" entries.
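The effect can be illustrated with a toy model. This is only a sketch: the cache below is a plain LRU with an "expired entries are reclaimed first" step standing in for the crawler, which is far simpler than memcached's real segmented LRU, and all sizes and times are made-up numbers.

```python
from collections import OrderedDict


class ToyCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()  # key -> expiry time (None = never), oldest first
        self.evictions = 0

    def _reclaim_expired(self, now):
        # Stand-in for the LRU crawler: drop entries whose TTL has passed.
        expired = [k for k, exp in self.items.items() if exp is not None and exp <= now]
        for key in expired:
            del self.items[key]

    def set(self, key, now, ttl=None):
        self._reclaim_expired(now)
        if key not in self.items and len(self.items) >= self.capacity:
            self.items.popitem(last=False)  # evict the least recently used entry
            self.evictions += 1
        self.items[key] = None if ttl is None else now + ttl
        self.items.move_to_end(key)


def simulate(anonymous_ttl):
    cache = ToyCache(capacity=3)
    cache.set("logged-in", now=0, ttl=10_000)
    for i, now in enumerate(range(10, 60, 10)):
        cache.set(f"anon-{i}", now=now, ttl=anonymous_ttl)
    return "logged-in" in cache.items, cache.evictions


print(simulate(anonymous_ttl=None))  # (False, 3)
print(simulate(anonymous_ttl=15))    # (True, 0)
```

Without a TTL, the anonymous entries push the logged-in session off the end of the list; with a short TTL they are reclaimed before any eviction is needed.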
At the current growth rate, we will likely see some evictions in about 10 days (=21 days after last memcached restart).
@jidanni : did you notice any issues with lost login sessions in the last 8-9 days? If so, it can’t be memcached related…
It's not that simple because only one machine was reset I think? So only keys which hash to that machine are currently exempt from being evicted.
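A rough sketch of why a single reset only affects part of the key space, assuming the client picks a server purely from a hash of the key. The modulo scheme below is a simplification (real clients typically use a ring-based scheme), and which machine was reset is a hypothetical choice.

```python
import zlib

SERVERS = ["spike-06", "spike-07", "spike-08"]


def server_for(key):
    # Simplified placement: CRC32 of the key modulo the number of servers.
    return SERVERS[zlib.crc32(key.encode()) % len(SERVERS)]


keys = [f"rails:session:2::{i:04d}" for i in range(3000)]
restarted = "spike-07"  # hypothetical: the thread does not say which machine it was
lost = [k for k in keys if server_for(k) == restarted]
print(f"{len(lost)} of {len(keys)} keys lived on {restarted}")
```

Only the keys in `lost` start from an empty cache; keys on the other two servers are still subject to eviction as before.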
I think spike-06..08 were all restarted, the aggregated cached items count on Prometheus shows 0 entries about 10 days ago.
@mmd-osm rather than using my misty memory, surely there must be some internal logs you can check regarding me (user: jidanni) that can give you precise details.
We want to hear from you first hand, as you’ve also raised the issue. Misty memory is ok. If you say it hasn’t bothered you recently then that’s good enough for now.
What we see in the charts right now is that no entries are being removed. So chances are that your session is still around.
Okay. I will remember next time to report each and every incident right here to the thread.
Okay. Just had to log in again as you can see in your logs perhaps.
Thank you for the feedback. This is not completely unexpected. Evicting entries started again on August 1st, even a bit sooner than estimated.
On a laptop I hadn't used in five days: Had to login again to OSM. But didn't need to login again to GitHub to add this comment.
At least 8 other users have reported the same issue in https://community.openstreetmap.org/t/osm-webseite-standiges-login-notig/120072
All different browsers, not only Firefox. I could also reproduce it today on my mobile.
spike-0[6-8] have been seeing some cache evictions again for a few days:
Following up on my previous comment to get rid of anonymous sessions as early as possible, we could check how the Gitlab repo addressed the issue. They're having similar issues with Redis and unauthenticated users filling up the memory. Redis and Memcached implementations should be fairly similar, ['rack.session.options'][:expire_after] is also used by the memcached client.
Initially, Gitlab added a special helper for this purpose: https://gitlab.com/gitlab-org/gitlab/-/blob/ee088fc0d53198016e245c515f28e03d8229e297/app/controllers/application_controller.rb#L29 and some PRs on the topic: https://gitlab.com/gitlab-org/gitlab/-/merge_requests/88514/diffs
Helper: https://gitlab.com/gitlab-org/gitlab/-/blob/ee088fc0d53198016e245c515f28e03d8229e297/app/helpers/sessions_helper.rb#L17-41
Lately they seem to have moved it to its own Rack middleware to cover more scenarios: https://gitlab.com/gitlab-org/gitlab/-/commit/8c85364205ccb1f4602ab3543d10ff55295bd6cc
This might be worthwhile checking out.
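The core of the idea is a per-request decision about the store TTL. The following is a language-neutral sketch in Python rather than the actual Ruby code from Gitlab or the website; the one-hour value for anonymous sessions is an assumption, and logged-in sessions without "remember me" are ignored for brevity.

```python
from datetime import timedelta

# Assumed values for illustration only.
ANONYMOUS_TTL = timedelta(hours=1)
REMEMBERED_TTL = timedelta(days=28)  # matches the cookie expiry mentioned above


def expire_after(session):
    """Pick a store TTL depending on whether the session has a logged-in user."""
    if session.get("user") is None:
        return ANONYMOUS_TTL  # unauthenticated: let the store drop it quickly
    return REMEMBERED_TTL     # authenticated: keep it as long as the cookie lives


print(expire_after({"_csrf_token": "x"}))              # 1:00:00
print(expire_after({"_csrf_token": "x", "user": 42}))  # 28 days, 0:00:00
```

In the Rails app the returned value would end up in ['rack.session.options'][:expire_after] for the current request.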
I've adjusted the Gitlab code a bit to work with the osm website: https://github.com/mmd-osm/openstreetmap-website/tree/patch/sessionexpiry
It's more of a proof of concept at this time, to demo the idea. I can create a PR to continue the discussion, if needed. It should also not interfere with session_persistence.rb and session_methods.rb, which define a cookie expiration for logged on users only.
For testing, I recommend checking the output of "memcached-tool localhost:11211 dump" after each activity, in particular the TTL value. That's the second-to-last value in each line starting with "add rails:session:2:..." (format: unix epoch).
/fyi: @AntonKhorev
Meanwhile, memcached has also been restarted or purged, so we're down to 0 evictions for the next few weeks.
Today had to login again.
Today had to login again too.
Also today Wed Jan 29 03:07:20 AM UTC 2025 had to login again too. Yes, "Remember me" checked last time as always I do.
I had to login again on 2025-02-26 and today (2025-02-29).
Me too, today 2025/2/3.
Hey wait, your clock is a month ahead.
Anyway,
Feature request
Most valued users list
The Most valued users list would be composed of, let's say people who you don't want to log off often. Who? Backers who have donated more than a million dollars, heads of state, influential professors, etc., and me, your first test guinea pig.
Every effort shall be made never to log them off, except as a once a year security exercise with a notice on their screens (thus turning getting logged off into a happy experience (staff cares about my security.))
They will have their own private memcache pool or whatever behind the scenes to assure them that OSM is running smoothly and their donations were put to good use. Of course they would never hear the word memcache. All they know is the site is up.
If the program is a success it would be quietly expanded behind the scenes to eventually encompass all users.