Radicale causing absurdly high CPU load (even making other applications fail)
Hello everybody,
we have installed radicale on Windows Server 2019. There is a fairly small number of Thunderbird clients that access the CalDAV server (about 10 clients at maximum). When installing, we followed the installation instructions literally, and we have studied the documentation thoroughly.
From the beginning, we noticed that radicale caused a high CPU load. However, we ignored that problem, although the load average has slowly increased over time. Now it has reached a new level: it is practically always above 20%, sits around 60% for a substantial part of the time, and sometimes even goes up to 100%. This makes other applications on that server fail.
I would like to ask what could cause that problem, and whether I can profile it somehow.
I'd like to emphasize that I have no clue about Python and understand radicale only to the level that allows me to configure and use it. However, I know programming in general, so I should be able to follow non-trivial debugging and profiling instructions if they are given in understandable language without terminology that is only known to Python specialists.
What I have already sorted out:
- RAM is not a problem (32 GB RAM; radicale usually takes only a minimal amount of it, some GB are always free, and there is no relationship between RAM usage and CPU load by radicale).
- CPU power is not a problem (4 cores at 2.6 GHz; the CPU load without radicale is always under 5%, except for the usual very short spikes when an application starts or does something special).
- This is a VM, though, if it matters.
- Storage speed is not an issue. The VM is backed by fast SSDs.
- Storage size is not an issue. radicale and its data reside on the system partition with 700 GB free space.
- Storage load is not an issue. The storage load caused by other applications is negligible.
-
- Our installation until this morning was Radicale 3.1.x driven by Python 3.10 embedded. However, today I deleted everything except the user configuration, the general config file and the collections, installed the full Python version 3.14.1 using the new Python installation manager for Windows, and then installed the newest radicale release via
`python -m pip install --upgrade radicale`
This did not change the situation.
- As mentioned above, we have followed the installation instructions literally. This means that we used `nssm` to start radicale as a service. To make sure that the problem is not caused by `nssm`, I have stopped the service and started radicale on the command line in the foreground.
This did not change the situation.
- To make sure that no virus scanner causes the problem, I have added the radicale data directory to the folder exclusion list in Windows Defender, and then even have disabled the real-time protection of Windows Defender.
This did not change the situation.
There are no other virus scanners or protection software running on the server in question.
- In the config file, I have set `use_mtime_and_size_for_item_cache = True` as advised elsewhere.
This did not change the situation.
(To be honest, I wouldn't have expected this measure to change anything, because the four CPU cores barely break a sweat when asked to compute an SHA256 checksum.)
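For reference, a minimal sketch of that part of our configuration (the option sits in the [storage] section):
[storage]
# validate the item cache via mtime and size instead of re-hashing the content
use_mtime_and_size_for_item_cache = True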
Now I am really out of ideas. Can anybody help? I believe that I can't be the only one who encounters this problem. I don't even know whether the problem is caused by Python or by radicale.
The only suspicious thing I noticed is that radicale outputs plenty of error messages when users are connected, mostly about bad requests or invalid tokens (whatever that means). But I can't imagine that this causes the high CPU load.
Thank you very much in advance, and have a nice weekend!
Is this happening while idling or because of processing ongoing requests? Under high load, the processing time per request usually increases. Are you able to sort out which kinds of requests take longer? For Linux, there is a logwatch scriptlet in contrib (https://github.com/Kozea/Radicale/tree/master/contrib/logwatch) which generates some statistics... would it be possible to transfer the logs to a Linux system and feed them into logwatch for analysis?
Generally there are two requests which can cause heavy load:
- REPORT
- PUT in case many items are in the collection (because of verification that no duplicate UID is used)
It can also be a locking issue; have you tried `[storage] type = multifilesystem_nolock`?
Regarding profiling, unfortunately no code extension has been made so far, but if you run with debug logging you can potentially check, via the timestamp delta of subsequent log lines, where the time is consumed.
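As a rough illustration (not a Radicale feature, just a hypothetical helper sketch that assumes the timestamp format visible in the log excerpts later in this thread, e.g. "[2025-12-09 10:06:09 +0100]"), a few lines of Python could flag large gaps between consecutive log lines:
import re
import sys
from datetime import datetime

# matches the leading "[2025-12-09 10:06:09 +0100]" timestamp of a Radicale log line
TS = re.compile(r"^\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2} [+-]\d{4})\]")

prev_time = None
prev_line = ""
for line in sys.stdin:
    m = TS.match(line)
    if not m:
        continue
    t = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S %z")
    if prev_time is not None and (t - prev_time).total_seconds() >= 2:
        print(f"gap of {(t - prev_time).total_seconds():.0f} s before: {line.rstrip()}")
        print(f"    previous line was: {prev_line.rstrip()}")
    prev_time, prev_line = t, line
Run it e.g. as "python log_gaps.py < radicale.log"; keep in mind that with several worker threads the lines of different requests interleave, so the gaps are only a hint, not an exact per-request measurement.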
Thank you very much for the fast reaction!
Transferring the logs to a Linux system to let the logwatch scriptlet run is absolutely possible. I'll do that of course if it helps with analyzing the problem. However, I have a question about it:
I did not configure a special kind of logging until now. Which options should I set in which way in order to make that scriptlet output as much useful information as possible (level, trace_on_debug, trace_filter, bad_put_request_content, backtrace_on_debug, request_header_on_debug, request_content_on_debug, response_content_on_debug, rights_rule_doesnt_match_on_debug, storage_cache_actions_on_debug)?
Regarding multifilesystem_nolock, I haven't tried that yet, because there's a warning in the documentation about it. The docs say that it only should be used if there is only one process, and I am unsure what this exactly means. How many processes does radicale spawn in the default configuration?
I must take into account that I have to conduct those tests on the production system. It is no problem if the service is interrupted a few times for half an hour due to me trying to analyze the problem. On the other hand, data being screwed up because of tests would be very bad.
Logging: start with "[logging] level = debug"; the others can be left out for now, as they are specific and more related to bug hunting.
Locking: "nolock" is safe unless more than one Radicale process is accessing the same file system (e.g. a clustered setup where the collection is stored on NFS or GlusterFS).
Generally the request load would be interesting; this can already be analyzed with "[logging] level = info". Potentially the client sync frequency is simply too high and the requests are blocking each other - so far no QoS is implemented, e.g. delaying the same client for the same request if one is already running but unfinished.
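To make that concrete, a minimal sketch of the [logging] section to start with (only the level is set; the more specific debug options stay at their defaults):
[logging]
level = debug
# bug-hunting options such as bad_put_request_content or
# request_header_on_debug can stay disabled for now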
Thank you very much again!
I have to leave the office for now, but will set it up the suggested way in two hours or so, let it run and then hopefully can post the results later.
Thanks a lot also for the explanation regarding the locking. The _nolock option then is safe for us. The radicale data folder is not shared between multiple radicale instances. The instance in question actually is the only instance in the whole site.
Do I need to configure the logwatch scriptlet in any way? Of course, I'll read the comments in it, but if there's something special about it, it would be nice to know in advance.
Created a wiki page with instructions...logwatch is not even required to be installed, only Perl must be usable -> https://github.com/Kozea/Radicale/wiki/Server-Statistics
@pbiering
Thank you very much for preparing the instructions! They are very clear and easy to follow.
I apologize for not getting back on Friday, but it turned out that the VPN to the office had problems, so I couldn't collect the logs. I have done that now and have attached the output of the logwatch script. There are three attachments, because radicale by default seems to rotate the logfile once it becomes 1 MB, and this has happened two times within a relatively short time this morning. The first two attachments are from two 1 MB log files, the third attachment is smaller since I had to stop radicale at that point on the server because it was continuously eating substantial CPU power.
The attached output of the script is not altered in any way, except that I have anonymized the login names / user names (which in our case are the full names of the real persons with forename and surname separated by a dot). When anonymizing, I have taken care to not introduce any ambiguities (i.e., every login / user name has been converted to a unique abbreviation).
The output also shows plenty of various error messages. Perhaps it's these errors that cause the high CPU load. I guess they are caused by the clients, not the server. In either case, I surely can correct the configuration so that they don't occur any more, provided somebody gives me a few hints :-)
Now I'll try multifilesystem_nolock as you have suggested, and report back the result.
So in the meantime I have tried multifilesystem_nolock. Unfortunately, this didn't change the situation.
Strange results... the sync-token issue needs to be solved on the client side; it looks like the tokens got lost. Check the cache folder.
For
Bad PUT request on '/kg/b82ef0ae-7d5f-7f28-f1a9-a66d884526da/040000008200E00074C5B7101A82E008000000005D7DC5362DC0DB010000000000000000100000005281FB9C9E9E164393852EF76B486F2E.ics' (prepare): can't compare offset-naive and offset-aware datetimes
please enable the log option `bad_put_request_content = True` to catch the content and send it obfuscated.
EOF occurred in violation of protocol (_ssl.c:1081) client 192.168.152.85 -> the client terminated the SSL session unexpectedly
Please try to apply changes from mentioned profiling PR and enable either per_request or per_request_method to get hints which part of the code is causing the load.
Thank you very much again!
In the meantime, I have deleted all .Radicale.cache folders in the hope that this would solve the problem with the invalid tokens. Of course, this didn't change anything, as you had already indicated. I have now asked a few users to unsubscribe from all calendars and to reconnect their clients afterwards. From what I have read, this should make Thunderbird create (or fetch) new tokens. I'll report back once the users have taken these actions.
Another idea for testing regarding the invalid tokens is to set max_sync_token_age to 632448000, for example; this would be approximately 20 years. I'll test this and report back.
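For the record, a sketch of where I would put that setting (assuming it belongs in the [storage] section, next to the storage type and folder options):
[storage]
# approximately 20 years, for testing only
max_sync_token_age = 632448000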
I'll also look into your other suggestions and provide the logs here.
Also check that the sync interval is not too low... e.g. at least 15 min.
In the meantime, I have conducted the test I have mentioned above: set max_sync_token_age to 632448000. Since then, I didn't get any more log entries regarding invalid sync tokens. But the CPU load did not change. Therefore, I believe that the invalid sync tokens are not the cause for the high CPU load.
Additionally, I have found three log files from today's afternoon which have been created with max_sync_token_age being at its default value, but which don't have entries about invalid sync tokens either. But the CPU load was high as always when these log files were written.
I'll now remove that setting so the default value becomes effective again. I believe that we should ignore the invalid sync tokens during further analysis since they don't seem to be the culprit.
I also haven't gotten any bad PUT requests for two hours or so (probably because quite a few people had already left), but the CPU load was still very high. Now I am asking myself whether we should ignore the bad PUT requests too, because they definitely are a problem, but probably not the reason for the high CPU load.
But please correct me if I am wrong. I'll do everything to debug this. In every case, I'll keep bad_put_request_content = True until tomorrow and post some of those requests. I am quite sure that they will occur in the morning again as people begin to work and update their calendars.
And of course, tomorrow I plan to apply the patch, adjust the settings as instructed and post the result.
Your last remark raises a new question:
Most users have the synchronization interval set to the lowest possible value; in Thunderbird, that is one minute. In addition, many users have connected (besides their own calendar) the calendars of multiple other users in read mode. This may lead to many requests. On the other hand, the number of users is smaller than 10. From my naive point of view, that should lead to at most 100 requests per minute, which really should not pose a problem, since such requests should be handled in fractions of a second. The VM has 4 Xeon cores at 2.6 GHz; it is not a Raspberry Pi Pico.
Again, if I am wrong here, please correct me. How much time is Radicale expected to take per request?
Finally, I'd like to explain briefly why most people choose the synchronization interval as low as possible. I guess it's psychology. Typically, user A tells user B: "I'll enter that meeting in my calendar". User B is connected to user A's calendar in read mode and thus does not need to enter the meeting in question into his own calendar.
And that's where psychology strikes: user B won't rest easy until he actually sees A's entry in his own workspace (that is, in A's calendar that he is connected to in read mode). To keep the time until this happens short, user B sets the synchronization interval as low as possible and asks user A to do the same. The mechanism works the same the other way around and with every other user who participates in this shared calendar system.
It will be quite hard to tell users to choose longer synchronization intervals. If the synchronization interval causes the high CPU load, I'll try to educate the users, but currently I believe that the VM in question should have enough resources to handle thousands of requests per minute.
But as mentioned above, I would be glad if somebody would elaborate on that and could give a few numbers for a normal mid-sized PC.
Best regards, and have a nice evening!
The bad PUT requests were seldom; I would keep `bad_put_request_content = True` enabled to trace down and potentially fix an unexpected issue.
> In Thunderbird, that is one minute
That is for sure a reason why it is overloading. From statistics:
# 1.log
**Response timings (counts, seconds) (D=<depth> R=<result>)**
Response | cnt | min | max | avg |
------------------------------------------------------------
OPTIONS:R=200 | 61 | 0.001 | 0.002 | 0.001 |
PROPFIND:D=0:R=207 | 61 | 3.179 | 122.103 | 16.863 |
PROPFIND:D=0:R=401 | 22 | 0.002 | 0.004 | 0.002 |
REPORT:D=1:R=207 | 318 | 2.564 | 23.717 | 3.715 |
REPORT:D=1:R=403 | 14 | 5.083 | 9.938 | 3.789 |
------------------------------------------------------------
# 2.log
**Response timings (counts, seconds) (D=<depth> R=<result>)**
Response | cnt | min | max | avg |
------------------------------------------------------------
OPTIONS:R=200 | 34 | 0.001 | 0.004 | 0.001 |
PROPFIND:D=0:R=207 | 34 | 3.214 | 3.457 | 2.936 |
PROPFIND:D=0:R=401 | 26 | 0.002 | 0.013 | 0.004 |
PUT:R=201 | 1 | 3.079 | 3.079 | 3.079 |
PUT:R=400 | 1 | 2.746 | 2.746 | 2.746 |
PUT:R=401 | 1 | 0.002 | 0.002 | 0.002 |
REPORT:D=1:R=207 | 419 | 3.440 | 5.411 | 3.119 |
REPORT:D=1:R=403 | 24 | 3.013 | 3.320 | 2.872 |
------------------------------------------------------------
# 3.log
**Response timings (counts, seconds) (D=<depth> R=<result>)**
Response | cnt | min | max | avg |
------------------------------------------------------------
REPORT:D=1:R=207 | 62 | 2.878 | 5.702 | 3.187 |
------------------------------------------------------------
> Again, if I am wrong here, please correct me. How much time is Radicale expected to take per request?
As you see, PROPFIND and REPORT requests can take long - the maxima seen above are 122 and 23 seconds - and they potentially start blocking each other.
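A rough back-of-envelope check based on the statistics above and your earlier estimate: at up to ~100 PROPFIND/REPORT requests per minute and an average of roughly 3 seconds each, that is about 300 seconds of request handling per 60 seconds of wall-clock time, while 4 cores provide at most 240 core-seconds per minute - so the server cannot drain the queue, waiting requests pile up, and the per-request times grow even further (which would fit the 122-second outlier).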
Can you try to replicate the installation on a Linux system? The collection content can simply be taken over.
Let's also see what the new profiling output will tell us. You can run either per_request with a minimum-duration filter to skip the fast ones, or per_request_method for a longer period; it will show profiling data per request method regularly (at an interval) and on shutdown.
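A sketch of how that could look in the config (the profiling switch goes into the [logging] section; the exact name of the minimum-duration filter option is not shown here, so it is left out):
[logging]
level = info
profiling = per_request_method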
I am currently trying the current master version that includes the patch you have linked. But there are some questions and problems:
First, since I don't know anything about Python, I was not sure how to use the new version. Since radicale runs in the context of a user named Radicale, I went into C:\Users\Radicale\AppData\Local\Python\pythoncore-3.14-64\Lib\site-packages and deleted the radicale subfolder from there. Then I downloaded the ZIP of the current radicale master from this site, extracted it, and moved the extracted radicale subfolder to C:\Users\Radicale\AppData\Local\Python\pythoncore-3.14-64\Lib\site-packages. I did not touch the subfolder C:\Users\Radicale\AppData\Local\Python\pythoncore-3.14-64\Lib\site-packages\radicale-3.5.9.dist-info.
Is this a correct approach?
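If there is a cleaner way, please tell me. From what I understand, pip might also be able to install the current master directly from a GitHub source archive, e.g.
python -m pip install --upgrade https://github.com/Kozea/Radicale/archive/refs/heads/master.zip
but I have not verified whether that works in our environment.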
Second, I have set profiling = per_request_method in the config file and have started the new radicale service. Now I am getting the following error messages in the log file:
[2025-12-09 10:06:09 +0100] [8128/Thread-16 (process_request_thread)] [ERROR] An exception occurred during REPORT request on '/dk/022c3850-3199-dd46-3697-bdf7de0fe0e6/': Another profiling tool is already active
Is this expected? I have noticed that about half of the REPORT requests are handled normally and the other half causes these errors. Possibly the error simply means that another request is still in flight and being profiled when the new request comes in, but there is also a chance that I did something wrong.
This is my current configuration:
[server]
hosts = 0.0.0.0:5232
ssl = True
certificate = c:\daten\radicale\config\radicale.crt
key = c:\daten\radicale\config\radicale.key
[auth]
type = htpasswd
htpasswd_filename = c:\daten\radicale\config\users
htpasswd_encryption = bcrypt
delay = 5
[rights]
type = owner_write
[storage]
type = multifilesystem_nolock
filesystem_folder = \\?\c:\daten\radicale\data\collections
[logging]
level = info
bad_put_request_content = True
profiling = per_request_method
Now I have a log file that has been created with the settings shown above. The relevant part is at the end of this post (some empty lines removed).
On the one hand, as far as I can see, the profiling says that REPORT requests took the CPU for about 16 seconds during a test of 268 seconds. On the other hand, the other portions of the log file contain lines like the following:
[2025-12-09 10:34:39 +0100] [6948/Thread-43 (process_request_thread)] [INFO] REPORT response status for '/gm/f64475d6-a638-46c3-75df-1fbdb3df3e40/' with depth '1' in 2.962 seconds gzip 178 bytes: 207 Multi-Status
I have thoroughly examined the log file and noticed that a REPORT response always takes about 3 seconds. That is something I don't understand yet. What is the relationship between the figures from the REPORT request profiling (about 0.35 s per request) and the REPORT response status lines in the log (about ten times slower)?
My gut feeling is that gzip may be the culprit. I have no clue about CalDAV, hence the silly question: can we turn gzip off somehow?
End of log file (profiling results):
[2025-12-09 10:02:51 +0100] [2992] [INFO] Stopping Radicale
[2025-12-09 10:02:51 +0100] [2992] [INFO] Profiling data per request method DELETE after 268 seconds: (no request seen so far)
[2025-12-09 10:02:51 +0100] [2992] [INFO] Profiling data per request method GET after 268 seconds: (no request seen so far)
[2025-12-09 10:02:51 +0100] [2992] [INFO] Profiling data per request method HEAD after 268 seconds: (no request seen so far)
[2025-12-09 10:02:51 +0100] [2992] [INFO] Profiling data per request method MKCALENDAR after 268 seconds: (no request seen so far)
[2025-12-09 10:02:51 +0100] [2992] [INFO] Profiling data per request method MKCOL after 268 seconds: (no request seen so far)
[2025-12-09 10:02:51 +0100] [2992] [INFO] Profiling data per request method MOVE after 268 seconds: (no request seen so far)
[2025-12-09 10:02:51 +0100] [2992] [INFO] Profiling data per request method OPTIONS after 268 seconds and 25 requests: 2575 function calls in 0.012 seconds
Ordered by: cumulative time
List reduced from 6 to 4 due to restriction <4>
ncalls tottime percall cumtime percall filename:lineno(function)
25 0.000 0.000 0.007 0.000 C:\Users\Radicale\AppData\Local\Python\pythoncore-3.14-64\Lib\site-packages\radicale\app\options.py:29(do_OPTIONS)
25 0.005 0.000 0.005 0.000 {method 'disable' of '_lsprof.Profiler' objects}
25 0.004 0.000 0.004 0.000 {built-in method builtins.dir}
25 0.000 0.000 0.002 0.000 {method 'join' of 'str' objects}
[2025-12-09 10:02:51 +0100] [2992] [INFO] Profiling data per request method POST after 268 seconds: (no request seen so far)
[2025-12-09 10:02:51 +0100] [2992] [INFO] Profiling data per request method PROPFIND after 268 seconds and 43 requests: 1378726 function calls (1303719 primitive calls) in 10.515 seconds
Ordered by: cumulative time
List reduced from 497 to 4 due to restriction <4>
ncalls tottime percall cumtime percall filename:lineno(function)
39/13 0.001 0.000 1.730 0.133 C:\Users\Radicale\AppData\Local\Python\pythoncore-3.14-64\Lib\site-packages\radicale\app\propfind.py:379(do_PROPFIND)
9929/893 0.056 0.000 1.211 0.001 C:\Users\Radicale\AppData\Local\Python\pythoncore-3.14-64\Lib\site-packages\radicale\storage\multifilesystem\get.py:161(get_all)
27/16 0.000 0.000 1.058 0.066 C:\Users\Radicale\AppData\Local\Python\pythoncore-3.14-64\Lib\site-packages\radicale\server.py:132(finish_request_locked)
9903 0.896 0.000 0.898 0.000 {built-in method _pickle.load}
[2025-12-09 10:02:51 +0100] [2992] [INFO] Profiling data per request method PROPPATCH after 268 seconds: (no request seen so far)
[2025-12-09 10:02:51 +0100] [2992] [INFO] Profiling data per request method PUT after 268 seconds: (no request seen so far)
[2025-12-09 10:02:51 +0100] [2992] [INFO] Profiling data per request method REPORT after 268 seconds and 46 requests: 2407413 function calls (2216041 primitive calls) in 15.667 seconds
Ordered by: cumulative time
List reduced from 502 to 4 due to restriction <4>
ncalls tottime percall cumtime percall filename:lineno(function)
41/8 0.018 0.000 4.142 0.518 C:\Users\Radicale\AppData\Local\Python\pythoncore-3.14-64\Lib\site-packages\radicale\app\report.py:149(xml_report)
42/8 0.092 0.002 3.047 0.381 C:\Users\Radicale\AppData\Local\Python\pythoncore-3.14-64\Lib\site-packages\radicale\storage\multifilesystem\sync.py:35(sync)
28921/26120 2.085 0.000 2.453 0.000 {built-in method _pickle.load}
43441/7374 5.479 0.000 1.923 0.000 {built-in method _io.open}
[2025-12-09 10:06:09 +0100] [8128/Thread-16 (process_request_thread)] [ERROR] An exception occurred during REPORT request on '/dk/022c3850-3199-dd46-3697-bdf7de0fe0e6/': Another profiling tool is already active
Hmm, potentially "per_request_method" is not as thread-safe as assumed :-(
Try switching to "per_request" with a limit of e.g. 5 seconds (minimum) to suppress the fast requests.
> My gut feeling is that gzip may be the culprit. I have no clue about CalDAV, hence the silly question: can we turn gzip off somehow?
The client told the server that a compressed result is supported, but yes, it can be blocked by the server.
Simply test by commenting out the following code part:
File: radicale/app/__init__.py
if "gzip" in accept_encoding:
zcomp = zlib.compressobj(wbits=16 + zlib.MAX_WBITS)
answer = zcomp.compress(answer) + zcomp.flush()
headers["Content-Encoding"] = "gzip"
content_encoding = "gzip"
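In other words, a sketch of the same block with every line commented out, so the answer is always sent uncompressed:
# if "gzip" in accept_encoding:
#     zcomp = zlib.compressobj(wbits=16 + zlib.MAX_WBITS)
#     answer = zcomp.compress(answer) + zcomp.flush()
#     headers["Content-Encoding"] = "gzip"
#     content_encoding = "gzip"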
BTW: one reason can also be that too many items are in a collection and radicale is struggling to process them all.
I've now enabled "per_request" on my production system using GlusterFS (known to be slow, but redundant, and not really under heavy load) and found the following:
- REPORT example consuming most time in "sync"
- PROPFIND example consuming most time in "xml_propfind" or in "sync"
Increasing the sync token lifetime makes imho no sense for high-frequency client lookups; one could rather consider decreasing it to 7 days.
Still working on extending the profiling capabilities to catch the XML requests with long duration.
Thank you very much for working on further profiling options!
I just have turned off gzip for testing, following your instructions. That didn't change anything, so I restored the original code.
I have conducted a lot of other tests with various settings since yesterday, but to no avail (that is, the CPU load did not decrease with the configurations I have tested).
For unknown reasons, I haven't seen any bad PUT requests for a few days, which means that they can't be the cause of the high CPU load.
Now I am thinking about turning off SSL for testing. Unfortunately, this would mean that users would have to reconfigure their calendars, which is not that easy in Thunderbird (in TB you have to unsubscribe from a calendar and configure it from scratch afterwards, because you can't simply edit the calendar URL).
Another alternative would be to use an external WSGI server. However, I hadn't heard about such servers until now, and I doubt that it would be easy to set up such a system on Windows. I'll look into it until lunch, though.
Regarding the radicale setup on Linux, that would possibly be another option. I am a bit hesitant here because I know some corners of Linux quite well, but have no clue about others. Notably, I am currently not sure whether I would interpret the output of top correctly and whether top would be the right tool to evaluate the CPU load caused by radicale.
Turning off SSL: the Thunderbird side can easily be done by adjusting the configuration for each calendar: "about:config" -> search for "calendar*uri" -> but credentials and content then pass over the network in cleartext.
If this really helps, then you potentially have a random number entropy issue. I would check that in advance, but I have no clue how to do this on Windows... on Linux it's easy.
Can you confirm that the system turns idle once all clients are disconnected... e.g. by closing the port with the local firewall?
I would force the users to increase the sync interval to 15 min for now until the root cause is identified; this should avoid interleaved calls blocking each other.
Once the profiling implementation is finished (currently working on a request log for long-duration requests only) I will check how to implement an overload prevention by rejecting requests.
Regarding Linux: if you install "AlmaLinux 10" and enable the EPEL repo, you can use RPMs to install... and yes, "top" will show you CPU, memory and swap usage.
I can definitely confirm that the radicale service (not the whole system) goes to 0% CPU when all clients are disconnected. For example, during a short test this morning, I turned off SSL in the radicale server, so that the clients could not connect any more (1). During that time, radicale's CPU load stayed at 0%, possibly with a very small and short spike while it started.
(1) To be precise, the clients tried to connect, but didn't send or receive meaningful data. It is interesting that the load was at 0% nevertheless. It seems that the load is indeed caused by actually serving requests correctly, which was not the case in this situation.
I now have another idea, but don't know yet whether it is feasible: perhaps I could install just an SSL proxy (also called an SSL tunnel) on that server that transparently handles SSL and forwards the requests to radicale without SSL. Setting up a full-blown external web server or proxy would take too much time at the moment.
Regarding unencrypted data traffic, that would be acceptable in this scenario, at least for a few hours of testing. Changing the passwords afterwards would not be a great drama.
If everything else fails, I'll make the users configure longer synchronization intervals. But at the moment, I believe that this would not be appropriate, because if it helps, it would also make further testing impossible. I wouldn't like to bother the users too often by instructing them to increase the interval and then reduce it again.
OK, some progress here: SSL / TLS is definitely not the cause for the high CPU load.
I have set up stunnel on the server and made it accept the SSL client connections and relay them unencrypted to radicale. Of course, I turned off SSL in the radicale configuration beforehand. The whole setup was very easy (four lines in the stunnel configuration file and one changed line in the radicale configuration file), and I was more than baffled that stunnel is still around and actively developed. That was a great surprise; the last time I used it was a decade or two ago.
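For reference, the stunnel service definition looked roughly like the following sketch (service name, ports and certificate paths are only an example from my setup and need to be adapted; radicale itself listens on plain HTTP on the internal port):
[radicale]
accept = 0.0.0.0:5232
connect = 127.0.0.1:5233
cert = c:\daten\radicale\config\radicale.crt
key = c:\daten\radicale\config\radicale.key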
Coming back to the topic, radicale in this setup causes the same extreme CPU load as before. Therefore, I'll restore the original configuration soon.
While this test did not yield a solution to the problem, I think it was important to rule out SSL as the culprit (notably because I have read respective reports elsewhere).
Next PR merged: per-request profiling now has the capability to log the request header and content, to find the client calls which run long and are therefore candidates for blocking the system.
Hints added here: https://github.com/Kozea/Radicale/wiki/Performance-Tuning
Just from what I can see on my production system, it looks like PROPFIND requests for the sync-token take some time; here, with ~1400 items in the folder, approx. 12 seconds:
<?xml version="1.0"?>
<propfind xmlns="DAV:" xmlns:C="urn:ietf:params:xml:ns:caldav" xmlns:CS="http://calendarserver.org/ns/">
<prop>
<sync-token />
</prop>
</propfind>
Have to investigate the code where "sync.py" is consuming most CPU time.
Started some code investigations; did not really find a major improvement, only one for the mtime+size caching method: https://github.com/Kozea/Radicale/pull/1936 You can give it a try; it should help if your items have a bigger size than usual, because reading the item is skipped if it is not needed for a cache update.
Assuming there is more or less linear scaling depending on the number of items in one collection.
Any update with results using the latest upstream?
I don't think this is specific to Windows; I also see this with the latest released version on (Alpine) Linux.
Have you now enabled the profiling options for requests running longer than e.g. 20 seconds, and can you provide the latest logwatch statistics?
Have you also enabled the mtime+size caching method?
> Have you also enabled the mtime+size caching method?
I built radicale from git a few minutes ago and I'm trying it out. I still see high CPU load when more than one client is syncing, but it fairly quickly returns to negligible load. On the latest release, it seems to be tied up for a LOT longer (made worse because clients might sync again before it has finished), so I was seeing almost constant high CPU usage.
> Have you now enabled the profiling options for requests running longer than e.g. 20 seconds, and can you provide the latest logwatch statistics?
I have not, but since it already seems better with the latest in git, maybe it's not necessary? I'll keep an eye on it; if I see it stuck causing high CPU load for prolonged periods again, I'll enable the extra debug options and get back to you.