
vdirsyncer single file calendar sync's growth of running time is quadratic or higher (it takes "forever")

Open bernhardreiter opened this issue 4 years ago • 10 comments

Observed: when running /usr/bin/vdirsyncer -v debug sync, the process hangs in one thread.

Expected: Sync completes, as it did before the move to Debian buster and the switch of vdirsyncer version; or a diagnostic message that helps find the problem, if there is one.

Further observations: an strace shows futex(0x7f387429e890, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 0, NULL, FUTEX_BITSET_MATCH_ANY and a py-bt with gdb shows

  File "/usr/lib/python3.7/threading.py", line 296, in wait
    waiter.acquire()
  File "/usr/lib/python3.7/queue.py", line 170, in get
    self.not_empty.wait()
  File "/usr/lib/python3/dist-packages/click_threading/__init__.py", line 110, in run
    func, future = self.tasks.get()
  File "/usr/lib/python3/dist-packages/vdirsyncer/cli/utils.py", line 375, in join
    ui_worker.run()
  <built-in method next of module object at remote 0x7f387d2bcc28>
  File "/usr/lib/python3.7/contextlib.py", line 119, in __exit__
    next(self.gen)
  File "/usr/lib/python3/dist-packages/vdirsyncer/cli/__init__.py", line 154, in sync
    wq.spawn_worker()
[..]
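
(For reference: a Python-level backtrace like the one above can be obtained with gdb's CPython extension, assuming debug symbols / python3-dbg are installed; the pgrep pattern is just illustrative.)

gdb -p "$(pgrep -f vdirsyncer | head -n1)" -batch -ex py-bt
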
  • vdirsyncer 0.16.7 (problem reproducible with a pipx install from PyPI and the Debian Buster version)
  • posteo server (but probably irrelevant)
  • python3 Debian Buster Version: 3.7.3-1
  • config:
[general]
status_path = "~/.vdirsyncer/status/"

[pair my_calendar]
a = "my_calendar_remote"
b = "my_calendar_local"
collections = null
conflict_resolution = "a wins"

[storage my_calendar_remote]
type = "caldav"
read_only = true
url = "https://posteo.de:8443/calendars/bernh/default/"
username = "[email protected]"
password = "YYY"

[storage my_calendar_local]
type = "singlefile"
path = "~/.vdirsyncer/my_calendar.ics"
  • the last line of -v debug is
debug: Already normalized: '/calendars/bernh/default/XXXXX.ics'
  • Related reports: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=959097 with more details.

bernhardreiter avatar May 01 '20 11:05 bernhardreiter

Removing .vdirsyncer and doing a new discover and sync works, but the next sync hangs again.

bernhardreiter avatar May 01 '20 13:05 bernhardreiter

Meanwhile I can reproduce this with just a singlefile one-way sync. Some .ics files work, others don't. The next step is to produce a problem case that can be published.

install vdirsyncer "The clean, easy way"

set base /home/bernhardreiter/vtest
PIPX_HOME=$base PIPX_BIN_DIR=$base/bin pipx install vdirsyncer
# gets me   package vdirsyncer 0.16.7, Python 3.7.3
pushd $base

set up a singlefile -> singlefile one-way config

mkdir problem-1
echo '[general]
status_path = "$base/problem-1"

[pair my_calendar]
a = "my_calendar_a"
b = "my_calendar_b"
collections = null
conflict_resolution = "a wins"

[storage my_calendar_a]
type = "singlefile"
path = "$base/problem-1/my_calendar_a.ics"

[storage my_calendar_b]
type = "singlefile"
path = "$base/problem-1/my_calendar_b.ics"
' | sed -e "s|\$base|$base|" >problem-1/config

testing a calendar

rm problem-1/my*
cp my_calendar_bad.ics problem-1/my_calendar_a.ics
touch problem-1/my_calendar_b.ics
./bin/vdirsyncer -v DEBUG -c problem-1/config discover
./bin/vdirsyncer -v DEBUG -c problem-1/config sync

bernhardreiter avatar Jun 03 '20 13:06 bernhardreiter

Doing more tests, even applying delta debugging, I found that it is not a hang but a more-than-linearithmic growth of running time in the number of calendar items synced in a single file.

A calendar with 1822 items needed 4:26 (4 hours and 26 minutes) to sync. That can easily be mistaken for a "hang". :)

Conclusion: something about the implemented algorithm is not good, as this is a high order of growth in the size of the input files.

Details

As it is a calendar with personal data, I cannot publish it. Since I suspected something in the contents themselves, I tried to reduce the problem case first.

Below is the code that worked from a vdir directory of the original data and a text file listing all the filenames, used to run delta debugging. Delta debugging itself was not useful, as all syncs were successful, so the timeout test was the wrong criterion. However, this may serve as a template for other tests.

#!/usr/bin/env python3
"""Test if vdirsyncer fails to sync on a list of iCalender files.

File names are hardcoded.
This can be used as script to test if a set is interesting when applying
deltadebugging. Used with https://github.com/dsw/delta

./delta-release-2020-06-22/delta -test=tryentries4.py -in_place entries

SPDX-License-Identifier: Apache-2.0
SPDX-FileCopyrightText: 2021 Intevation GmbH <https://intevation.de>
Author: <[email protected]>
"""
from pathlib import Path
import shutil
import subprocess
import sys
import tempfile
import time

entries = []

# read one filename per line; rstrip avoids chopping a character off a
# final line that has no trailing newline
with open("entries", "rt") as f:
    for line in f:
        entries.append(line.rstrip("\n"))

# print(len(entries))
# print(repr(entries[-1]))

def write_config(target_dir):
    template = """
[general]
status_path = "{target_dir}/status/"

[pair my_calendar]
a = "my_calendar_a"
b = "my_calendar_local"
collections = null
conflict_resolution = "a wins"

[storage my_calendar_a]
type = "filesystem"
fileext = ""
read_only = true
path = "{target_dir}/vdir"

[storage my_calendar_local]
type = "singlefile"
path = "{target_dir}/my_calendar_b.ics"
"""
    # print(template.format(target_dir=target_dir.resolve()))
    (target_dir / "config").write_text(
        template.format(target_dir=target_dir.resolve()))

    # create an empty iCalendar file
    (target_dir / "my_calendar_b.ics").touch()

def runtest(original_dir, target_dir, entries):
    org_path = original_dir / "vdir"
    vdir_path = target_dir / "vdir"
    vdir_path.mkdir(parents=True)

    for entry in entries:
        shutil.copy(org_path / entry, vdir_path)

    write_config(target_dir)

    cmd_basic = ["/home/bern/vdirsyncer/venv/bin/vdirsyncer",
                 "-c", (target_dir / "config").resolve()]

    p = subprocess.Popen(cmd_basic + ["discover"])
    print(f"Running {p.args}")
    p.wait(3)

    p = subprocess.Popen(cmd_basic + ["-v", "WARNING", "sync"])
    print(f"Running {p.args}")
    p.wait(8)

    p = subprocess.Popen(cmd_basic + ["sync"])
    starttime = time.perf_counter()
    print(f"Running {p.args}")
    try:
        p.wait(36000)
    except subprocess.TimeoutExpired:
        stoptime = time.perf_counter()
        elapsed = int(stoptime - starttime)
        print("Killing {!r} after {:d} seconds".format(p, elapsed))
        p.kill()
        return False
    stoptime = time.perf_counter()
    elapsed = int(stoptime - starttime)
    print("Sync completed after {:d} seconds".format(elapsed))
    return True

#result = runtest(Path("./fetch-private2"), Path("./testall"), entries)

with tempfile.TemporaryDirectory() as tempdir_name:
    print("Testing in {}".format(tempdir_name))
    result = runtest(Path("./fetch-private2"), Path(tempdir_name), entries)

if result:  # sync succeeded: no error found, so tell "delta" this input is not interesting
    sys.exit(1)

Here are some numbers to show the increase of runtime:

  • /1822.py Sync completed after 15998 seconds
  • ./t911.py Sync completed after 2640 seconds
  • [0,797] Sync completed after 1733 seconds
  • [0,683] Sync completed after 1193 seconds
  • [0,455] Sync completed after 461 seconds
  • [455,911] Sync completed after 544 seconds
  • ./t170.py Sync completed after 63 seconds
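
Fitting t ≈ c·n^k to the two largest samples above suggests an exponent of roughly 2.6, i.e. clearly super-quadratic. A quick sanity check in plain Python (growth_exponent is just a throwaway helper):

import math

# fit t = c * n**k through two (items, seconds) measurements from above
def growth_exponent(n1, t1, n2, t2):
    return math.log(t2 / t1) / math.log(n2 / n1)

print(growth_exponent(911, 2640, 1822, 15998))  # ~2.6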

bernhardreiter avatar May 19 '21 13:05 bernhardreiter

Can you check if vdirsyncer sync --max-workers 1 fixes the issue?

WhyNotHugo avatar Jun 14 '21 20:06 WhyNotHugo

Thanks for the hint. I will.

bernhardreiter avatar Jun 15 '21 07:06 bernhardreiter

Let me know how that goes, I have some theories on what’s going on.

WhyNotHugo avatar Jun 15 '21 07:06 WhyNotHugo

@WhyNotHugo

if vdirsyncer sync --max-workers 1 fixes the issue?

Letting the big test (t1822m1.py) run for 10 minutes: it did not terminate. 10 minutes is too long for this simple sync (as there is no change in this test, it only needs to compare 1822 iCal objects). So unfortunately: no, not solved by --max-workers=1.

bernhardreiter avatar Jun 20 '21 12:06 bernhardreiter

Can you share the test? I can try and profile it and see if that yields anything useful.

There's clearly a bug here, but I've no idea where.

WhyNotHugo avatar Jul 14 '21 18:07 WhyNotHugo

Can you share the test?

Unfortunately not, as it is a personal calendar with private data. (This is why I went with delta debugging first; my hypothesis then was that it would be related to the contents of some iCalendar objects.)

liberally licensed calendar data set

Do you know any liberally licensed larger iCalendar data sets? I've tried to find some, but was not successful.

Best candidates after search

  • https://cantonbecker.com/astronomy-calendar/ ~400 entries, but license unknown
  • https://catalog.data.gov/dataset/nypd-shooting-incident-data-historic (not in ics format)

bernhardreiter avatar Jul 15 '21 06:07 bernhardreiter

I used icalendar-generator to generate an iCalendar file with 2000 items and tested it with a singlefile one-way sync: it took about 4 minutes and 30 seconds to complete.

config:

[general]
status_path = "~/vtest"

[pair foobar]
a = "foo"
b = "bar"
collections = null
conflict_resolution = "a wins"

[storage foo]
type = "singlefile"
path = "~/vtest/calendar-a/my_calendar_a.ics"

[storage bar]
type = "singlefile"
path = "~/vtest/calendar-b/my_calendar_a.ics"

testing calendar:


cp ~/icalendar-generator/calendar.ics ~/vtest/calendar-a/my_calendar_a.ics
touch ~/vtest/calendar-b/my_calendar_a.ics

vdirsyncer -v debug -c ~/vtest/config discover
vdirsyncer -v debug -c ~/vtest/config sync
  • vdirsyncer 0.16.7
  • python3 3.7.3

Fadiabb avatar Oct 06 '21 10:10 Fadiabb

Hi, I have the same problem: vdirsyncer sync takes forever.

I'm syncing two calendars from my private Nextcloud instance. After removing all contents of the .config/vdirsyncer/data and .vdirsyncer/status, the first vdirsyncer sync runs to completion without issues (after the appropriate vdirsyncer discover calls). The second runs "forever" (I stopped it after ~15 minutes). I can consistently reproduce this.
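
In short, the reproduction looks like this (paths as described above):

rm -rf ~/.config/vdirsyncer/data/* ~/.vdirsyncer/status/*
vdirsyncer discover
vdirsyncer sync   # first sync: completes fine
vdirsyncer sync   # second sync: runs "forever"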

Stracing the process I can see that vdirsyncer is "stuck" in a loop at 100% CPU usage:

strace: Process 139392 attached
brk(0x55c0a6fdf000)                     = 0x55c0a6fdf000
brk(0x55c0a7001000)                     = 0x55c0a7001000
brk(0x55c0a7026000)                     = 0x55c0a7026000
brk(0x55c0a7048000)                     = 0x55c0a7048000
brk(0x55c0a7069000)                     = 0x55c0a7069000
brk(0x55c0a708b000)                     = 0x55c0a708b000
brk(0x55c0a70ad000)                     = 0x55c0a70ad000
brk(0x55c0a70d0000)                     = 0x55c0a70d0000
brk(0x55c0a70f3000)                     = 0x55c0a70f3000
brk(0x55c0a7118000)                     = 0x55c0a7118000
brk(0x55c0a7141000)                     = 0x55c0a7141000
brk(0x55c0a7164000)                     = 0x55c0a7164000
brk(0x55c0a7186000)                     = 0x55c0a7186000
brk(0x55c0a71aa000)                     = 0x55c0a71aa000
brk(0x55c0a71cd000)                     = 0x55c0a71cd000
brk(0x55c0a71f0000)                     = 0x55c0a71f0000
brk(0x55c0a700a000)                     = 0x55c0a700a000
brk(0x55c0a6e95000)                     = 0x55c0a6e95000
newfstatat(AT_FDCWD, "/home/blue/.config/vdirsyncer/data/events_personal.ics", {st_mode=S_IFREG|0600, st_size=412857, ...}, 0) = 0
openat(AT_FDCWD, "/home/blue/.config/vdirsyncer/data/events_personal.ics", O_RDONLY|O_CLOEXEC) = 8
newfstatat(8, "", {st_mode=S_IFREG|0600, st_size=412857, ...}, AT_EMPTY_PATH) = 0
ioctl(8, TCGETS, 0x7ffdc727fce0)        = -1 ENOTTY (Inappropriate ioctl for device)
lseek(8, 0, SEEK_CUR)                   = 0
lseek(8, 0, SEEK_CUR)                   = 0
newfstatat(8, "", {st_mode=S_IFREG|0600, st_size=412857, ...}, AT_EMPTY_PATH) = 0
read(8, "BEGIN:VCALENDAR\r\nPRODID:-//IDN n"..., 412858) = 412857
read(8, "", 1)                          = 0
brk(0x55c0a6efa000)                     = 0x55c0a6efa000
brk(0x55c0a6fc3000)                     = 0x55c0a6fc3000
close(8)                                = 0
brk(0x55c0a6fe5000)                     = 0x55c0a6fe5000
brk(0x55c0a7007000)                     = 0x55c0a7007000
brk(0x55c0a702c000)                     = 0x55c0a702c000
brk(0x55c0a704d000)                     = 0x55c0a704d000
brk(0x55c0a706f000)                     = 0x55c0a706f000
brk(0x55c0a7091000)                     = 0x55c0a7091000
brk(0x55c0a70b3000)                     = 0x55c0a70b3000
brk(0x55c0a70d6000)                     = 0x55c0a70d6000
brk(0x55c0a70f9000)                     = 0x55c0a70f9000
brk(0x55c0a711e000)                     = 0x55c0a711e000
brk(0x55c0a7146000)                     = 0x55c0a7146000
brk(0x55c0a7169000)                     = 0x55c0a7169000
brk(0x55c0a718c000)                     = 0x55c0a718c000
brk(0x55c0a71af000)                     = 0x55c0a71af000
brk(0x55c0a71d2000)                     = 0x55c0a71d2000
brk(0x55c0a71f5000)                     = 0x55c0a71f5000
brk(0x55c0a7010000)                     = 0x55c0a7010000
newfstatat(AT_FDCWD, "/home/blue/.config/vdirsyncer/data/events_personal.ics", {st_mode=S_IFREG|0600, st_size=412857, ...}, 0) = 0
openat(AT_FDCWD, "/home/blue/.config/vdirsyncer/data/events_personal.ics", O_RDONLY|O_CLOEXEC) = 8
newfstatat(8, "", {st_mode=S_IFREG|0600, st_size=412857, ...}, AT_EMPTY_PATH) = 0
ioctl(8, TCGETS, 0x7ffdc727fce0)        = -1 ENOTTY (Inappropriate ioctl for device)
lseek(8, 0, SEEK_CUR)                   = 0
lseek(8, 0, SEEK_CUR)                   = 0
newfstatat(8, "", {st_mode=S_IFREG|0600, st_size=412857, ...}, AT_EMPTY_PATH) = 0
read(8, "BEGIN:VCALENDAR\r\nPRODID:-//IDN n"..., 412858) = 412857
read(8, "", 1)                          = 0
close(8)                                = 0
brk(0x55c0a7031000)                     = 0x55c0a7031000
brk(0x55c0a7053000)                     = 0x55c0a7053000
brk(0x55c0a7074000)                     = 0x55c0a7074000
brk(0x55c0a7097000)                     = 0x55c0a7097000
brk(0x55c0a70b9000)                     = 0x55c0a70b9000
brk(0x55c0a70dc000)                     = 0x55c0a70dc000
brk(0x55c0a70ff000)                     = 0x55c0a70ff000
brk(0x55c0a7123000)                     = 0x55c0a7123000
brk(0x55c0a7147000)                     = 0x55c0a7147000
brk(0x55c0a716f000)                     = 0x55c0a716f000
brk(0x55c0a7192000)                     = 0x55c0a7192000
brk(0x55c0a71b5000)                     = 0x55c0a71b5000
brk(0x55c0a71d9000)                     = 0x55c0a71d9000
brk(0x55c0a700a000)                     = 0x55c0a700a000
newfstatat(AT_FDCWD, "/home/blue/.config/vdirsyncer/data/events_personal.ics", {st_mode=S_IFREG|0600, st_size=412857, ...}, 0) = 0
openat(AT_FDCWD, "/home/blue/.config/vdirsyncer/data/events_personal.ics", O_RDONLY|O_CLOEXEC) = 8
newfstatat(8, "", {st_mode=S_IFREG|0600, st_size=412857, ...}, AT_EMPTY_PATH) = 0
ioctl(8, TCGETS, 0x7ffdc727fce0)        = -1 ENOTTY (Inappropriate ioctl for device)
lseek(8, 0, SEEK_CUR)                   = 0
lseek(8, 0, SEEK_CUR)                   = 0
newfstatat(8, "", {st_mode=S_IFREG|0600, st_size=412857, ...}, AT_EMPTY_PATH) = 0
read(8, "BEGIN:VCALENDAR\r\nPRODID:-//IDN n"..., 412858) = 412857
read(8, "", 1)                          = 0
close(8)                                = 0

eguiraud avatar Mar 08 '23 16:03 eguiraud

Attaching a debugger (something that will be much harder to do for users if you move to Rust, btw, because of compiler optimizations and potential lack of debug symbols :D ), I see that we never get out of this loop:

https://github.com/pimutils/vdirsyncer/blob/3bf9a3d6840672656b2e9685bc7e7bc50a3e4700/vdirsyncer/sync/__init__.py#L62-L67

and in particular we spend a lot of time (many seconds) for each iteration of the inner loop at the bottom of this callstack:

  /usr/lib/python3.10/site-packages/vdirsyncer/sync/__init__.py(62)prepare_new_status()
     61         if prefetch:
---> 62             async for href, item, etag in self.storage.get_multi(prefetch):
     63                 _store_props(

  /usr/lib/python3.10/site-packages/vdirsyncer/storage/base.py(162)get_multi()
    161         for href in uniq(hrefs):
--> 162             item, etag = await self.get(href)
2   163             yield href, item, etag

> /usr/lib/python3.10/site-packages/vdirsyncer/storage/singlefile.py(127)get()
    126         if self._items is None or not self._at_once:
--> 127             async for _ in self.list():
3   128                 pass

eguiraud avatar Mar 08 '23 17:03 eguiraud

And here is a flamegraph that clearly shows where the time is spent. If you download the SVG and open it in a browser, it's actually interactive.

vdirsyncer.svg.zip


eguiraud avatar Mar 08 '23 17:03 eguiraud

I don't know vdirsyncer internals well enough to judge whether we could completely avoid the hashing, or at least provide a faster (compiled?) implementation.

Since parse in vobject.py is taking ~50% of the runtime, I checked whether there was some easy win there. Here's the line-by-line profile; I can't see anything obvious that could be improved to significantly reduce the runtime of parse:

File: /home/blue/vdirsyncer_dbg_env/lib/python3.10/site-packages/vdirsyncer/vobject.py
Function: parse at line 266

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
   266                                               @classmethod
   267                                               @profile
   268                                               def parse(cls, lines, multiple=False):
   269      8166       4055.8      0.5      0.1          if isinstance(lines, bytes):
   270                                                       lines = lines.decode("utf-8")
   271      8166       1936.1      0.2      0.0          if isinstance(lines, str):
   272      8166     162844.5     19.9      2.5              lines = lines.splitlines()
   273
   274      8166       1430.3      0.2      0.0          stack = []
   275      8166       1135.2      0.1      0.0          rv = []
   276      8166       1096.1      0.1      0.0          try:
   277   5649332    1086100.1      0.2     16.6              for _i, line in enumerate(lines):
   278   4995192    1157565.0      0.2     17.7                  if line.startswith("BEGIN:"):
   279    654139     266100.6      0.4      4.1                      c_name = line[len("BEGIN:") :].strip().upper()
   280    654139     471390.6      0.7      7.2                      stack.append(cls(c_name, [], []))
   281   4341055    1034122.4      0.2     15.8                  elif line.startswith("END:"):
   282    654137     126078.4      0.2      1.9                      component = stack.pop()
   283    645972      77530.0      0.1      1.2                      if stack:
   284    645972     175547.5      0.3      2.7                          stack[-1].subcomponents.append(component)
   285                                                               else:
   286      8165       2011.3      0.2      0.0                          rv.append(component)
   287                                                           else:
   288   4341055     777734.7      0.2     11.9                      if line.strip():
   289   4341055    1194707.5      0.3     18.2                          stack[-1].props.append(line)
   290         1          1.1      1.1      0.0          except IndexError:
   291                                                       raise ValueError(f"Parsing error at line {_i + 1}")
   292
   293      8145       1168.9      0.1      0.0          if multiple:
   294        20          3.0      0.1      0.0              return rv
   295      8145       2363.2      0.3      0.0          elif len(rv) != 1:
   296                                                       raise ValueError(f"Found {len(rv)} components, expected one.")
   297                                                   else:
   298      8145       1502.3      0.2      0.0              return rv[0]

eguiraud avatar Mar 08 '23 18:03 eguiraud

I don't know vdirsyncer internals well enough to judge whether we could completely avoid the hashing, or at least provide a faster (compiled?) implementation.

For the singlefile backend, if a single component in a very large file changes, it's very hard to determine which component has changed. For the vdir backend, each component is a separate file, with its separate mtime, so we can just check that (plus the inode number).

For singlefile there's no obvious equivalent, so we hash each item to determine whether it has changed since the last sync. I don't think there's any obvious way around this.
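
A minimal, purely illustrative sketch of that idea (hash_item is a made-up name here, and the real implementation's normalization differs):

import hashlib

def hash_item(raw: str) -> str:
    # normalize line endings and drop blank lines so equivalent items match
    lines = [line for line in raw.splitlines() if line.strip()]
    return hashlib.sha256("\r\n".join(lines).encode("utf-8")).hexdigest()

status = {"event.ics": hash_item("BEGIN:VEVENT\r\nUID:1\r\nEND:VEVENT")}
new_raw = "BEGIN:VEVENT\nUID:1\nEND:VEVENT"  # same item, LF instead of CRLF
print(status["event.ics"] == hash_item(new_raw))  # True -> unchanged, skip it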

However, if we're hashing any item more than once per run, that's a bug and can be optimized away. That aside, cheaper hashing for this use case might make sense.

There's also a lot of string copying/concatenation going on in conversions. That can probably be improved, but I've no concrete ideas on how.

WhyNotHugo avatar Mar 08 '23 21:03 WhyNotHugo

Mmh ok I see three suspicious things happening that together might explain the sync taking forever. @WhyNotHugo please let me know what you think.

Storage.get_multi contains the following lines:

   for href in uniq(hrefs): # there are 756 hrefs in my case
       item, etag = await self.get(href)
       yield href, item, etag

This in turn calls SingleFileStorage.get which again loops over 756 elements:

   async for _ in self.list(): # also 756 elements!
      pass

SingleFileStorage.get loops over 756 elements just to return one of them, every time. I'm not sure why it doesn't just produce the single element it needs to return:

    async def get(self, href):
        if self._items is None or not self._at_once:
            async for _ in self.list():
                pass

        try:
            return self._items[href]
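
As a back-of-the-envelope cost model (toy numbers, not vdirsyncer code):

n = 756                    # hrefs to prefetch
parses_per_get = n         # each get() re-runs list(), parsing all n items
print(n * parses_per_get)  # 571536 item parses for a single sync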

I waited until the 756*756 loop was over, and then vdirsyncer started syncing every event back to my Nextcloud calendar:

Copying (updating) item 0022f6bb-f1ec-4b52-88d7-0dfb608914b5 to events_personal_remote
Copying (updating) item 00745806-51c1-435a-98dd-c9dca085c641 to events_personal_remote
Copying (updating) item 00910f46-2053-42ad-abac-a5013dd51b46 to events_personal_remote
Copying (updating) item 00abeab0-e782-4f5b-8d93-c04a24261c43 to events_personal_remote

which it really shouldn't do because there is nothing to update: those events had just been copied from the calendar during the first sync (the one that completes quickly).

eguiraud avatar Mar 09 '23 00:03 eguiraud

This in turn calls SingleFileStorage.get which again loops over 756 elements:

Check closely: it only does the iteration if self._items is None. self._items is a cache of items. This loop should only execute once, because self.list() populates self._items.

These lines:

        if self._items is None or not self._at_once:
            async for _ in self.list():
                pass

Are an ugly way of forcing the cache to get populated.

WhyNotHugo avatar Mar 09 '23 18:03 WhyNotHugo

It's really executed every time in my case, because self._at_once is False.

EDIT: self._at_once is False here because sync() calls prepare_new_status() without using the at_once() context manager, here: https://github.com/pimutils/vdirsyncer/blob/54a90aa5dd34fb3583965535be82b0a552f18c7d/vdirsyncer/sync/__init__.py#L143-L146

A few lines later, when actions are run, at_once() is turned on: https://github.com/pimutils/vdirsyncer/blob/54a90aa5dd34fb3583965535be82b0a552f18c7d/vdirsyncer/sync/__init__.py#L155-L158

eguiraud avatar Mar 09 '23 19:03 eguiraud

I think we can override the get_multi method for this storage type and use an at_once() context manager inside of it.

WhyNotHugo avatar Mar 09 '23 23:03 WhyNotHugo

Basically, this bit should be wrapped in at_once():

   for href in uniq(hrefs): # there are 756 hrefs in my case
       item, etag = await self.get(href)
       yield href, item, etag
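
A self-contained sketch of that shape (synchronous and simplified; the real storage API is async and an actual fix may differ in detail):

import contextlib

class SingleFileish:
    """Toy stand-in for the singlefile storage."""

    def __init__(self, raw_items):
        self._raw = raw_items  # stands in for the .ics file on disk
        self._items = None
        self._at_once = False

    @contextlib.contextmanager
    def at_once(self):
        # keep the parsed cache alive for the duration of the block
        self._at_once = True
        try:
            yield self
        finally:
            self._at_once = False

    def list(self):
        self._items = dict(self._raw)  # "parse the whole file"

    def get(self, href):
        if self._items is None or not self._at_once:
            self.list()  # without at_once(): a full reparse on every get()
        return self._items[href]

    def get_multi(self, hrefs):
        with self.at_once():  # the fix: one parse for the whole batch
            for href in hrefs:
                yield href, self.get(href)

storage = SingleFileish({"a.ics": "A", "b.ics": "B"})
print(list(storage.get_multi(["a.ics", "b.ics"])))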

WhyNotHugo avatar Mar 09 '23 23:03 WhyNotHugo

Alright, I'll give it a go after I create another calendar for this purpose, because problem number 3 above (the spurious updates) makes me wary of running vdirsyncer sync to completion on my actual personal calendar :sweat_smile:

eguiraud avatar Mar 09 '23 23:03 eguiraud

Alright, I'll give it a go after I create another calendar for this purpose, because problem number 3.

It would be useful to see both versions and see what changed. It might be that the server normalised it into a semantically equivalent format.

Do any of your entries have more than one PRODID field?

WhyNotHugo avatar Mar 10 '23 00:03 WhyNotHugo

The patch proposed at https://github.com/pimutils/vdirsyncer/pull/1051 seems to fix points 1. and 2.

eguiraud avatar Mar 10 '23 00:03 eguiraud

It would be useful to see both versions and see what changed.

Do any of your entries have more than one PRODID field?

This seems to be triggered by events that I add to the calendar from my phone's CalDAV client, not by events that I create directly from the Nextcloud web interface.

Here is the diff before and after vdirsyncer updates an event that it should have no business updating:

~/Downloads diff -u 67cbcb15-d064-4f63-aa9b-38ccf7ef50c2.ics 67cbcb15-d064-4f63-aa9b-38ccf7ef50c2\(1\).ics
--- 67cbcb15-d064-4f63-aa9b-38ccf7ef50c2.ics	2023-03-09 18:35:23.560070893 -0600
+++ 67cbcb15-d064-4f63-aa9b-38ccf7ef50c2(1).ics	2023-03-09 18:36:15.013387075 -0600
@@ -1,19 +1,9 @@
 BEGIN:VCALENDAR
 VERSION:2.0
+PRODID:-//IDN nextcloud.com//Calendar app 3.4.2//EN
 PRODID:DAVx5/4.3-ose ical4j/3.2.7 (ws.xsoh.etar)
-BEGIN:VEVENT
-DTSTAMP:20230310T003500Z
-UID:67cbcb15-d064-4f63-aa9b-38ccf7ef50c2
-SUMMARY:Test event 3
-DTSTART;TZID=America/Guatemala:20230311T000000
-DTEND;TZID=America/Guatemala:20230311T010000
-STATUS:CONFIRMED
-BEGIN:VALARM
-TRIGGER:-PT10M
-ACTION:DISPLAY
-DESCRIPTION:Test event 3
-END:VALARM
-END:VEVENT
+PRODID:-//Sabre//Sabre VObject 4.4.1//EN
+CALSCALE:GREGORIAN
 BEGIN:VTIMEZONE
 TZID:America/Guatemala
 LAST-MODIFIED:20221105T024526Z
@@ -27,4 +17,17 @@
 RDATE:20061001T000000
 END:STANDARD
 END:VTIMEZONE
+BEGIN:VEVENT
+DTSTAMP:20230310T003500Z
+UID:67cbcb15-d064-4f63-aa9b-38ccf7ef50c2
+SUMMARY:Test event 3
+DTSTART;TZID=America/Guatemala:20230311T000000
+DTEND;TZID=America/Guatemala:20230311T010000
+STATUS:CONFIRMED
+BEGIN:VALARM
+TRIGGER:-PT10M
+ACTION:DISPLAY
+DESCRIPTION:Test event 3
+END:VALARM
+END:VEVENT
 END:VCALENDAR

Maybe interestingly, vdirsyncer adds 2 extra PRODID lines.

eguiraud avatar Mar 10 '23 00:03 eguiraud

So you created this on the phone, vdirsyncer downloaded it, and the next sync it was trying to create it?

Looks like it's mostly shuffling the order of the items. This is due to how singlefile works: VTIMEZONEs are shared across ALL events in the file, so when the file is "split" for syncing, they end up in a different order. It's mostly harmless.

I can only think of extremely complicated fixes right now.

The PRODID duplication is a bug of the singlefile storage and needs to be fixed.

WhyNotHugo avatar Mar 10 '23 00:03 WhyNotHugo

you created this on the phone, vdirsyncer downloaded it, and the next sync it was trying to create it?

Yes, on the next sync it updated it as above. For some reason, events created via the web interface are already in the form vdirsyncer wants them, so they are left alone.

The problem with this mostly harmless update is that it takes forever if the calendar has hundreds or thousands of such events.

eguiraud avatar Mar 10 '23 01:03 eguiraud

Thanks for looking into the issue. I've skimmed your new messages. Just one remark, about my old findings:

found that it is not a hang but a more-than-linearithmic growth of running time in the number of calendar items synced in a single file.

(from https://github.com/pimutils/vdirsyncer/issues/818#issuecomment-844085698)

At least I could prove by delta debugging that there wasn't a bad calendar entry in my test data triggering a defect, and that all tests terminated (but some only after hours). HTH

bernhardreiter avatar Mar 10 '23 09:03 bernhardreiter

#1051 has been merged. Can you check how performance is when using current main branch?

WhyNotHugo avatar Mar 10 '23 12:03 WhyNotHugo

Can you check how performance is when using current main branch?

Testing vdirsyncer, version 0.19.2.dev6+g079a156 (current main). The performance is much better compared to 0.19.1.

Growth of running time seems to be more linear than before. I consider the defect fixed and hope for the next release to appear! :)

Thanks again for maintaining vdirsyncer, and to all who have helped to fix the issue. (I'm resolving this issue as a help to the maintainer.)

details of testing

Using an old version of https://cantonbecker.com/astronomy-calendar/ with 411 entries:

grep SUMM astrocal-20210525.ics | wc -l
411
/home/ber/.local/bin/vdirsyncer --version
vdirsyncer, version 0.19.1
time /home/ber/.local/bin/vdirsyncer -v DEBUG -c problem-1/config sync
real    0m21,845s

git clone https://github.com/pimutils/vdirsyncer.git vdirsyncer-git
cd vdirsyncer-git/
pip install -e .
/home/ber/.local/bin/vdirsyncer --version
vdirsyncer, version 0.19.2.dev6+g079a156
time /home/ber/.local/bin/vdirsyncer -v DEBUG -c problem-1/config sync
real    0m0,780s

Using https://gitlab.com/FadiAbb/icalendar-generator to generate calendars starting from 2023-04-01 with the given number of entries; timings for v0.19.1 and the main branch:

822 entries
real    0m37,624s   (v0.19.1)
real    0m0,989s    (main)

1644 entries
real    2m32,770s   (v0.19.1)
real    0m1,424s    (main)

bernhardreiter avatar Apr 27 '23 13:04 bernhardreiter

Thanks for confirming!

WhyNotHugo avatar Apr 27 '23 17:04 WhyNotHugo