Improve speed for initial sync with virtual files

Open wonx opened this issue 3 years ago • 60 comments

How to use GitHub

  • Please use the 👍 reaction to show that you want to have the same feature implemented.
  • Please don't comment if you have no relevant information to add. It's just extra noise for everyone subscribed to this issue.
  • Subscribe to receive notifications on status change and new comments.

Feature description

When using virtual files, the first login after a new installation starts a syncing process that can take a very long time, depending on the number of files to synchronize.

In my case, I'm syncing ~700,000 files. My computer has already been up for 29 hours without a restart and the sync process has only now reached the 50% mark. I can see that the virtual files are created one by one, and it can be as slow as 2 per second. Waiting two or more days until Nextcloud is usable is too much in my opinion.
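As a rough back-of-the-envelope check of the figures above, the sketch below converts a file count and a creation rate into hours. It assumes a constant rate, which the report does not claim, and `estimated_sync_hours` is a name made up for this illustration.

```python
def estimated_sync_hours(file_count: int, files_per_second: float) -> float:
    """Return the hours needed to create placeholders at a constant rate."""
    if files_per_second <= 0:
        raise ValueError("files_per_second must be positive")
    return file_count / files_per_second / 3600


# Numbers from the report above: ~700,000 files at the slowest observed rate.
hours = estimated_sync_hours(700_000, 2)
print(f"{hours:.1f} hours ({hours / 24:.1f} days)")  # prints "97.2 hours (4.1 days)"
```

Reaching 50% of 700,000 files in 29 hours works out to a bit over 3 files per second on average, so the full run would land near 58 hours if that rate held.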

It would be cool if there was any way to speed up the initial sync.

PS: This is related to https://github.com/nextcloud/desktop/issues/4421

wonx avatar Apr 07 '22 20:04 wonx

I'm experiencing a similar problem on macOS (around 200k files). In my humble opinion, synchronizing the full hierarchy is the key problem here. The typical end user doesn't need to have the full folder hierarchy saved and synchronized. A lazier approach (i.e. trigger on open and/or scan only the opened subfolders, not the whole depth but perhaps 1 or 2 levels below) would scale better and decrease the load on the NC server.
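To make the "1 or 2 levels below" idea concrete, here is a minimal sketch of a depth-limited listing. It walks a local directory as a stand-in for the server tree; `list_to_depth` is a hypothetical helper, not something taken from the Nextcloud client.

```python
import os
from collections import deque


def list_to_depth(root: str, max_depth: int) -> list[str]:
    """List entries under root, descending at most max_depth levels below it.

    max_depth=0 returns only the direct children of root.
    """
    found = []
    queue = deque([(root, 0)])
    while queue:
        folder, depth = queue.popleft()
        with os.scandir(folder) as entries:
            for entry in sorted(entries, key=lambda e: e.name):
                found.append(entry.path)
                # Only descend while the depth budget is not used up.
                if entry.is_dir(follow_symlinks=False) and depth < max_depth:
                    queue.append((entry.path, depth + 1))
    return found
```

With this shape, the cost of opening a folder depends on the size of the folder and its nearest descendants rather than on the size of the whole account.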

marcotrevisan avatar Apr 27 '22 06:04 marcotrevisan

I'd like to add that restarting the sync, the client or the PC results in a complete restart of the process. Also, the sync doesn't seem to start immediately: it first counts all the files it will sync and only then starts syncing. The counting alone takes two days for me, and the sync isn't done after more than 10 days. At the very least, the sync should pick up where it left off.
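"Pick up where it left off" can be sketched as a checkpoint that survives restarts. This is only an illustration under simple assumptions: `sync_with_checkpoint`, the `create_placeholder` callback and the JSON checkpoint file are all hypothetical, and a real client would batch the writes or use a database rather than rewrite the file after every item.

```python
import json
import os


def sync_with_checkpoint(paths, create_placeholder, checkpoint_file):
    """Process paths in order, skipping those recorded by a previous run."""
    done = set()
    if os.path.exists(checkpoint_file):
        with open(checkpoint_file, encoding="utf-8") as fh:
            done = set(json.load(fh))
    for path in paths:
        if path in done:
            continue
        create_placeholder(path)
        done.add(path)
        # Write to a temp file and rename it, so an interruption never
        # leaves a half-written checkpoint behind.
        tmp = checkpoint_file + ".tmp"
        with open(tmp, "w", encoding="utf-8") as fh:
            json.dump(sorted(done), fh)
        os.replace(tmp, checkpoint_file)
    return done
```

If the process dies partway through, the next call with the same checkpoint file skips everything already recorded and continues with the first unprocessed path.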

johannes-luebke avatar Jul 10 '22 19:07 johannes-luebke

I have the same issue.

The most annoying part is that I do not need the folders with all the little files available on my desktop.

So it would be enough if I could say "do not sync this folder unless it is accessed by the user".

I think the suggestion from @marcotrevisan (see https://github.com/nextcloud/desktop/issues/4464) also sounds promising.

CWempe avatar Aug 12 '22 19:08 CWempe

See https://github.com/nextcloud/desktop/issues/4918#issuecomment-1246386007 for a description of a problem with the tray window related to speed issues for the initial sync.

PhilippSchlesinger avatar Sep 14 '22 08:09 PhilippSchlesinger

I can confirm this issue.

I started syncing virtual files (~1 million) on a new notebook. I knew it would take a while. The next day I checked and saw about 30 % finished. The day after that only 10 % more (= 40%). In the application window I could see that roughly one file was processed per second.

Then I read about restarting the client software here. And the syncing (files per second) increased dramatically.

Now I took some data to verify this behavior:

image

image

So the best workaround would be a script that restarts the Nextcloud client every 30 minutes or so. 😜
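Taken literally, that workaround could look like the sketch below, which starts a command and relaunches it at a fixed interval. It is a sketch only: `run_with_periodic_restart` is a hypothetical helper, the executable path in the comment is an assumption about a typical Windows install, and it is untested whether forcibly stopping the client mid-sync is safe.

```python
import subprocess
import time


def run_with_periodic_restart(command, interval_seconds, restarts):
    """Start command, then stop and relaunch it `restarts` times.

    Returns the last started process, which is still running.
    """
    process = subprocess.Popen(command)
    for _ in range(restarts):
        time.sleep(interval_seconds)
        process.terminate()
        process.wait()  # make sure the old instance is gone before relaunching
        process = subprocess.Popen(command)
    return process


# Example call (path is an assumption, adjust to the local installation):
# run_with_periodic_restart([r"C:\Program Files\Nextcloud\nextcloud.exe"], 30 * 60, restarts=48)
```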

But it would be great if this could be fixed.

Server: 24.0.7 (docker) Client: 3.6.2 (Windows)

CWempe avatar Dec 07 '22 16:12 CWempe

With the latest Nextcloud Client 3.7.3, an initial sync of ~150k files took less than 1 hour, where in the past it took a whole night and produced endless errors. Maybe you could also check again and see if it has improved with the latest version.

PhilippSchlesinger avatar Feb 23 '23 12:02 PhilippSchlesinger

> With the latest Nextcloud Client 3.7.3, an initial sync of ~150k files took less than 1 hour, where in the past it took a whole night and produced endless errors. Maybe you could also check again and see if it has improved with the latest version.

@CWempe Since you described the issue in detail and with numbers previously, could you maybe check again with 3.7.3 or later and report if anything changed?

PhilippSchlesinger avatar Apr 14 '23 10:04 PhilippSchlesinger

As I said in https://github.com/nextcloud/desktop/issues/3120#issuecomment-1584621317, I'm still having the problem with Nextcloud 25 and desktop client 3.8.2. After 24 hours it had still not finished counting the files to synchronize; then it lost the connection and restarted from scratch... About 2,000,000 files.

tomdereub avatar Jun 09 '23 14:06 tomdereub

I can also confirm that this issue persists with 3.9.0 and [Cloud] 26.0.2. For approximately 500k files, the estimated time jumps between 6 days and “A few seconds”. It “syncs” (virtual files) roughly 100 files per second. Just for testing purposes, I tried to sync the same set of files with the ownCloud [v4.1.0-rc.2](https://github.com/owncloud/client/tree/v4.1.0-rc.2) client. This client does the job much faster, at approx. 500–700 files per second against the same server. It could be my laptop, but I experience the same issue with the NC client on the other 30–50 laptops.

limatus avatar Jun 15 '23 08:06 limatus

@limatus Try with ownCloud Infinite Scale; 3.0 just got released, and I would expect 4x performance compared with oC10.

hodyroff avatar Jun 15 '23 08:06 hodyroff

@hodyroff thanks for the hint, but I do not intend to switch servers – the server was and is from NC!

limatus avatar Jun 15 '23 09:06 limatus

@claucambra is this a duplicate of [#5692](https://github.com/nextcloud/desktop/issues/5692) or vice versa?

tobiasKaminsky avatar Jul 20 '23 11:07 tobiasKaminsky

They are different: this is related to the Windows VFS (normal sync engine), while #5692 is related to the macOS-specific sync engine in the file provider module.

claucambra avatar Jul 21 '23 06:07 claucambra

@limatus @CWempe Just to get a bit more context on Virtual Files vs normal sync: is syncing much slower with Virtual Files than with normal sync when you select everything to sync?

allexzander avatar Jul 25 '23 10:07 allexzander

@allexzander: The problem is only with the initial sync. I'm using VFS on my personal server with success, it's working well. The problem appears with lots of data: with 500,000 files it takes a few days for the initial sync to complete. After that, syncing seems as quick as with normal sync. Is there any chance of seeing progress on this issue? It has been agreed for 2 years now (https://github.com/nextcloud/desktop/issues/3120#issuecomment-907067592) without visible progress...

tomdereub avatar Aug 03 '23 12:08 tomdereub

@allexzander if I sync the files via normal sync, the bottleneck seems to be the connection speed, which is understandable. Sadly, we mostly use virtual files, as there are simply too many files. It's similar to what @tomdereub mentioned: the initial sync needs days; thereafter, it’s fine.

limatus avatar Aug 03 '23 13:08 limatus

@allexzander For the sake of completeness I'd like to add that what @tomdereub and others are describing also happens when a significant number of files is added to the Nextcloud account after the initial sync. When the Nextcloud client needs to sync these newly added files, it shows the same problem as on the initial sync.

As described by @CWempe in https://github.com/nextcloud/desktop/issues/4424#issuecomment-1341235591, the sync speed decreases dramatically over time. Is this perhaps due to the real-time listing of activities in the tray window for each individual file being synced? If this could be identified as a cause of the slowdown, then perhaps lazy-loading activities, or even a summary listing for large numbers of files, would be an option.
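A summary listing of the kind suggested here might look like the following sketch. Whether the tray window is really the bottleneck is unverified, and `summarize_activities` with its `max_items` threshold is a hypothetical helper, not the client's actual activity model.

```python
import posixpath
from collections import Counter


def summarize_activities(synced_paths, max_items=5):
    """Return display lines: one per file for small batches, one per folder otherwise."""
    if len(synced_paths) <= max_items:
        return [f"Synced {path}" for path in synced_paths]
    # Collapse large batches into a single line per parent folder.
    per_folder = Counter(posixpath.dirname(path) or "/" for path in synced_paths)
    return [f"Synced {count} files in {folder}" for folder, count in sorted(per_folder.items())]
```

The number of list entries then grows with the number of folders touched rather than with the number of files.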

PhilippSchlesinger avatar Sep 06 '23 11:09 PhilippSchlesinger

> @allexzander: The problem is only with the initial sync. I'm using VFS on my personal server with success, it's working well. The problem appears with lots of data: with 500,000 files it takes a few days for the initial sync to complete. After that, syncing seems as quick as with normal sync. Is there any chance of seeing progress on this issue? It has been agreed for 2 years now (#3120 (comment)) without visible progress...

As @PhilippSchlesinger said, after some time using VFS on that folder with about 500,000 files, I find that continuing to sync the whole folder tree is not workable. Every time somebody modifies a large number of files, it starts a long sync. Deploying this for 30 people seems impossible to me: it would put a heavy load on the server and on each computer. From my point of view, the right way to make it scalable is to sync only folders that have been accessed at least once. I mean:

  • First sync: sync only the top level of the folder tree. It will be ready instantly.
  • When the user opens a folder, sync this folder's content, but not recursively, and add the folder to the list of folders to keep synced when there are changes in it.
  • The user can manually select a folder to be fully synced (as is already possible).

So the first access to each folder will be a bit slower, but step by step the user will get the folders they use synced, and the other folders will never be synced.
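The three steps above can be sketched as a toy in-memory model. Everything here is an assumption made for illustration: `LazySyncModel` is hypothetical, the "remote" is just a dict mapping folder paths to child names, and it says nothing about how the Windows VFS integration would have to be wired up.

```python
class LazySyncModel:
    """Toy model of on-demand folder sync; remote maps folder path -> child names."""

    def __init__(self, remote):
        self.remote = remote
        self.local = {}       # folder path -> child names known locally
        self.watched = set()  # folders kept up to date after first access

    def initial_sync(self):
        # Step 1: fetch only the top-level listing.
        self._fetch("/")

    def open_folder(self, path):
        # Step 2: fetch this folder's direct children only, then watch it.
        self._fetch(path)
        return self.local[path]

    def pin_folder(self, path):
        # Step 3: explicit request for a full recursive sync of a subtree.
        self._fetch(path)
        for name in self.remote.get(path, []):
            child = path.rstrip("/") + "/" + name
            if child in self.remote:  # only folders have their own listing
                self.pin_folder(child)

    def on_remote_change(self, path):
        # Changes are pulled only for folders the user has visited.
        if path in self.watched:
            self._fetch(path)
            return True
        return False

    def _fetch(self, path):
        self.local[path] = list(self.remote[path])
        self.watched.add(path)
```

In this model, a change in a folder nobody has opened costs the client nothing, which is the scalability property the proposal is after.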

Is this technically possible? And if so, what do you (Nextcloud devs) think about it? It seems to me that this is the current behaviour of the Android client.

tomdereub avatar Sep 19 '23 10:09 tomdereub

I'd like to add that on macOS things are moving towards a FileProvider-based implementation, which will solve the issue by delegating a good part of the sync logic to macOS.

IMHO, if there's no API like FileProvider on Windows, then the client itself should evolve towards a lazier approach... a "full sync" approach works against scalability, and in the long run it's a major limiting factor for a broader adoption of Nextcloud. In the case of 500k files and 30 users who are actively working, push notifications tend to generate very frequent peaks of PROPFIND requests coming from all the clients. Such peaks cause slowdowns not only for the clients themselves but also for the other apps (Talk, Mail, Calendar, Deck...), and the end result is a busy server instance that is not actually doing anything except triggering PROPFINDs and responding to PROPFINDs for files/folders that are often far away from where the users are actually working. That's why, in my humble opinion, this is a critical and high-priority issue.
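For readers unfamiliar with the requests being discussed, here is a minimal sketch of a single-folder PROPFIND built with the standard library. The WebDAV `Depth` header limits the listing to the folder itself (`0`) or its direct children (`1`). The host, user and folder in the usage note are placeholders, `build_propfind` is a hypothetical helper, authentication is omitted, and the request is built but never sent.

```python
import urllib.parse
import urllib.request

PROPFIND_BODY = b"""<?xml version="1.0"?>
<d:propfind xmlns:d="DAV:">
  <d:prop><d:getetag/><d:getlastmodified/><d:resourcetype/></d:prop>
</d:propfind>"""


def build_propfind(base_url, user, folder, depth="1"):
    """Build (but do not send) a PROPFIND request for one folder."""
    if depth not in ("0", "1", "infinity"):
        raise ValueError("Depth must be '0', '1' or 'infinity'")
    path = urllib.parse.quote(folder.strip("/"))
    url = f"{base_url.rstrip('/')}/remote.php/dav/files/{user}/{path}"
    return urllib.request.Request(
        url,
        data=PROPFIND_BODY,
        method="PROPFIND",
        headers={"Depth": depth, "Content-Type": "application/xml"},
    )
```

For example, `build_propfind("https://cloud.example.com", "alice", "Projects")` describes one request for one folder; a client that tracks the whole tree has to issue such a request for every folder whose contents may have changed.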

marcotrevisan avatar Sep 19 '23 12:09 marcotrevisan

@tomdereub I'm in a very similar situation to yours and as a mitigation solution I ended up as follows:

  • use a WebDAV client like Mountain Duck in "online" mode for occasional browsing and for work on the folder structure. It has its own issues but it basically works (don't forget to generate an "application" password for this client in the user's Settings -> Security section);
  • also use Nextcloud Client without virtual files, selecting those folders containing the most heavily used projects for the user, and instructing them how to add/remove folders to sync.

In this way, server load is under control (push notifications won't wake up all clients every time) and the clients are snappy enough to work. The advantage is that, for heavily used folders, the NC client has all the files downloaded and ready; the disadvantage is that not all the users are comfortable with such setup.

Hope it helps

marcotrevisan avatar Sep 19 '23 13:09 marcotrevisan

@marcotrevisan I'm currently trying Mountain Duck, and it seems to do everything I want with the "smart synchronization" mode. There is an option to index files or not. With this option unchecked, it does not index all files; it only keeps an index of visited folders. There is also an option to keep a folder offline on the local disk. So it does what Nextcloud VFS does, but with 2 advantages (from my point of view):

  • it's possible not to index all files, which is far more scalable
  • it mounts the WebDAV folder as a drive letter, which is useful (on Windows)

It seems that part of Mountain Duck's code is open source, so it might be interesting to have a look at it.

tomdereub avatar Oct 03 '23 20:10 tomdereub

Yes, but don't get drunk too fast, it has its own bugs (on macOS at least) :-D Avoid unzipping archives in the share, for example. Sometimes it'll screw things up, and I don't know why. The safest mode in my experience is the Online mode. If you're on Windows it may behave differently.

marcotrevisan avatar Oct 03 '23 21:10 marcotrevisan

> Yes, but don't get drunk too fast, it has its own bugs (on macOS at least) :-D Avoid unzipping archives in the share, for example. Sometimes it'll screw things up, and I don't know why. The safest mode in my experience is the Online mode. If you're on Windows it may behave differently.

Hi. I can confirm this. We have extensively tested the "Duck" on Windows, and while the client does very well in terms of performance, there are many other issues around file locking, online detection, working with MS Office and so forth.

Is there any progress to be expected on improving the initial VFS sync speed? We are currently migrating a lot of files to NC and I am already afraid of starting the sync on our clients.

At the moment the initial sync with about 100K files takes about 60 minutes.

Regards

Rob

roberix avatar Oct 05 '23 12:10 roberix

Just a small addition regarding the initial scan: synchronizing placeholder files for an additional 100k files is estimated to take 0 seconds (after a previous operation already took over 90 minutes for 60k files):

Screenshot 2023-10-17 101406

PhilippSchlesinger avatar Oct 19 '23 11:10 PhilippSchlesinger

> It has been agreed for 2 years now (#3120 (comment)) without visible progress...

@allexzander @mgallien could you please just give us some idea of the priority of this issue and the ways to solve it? Like "it's not the priority at the moment, so we don't know when it will be worked on", or "it's very complicated to solve, we have to entirely rewrite the sync engine, so it will take some time before we can work on it", or "only a few users are affected, so it's not a priority, most of our users don't have so much data"...

As users, we need to know if there is some chance of getting a scalable VFS in the short or mid term, or if we have to find other solutions. I don't want to see my company give up on Nextcloud and the other open-source software we're using and fall back on full Microsoft solutions. I have been trying Mountain Duck as an alternative for some time, but as @marcotrevisan and @roberix have said, in some cases it doesn't work as well as the Nextcloud desktop client. So I need to know a bit more about the future development of the Nextcloud desktop client before deploying it for all users.

tomdereub avatar Dec 12 '23 13:12 tomdereub

@joshtrichards: you added a label to this issue, what does that mean? Will somebody start working on it?

tomdereub avatar Feb 06 '24 10:02 tomdereub

> @joshtrichards: you added a label to this issue, what does that mean? Will somebody start working on it?

From what I can see, it looks like they began working on this about a week ago.

OpsecPGR avatar Apr 08 '24 01:04 OpsecPGR

This https://github.com/nextcloud/desktop/pull/6461 is exactly what is needed for Windows too.

tomdereub avatar Apr 13 '24 19:04 tomdereub

Dear Nextcloud developers, @allexzander, it would be great if you could shed some light on what is actually being worked on. Many people are following this bug, and many of us have contributed to this issue.

See https://github.com/nextcloud/desktop/issues/4918 for a description of a performance problem with the tray window (the PR intended to solve it is https://github.com/nextcloud/desktop/pull/5941). Solving this serious issue could also help with the speed problems of the initial sync.

PhilippSchlesinger avatar Apr 18 '24 07:04 PhilippSchlesinger