docker-osx-dev

sync init is taking too long on projects with lots of files

Status: Open · wan54 opened this issue 8 years ago · 19 comments

A listening approach is, I think, preferable in this situation. Any thoughts?

wan54 · Sep 07 '15

I'd be curious as to the source of the slowness. Is it caused by a large number of small files? The use of ssh? Something else entirely? Is it any faster if you tar it all up and rsync just the tarball?
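A rough first check on the "many small files vs. lots of data" question is to measure the tree being synced. The sketch below is a hypothetical helper (`profile_tree` is not part of docker-osx-dev); it only counts files and bytes and reports the average file size. It does not time the sync itself, so at best it hints at where the bottleneck is.

```python
import os
import tempfile
from pathlib import Path


def profile_tree(root):
    """Count regular files and total bytes under `root` (hypothetical helper)."""
    file_count = 0
    total_bytes = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue  # skip symlinks; only count real file contents
            file_count += 1
            total_bytes += os.path.getsize(path)
    avg = total_bytes / file_count if file_count else 0.0
    return {"files": file_count, "bytes": total_bytes, "avg_bytes": avg}


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for i in range(3):
            Path(tmp, f"f{i}.txt").write_text("x" * 100)
        print(profile_tree(tmp))  # {'files': 3, 'bytes': 300, 'avg_bytes': 100.0}
```

A tree with a six-figure file count and a small average size points toward per-file overhead; a tree with few but large files points toward raw transfer volume.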

I believe there is an outstanding PR for a sync_only command. I could see a watch_only command being useful too.

brikis98 · Sep 07 '15

Yes, it's due to the large number of files. Good to know there's an improvement for this on the way.

wan54 · Sep 07 '15

The PR is only for a sync_only command. Someone else would need to do a PR for the watch_only command. I'm in the middle of a different project and can't switch to it at the moment, but it shouldn't take more than a minute or two to make the change.

brikis98 · Sep 07 '15

I created a PR for "sync-only" and "watch-only". The other PR had unaddressed comments from July, so I opened a new one.

https://github.com/brikis98/docker-osx-dev/pull/105

aforward · Sep 08 '15

Unfortunately, watch-only does not help: we still need to sync the files initially, and with this many files that initial sync takes several minutes to finish.
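The reason watch-only can't stand in for the initial sync is that a watcher only reacts to changes made after it starts; files that already exist and never change are never sent. The polling sketch below (hypothetical `snapshot` and `changes` helpers, not how docker-osx-dev actually detects changes) makes that visible: only the file written after the baseline snapshot is reported.

```python
import os
import tempfile
from pathlib import Path


def snapshot(root):
    """Map each file's path (relative to root) to (mtime_ns, size)."""
    state = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            full = os.path.join(dirpath, name)
            st = os.stat(full)
            state[os.path.relpath(full, root)] = (st.st_mtime_ns, st.st_size)
    return state


def changes(before, after):
    """Return (changed_or_new, deleted) paths between two snapshots."""
    changed = sorted(p for p, meta in after.items() if before.get(p) != meta)
    deleted = sorted(p for p in before if p not in after)
    return changed, deleted


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as src:
        Path(src, "old.txt").write_text("already here")
        baseline = snapshot(src)  # the watcher starts here
        Path(src, "new.txt").write_text("edited after start")
        # old.txt is never reported, so a watch-only sync never copies it.
        print(changes(baseline, snapshot(src)))  # (['new.txt'], [])
```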

Are there other alternatives to this?

wan54 · Sep 08 '15

You'd have to profile it to see where the time is being spent. If we're bottlenecked by the number of files, the only solution I can think of is something that tars everything up, rsyncs a single tarball, and then untars it on the other end. If we're bottlenecked by the amount of data, then you may have to look for a mountable file system alternative to vboxsf. Some people use NFS; maybe you'd have more luck with that.
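The tar-it-up idea can be illustrated in-process with Python's standard `tarfile` module: many files go into one archive, that single blob is the only thing that would need to cross the wire, and it is unpacked on the other side. This is a minimal sketch that assumes the transfer step itself (rsync or scp into the VM) is handled elsewhere; `pack` and `unpack` are hypothetical names.

```python
import io
import tarfile
import tempfile
from pathlib import Path


def pack(src_dir, compress=False):
    """Bundle every file under src_dir into one in-memory tar archive."""
    root = Path(src_dir)
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz" if compress else "w") as tar:
        for path in sorted(root.rglob("*")):
            # Store paths relative to src_dir; add directories non-recursively
            # because rglob already visits their contents.
            tar.add(path, arcname=path.relative_to(root).as_posix(), recursive=False)
    return buf.getvalue()


def unpack(archive, dest_dir):
    """Extract an archive produced by pack() into dest_dir."""
    with tarfile.open(fileobj=io.BytesIO(archive), mode="r:*") as tar:
        if hasattr(tarfile, "data_filter"):
            tar.extractall(dest_dir, filter="data")  # safer extraction where supported
        else:
            tar.extractall(dest_dir)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as src, tempfile.TemporaryDirectory() as dst:
        for i in range(5):
            Path(src, f"file{i}.txt").write_text(f"content {i}")
        blob = pack(src)  # one blob instead of five separate transfers
        unpack(blob, dst)
        print(sorted(p.name for p in Path(dst).iterdir()))
        # ['file0.txt', 'file1.txt', 'file2.txt', 'file3.txt', 'file4.txt']
```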

brikis98 · Sep 08 '15

@brikis98 can you add a --use-gzip option (or something similar) to the initial sync? I have 113,793 files (1.6 GB) to sync initially :-) It takes about 45 min to sync :(

noose · Sep 15 '15

Can you benchmark it and see if it actually helps? I'd try all files separately, all files in a tarball, and all files in a tarball + gzip.
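A minimal harness for comparing the three variants might look like the sketch below. It assumes everything runs on one machine, so it measures local copying and archiving rather than the transfer into the VM; only the structure (best-of-N timing per strategy) carries over to a real benchmark. `separate_files`, `via_archive`, and `benchmark` are hypothetical names; the archiving uses the standard `shutil.make_archive` and `shutil.unpack_archive` functions with the `tar` and `gztar` formats.

```python
import shutil
import tempfile
import time
from pathlib import Path


def best_time(fn, repeats=3):
    """Run fn() `repeats` times and return the fastest wall-clock time in seconds."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - start)
    return best


def separate_files(src, dst):
    """Copy each file individually (a stand-in for a per-file sync)."""
    shutil.copytree(src, dst, dirs_exist_ok=True)


def via_archive(src, dst, fmt):
    """Pack src into one archive ('tar' or 'gztar'), then unpack it into dst."""
    with tempfile.TemporaryDirectory() as staging:
        archive = shutil.make_archive(str(Path(staging, "bundle")), fmt, root_dir=src)
        shutil.unpack_archive(archive, dst)


def benchmark(src, repeats=3):
    """Return {strategy name: best time in seconds} for the three variants."""
    strategies = {
        "separate": lambda d: separate_files(src, d),
        "tar": lambda d: via_archive(src, d, "tar"),
        "tar+gzip": lambda d: via_archive(src, d, "gztar"),
    }
    results = {}
    for name, strategy in strategies.items():
        def run():
            # Fresh destination each run so no strategy benefits from leftovers.
            with tempfile.TemporaryDirectory() as dst:
                strategy(dst)
        results[name] = best_time(run, repeats)
    return results


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as src:
        for i in range(2000):
            Path(src, f"f{i}.txt").write_text("x" * 64)
        for name, secs in benchmark(src).items():
            print(f"{name:>9}: {secs:.3f}s")
```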

brikis98 · Sep 15 '15

I excluded a bunch of files (logs, compiled templates, git, etc.):

| Command | Partial times | Total time |
|---|---|---|
| docker-osx-dev sync-only | n/a | 10min 5sec |
| tar czf .. + scp + untar | 6min 42sec + 24sec + 56sec | 8min 2sec |
| tar cf + scp + untar | 5min 48sec + 56sec + 33sec | 7min 20sec |
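For reference, here is the arithmetic behind the comparison, using the totals as reported: 10min 5sec is 605 s, the plain-tar route takes 440 s (about 27% less time), and the gzip route takes 482 s (about 20% less time). The snippet below simply repeats that calculation; `to_seconds` is a hypothetical helper.

```python
def to_seconds(minutes, seconds):
    """Convert an "Xmin Ysec" reading from the table into seconds."""
    return minutes * 60 + seconds


# Totals copied from the benchmark table.
baseline = to_seconds(10, 5)  # docker-osx-dev sync-only
totals = {
    "tar czf + scp + untar": to_seconds(8, 2),
    "tar cf + scp + untar": to_seconds(7, 20),
}

for name, total in totals.items():
    time_saved = 1 - total / baseline  # fraction of the baseline time avoided
    speedup = baseline / total         # how many times faster than the baseline
    print(f"{name}: {total}s vs {baseline}s, {time_saved:.0%} less time, {speedup:.2f}x")
```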

noose · Sep 17 '15

Nice research! Looks like for a large number of files, using tar can lead to a ~30% speedup. It's not obvious how that would scale up to an even larger project, but let's assume it reduced the initial sync time from 45 min to ~30 min. Is that still too slow to be useful?
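Spelled out, and assuming the plain-tar ratio from the benchmark (440 s vs. 605 s) carries over unchanged to noose's full 45-minute sync, which the comment itself notes is not obvious:

```latex
\frac{440\,\mathrm{s}}{605\,\mathrm{s}} \approx 0.727,
\qquad
45\,\mathrm{min} \times 0.727 \approx 32.7\,\mathrm{min}
```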

brikis98 · Sep 17 '15

:+1:

ain · Sep 17 '15

A 30% speedup is huge! It would still be slow, but usable (right now it's slow and unusable).

noose · Sep 17 '15

Fair enough. I don't have time at the moment to add that functionality, but would definitely be open to a PR that adds a --tar style flag that does the initial sync via tar & untar. I suspect it would only take a few lines of shell script to do it.
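A `--tar` style initial sync would essentially stream `tar c` into `ssh ... tar x`. The Python sketch below shows one way to wire up that pipeline with `subprocess`; it is not docker-osx-dev code (the tool is a shell script and, in this thread, has no such flag), and `tar_sync_commands`, `run_tar_sync`, the host, and the paths are all hypothetical. It also assumes plain `ssh host` works (any key options living in `~/.ssh/config`) and that the remote side has a `tar` that accepts `-C`, plus `z` when compression is requested.

```python
import shlex
import subprocess


def tar_sync_commands(src, remote_host, dest, compress=False, excludes=()):
    """Build the argv lists for the two halves of `tar c | ssh host tar x`."""
    create = "czf" if compress else "cf"
    extract = "xzf" if compress else "xf"
    sender = ["tar", create, "-", "-C", src]
    for pattern in excludes:
        sender.append(f"--exclude={pattern}")  # must come before the "." operand
    sender.append(".")
    # The remote command is a single string run by the remote shell, so quote paths.
    remote = f"mkdir -p {shlex.quote(dest)} && tar {extract} - -C {shlex.quote(dest)}"
    receiver = ["ssh", remote_host, remote]
    return sender, receiver


def run_tar_sync(src, remote_host, dest, **kwargs):
    """Stream src into dest on remote_host without a temporary archive file."""
    sender_cmd, receiver_cmd = tar_sync_commands(src, remote_host, dest, **kwargs)
    sender = subprocess.Popen(sender_cmd, stdout=subprocess.PIPE)
    receiver = subprocess.Popen(receiver_cmd, stdin=sender.stdout)
    sender.stdout.close()  # so tar sees a broken pipe if ssh exits early
    receiver_rc = receiver.wait()
    sender_rc = sender.wait()
    if sender_rc or receiver_rc:
        raise RuntimeError(f"tar exited {sender_rc}, ssh exited {receiver_rc}")
```

For example, `run_tar_sync(".", "docker@dockerhost", "/src/myapp", excludes=[".git", "*.log"])` would copy the current directory in one stream, after which a watch-only run could keep it up to date.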

brikis98 · Sep 17 '15

Yep, with a Node.js project and its node_modules/ folder full of nested subfolders, it takes far too long. Is there no way to watch only, without needing to sync first? Thanks! @brikis98

thalesfsp · Dec 23 '15

@thalesfsp: use the watch-only command.

brikis98 · Dec 24 '15

@brikis98 When I ran it without syncing first, this happened:

2015-12-28 12:39:45 [INFO] Warning: Identity file -o not accessible: No such file or directory.
2015-12-28 12:39:45 [INFO] ssh: Could not resolve hostname IdentitiesOnly=yes: nodename nor servname provided, or not known
2015-12-28 12:39:45 [INFO] rsync: connection unexpectedly closed (0 bytes received so far) [sender]
2015-12-28 12:39:45 [INFO] rsync error: unexplained error (code 255) at /SourceCache/rsync/rsync-45/rsync/io.c(453) [sender=2.6.9]
2015-12-28 12:39:45 [INFO] Warning: Identity file -o not accessible: No such file or directory.
2015-12-28 12:39:45 [INFO] ssh: Could not resolve hostname IdentitiesOnly=yes: nodename nor servname provided, or not known
2015-12-28 12:39:45 [INFO] rsync: connection unexpectedly closed (0 bytes received so far) [sender]
2015-12-28 12:39:45 [INFO] rsync error: unexplained error (code 255) at /SourceCache/rsync/rsync-45/rsync/io.c(453) [sender=2.6.9]
2015-12-28 12:39:45 [INFO] Warning: Identity file -o not accessible: No such file or directory.
2015-12-28 12:39:45 [INFO] ssh: Could not resolve hostname IdentitiesOnly=yes: nodename nor servname provided, or not known
2015-12-28 12:39:45 [INFO] rsync: connection unexpectedly closed (0 bytes received so far) [sender]
2015-12-28 12:39:45 [INFO] rsync error: unexplained error (code 255) at /SourceCache/rsync/rsync-45/rsync/io.c(453) [sender=2.6.9]
2015-12-28 12:39:45 [INFO] Warning: Identity file -o not accessible: No such file or directory.
2015-12-28 12:39:45 [INFO] ssh: Could not resolve hostname IdentitiesOnly=yes: nodename nor servname provided, or not known
2015-12-28 12:39:45 [INFO] rsync: connection unexpectedly closed (0 bytes received so far) [sender]
2015-12-28 12:39:45 [INFO] rsync error: unexplained error (code 255) at /SourceCache/rsync/rsync-45/rsync/io.c(453) [sender=2.6.9]
2015-12-28 12:39:45 [INFO] Warning: Identity file -o not accessible: No such file or directory.
2015-12-28 12:39:45 [INFO] ssh: Could not resolve hostname IdentitiesOnly=yes: nodename nor servname provided, or not known
2015-12-28 12:39:45 [INFO] rsync: connection unexpectedly closed (0 bytes received so far) [sender]
2015-12-28 12:39:45 [INFO] rsync error: error in rsync protocol data stream (code 12) at /SourceCache/rsync/rsync-45/rsync/io.c(453) [sender=2.6.9]
2015-12-28 12:39:45 [INFO] Warning: Identity file -o not accessible: No such file or directory.
2015-12-28 12:39:45 [INFO] ssh: Could not resolve hostname IdentitiesOnly=yes: nodename nor servname provided, or not known
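The first two lines of each repeated group fit a specific pattern: if the identity-file path is empty when the ssh command line is assembled, `-i` swallows the next token (`-o`) as its file name, and the leftover `IdentitiesOnly=yes` is parsed as the hostname. That is only a hypothesis (the thread does not show how docker-osx-dev builds the command). The sketch below just reproduces the argument shift with `shlex.split`, which splits a string the way a shell splits an unquoted command line. `ssh_args_from_string` and `ssh_args_from_list` are hypothetical helpers.

```python
import shlex


def ssh_args_from_string(key_path, host):
    # Fragile: interpolate into one string and rely on word splitting.
    # An empty key_path simply disappears, shifting every later argument.
    return shlex.split(f"ssh -i {key_path} -o IdentitiesOnly=yes {host}")


def ssh_args_from_list(key_path, host):
    # Safer: build argv directly and only pass -i when a key path is known.
    args = ["ssh"]
    if key_path:
        args += ["-i", key_path, "-o", "IdentitiesOnly=yes"]
    args.append(host)
    return args


if __name__ == "__main__":
    print(ssh_args_from_string("", "docker@dockerhost"))
    # ['ssh', '-i', '-o', 'IdentitiesOnly=yes', 'docker@dockerhost']
    print(ssh_args_from_list("", "docker@dockerhost"))
    # ['ssh', 'docker@dockerhost']
```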

thalesfsp · Dec 28 '15

@thalesfsp: Does it work if you do the sync first? That error seems to indicate a messed up SSH configuration, not sure what it would have to do with the initial sync.

brikis98 · Dec 29 '15

@brikis98 It only works if I sync first :( and that takes too much time. I would like to adopt docker-osx-dev at our company, but the other developers won't accept the time lost to syncing. Is there no way to watch only, without having to sync?

thalesfsp · Dec 29 '15

But if you start syncing, does it work or do you get a similar error?

brikis98 · Dec 29 '15