Is it ready for large backups?

Open karlsoon opened this issue 7 years ago • 9 comments

Hi. I have a backup storage of about 35 GB that contains a large set of small images. When I try to list the contents of a backup with duplicacy cat, I get an OOM error:

fatal error: runtime: out of memory

runtime stack: runtime.throw(0xd517d5, 0x16) /usr/local/go/src/runtime/panic.go:596 +0x95 runtime.sysMap(0xc46d010000, 0x8000000, 0x11a6900, 0x11bf738) /usr/local/go/src/runtime/mem_linux.go:216 +0x1d0 runtime.(*mheap).sysAlloc(0x11a67c0, 0x8000000, 0x420291) /usr/local/go/src/runtime/malloc.go:428 +0x374 runtime.(*mheap).grow(0x11a67c0, 0x3fff, 0x0) /usr/local/go/src/runtime/mheap.go:774 +0x62 runtime.(*mheap).allocSpanLocked(0x11a67c0, 0x3fff, 0x100) /usr/local/go/src/runtime/mheap.go:678 +0x44f runtime.(*mheap).alloc_m(0x11a67c0, 0x3fff, 0x100000000, 0xc400000000) /usr/local/go/src/runtime/mheap.go:562 +0xe2 runtime.(*mheap).alloc.func1() /usr/local/go/src/runtime/mheap.go:627 +0x4b runtime.systemstack(0xc42005ff10) /usr/local/go/src/runtime/asm_amd64.s:343 +0xab runtime.(*mheap).alloc(0x11a67c0, 0x3fff, 0x10100000000, 0x11a7bd0) /usr/local/go/src/runtime/mheap.go:628 +0xa0 runtime.largeAlloc(0x7ffc932, 0x7fb0a01a6e01, 0xc400000002) /usr/local/go/src/runtime/malloc.go:795 +0x93 runtime.mallocgc.func1() /usr/local/go/src/runtime/malloc.go:690 +0x3e runtime.systemstack(0xc420018000) /usr/local/go/src/runtime/asm_amd64.s:327 +0x79 runtime.mstart() /usr/local/go/src/runtime/proc.go:1132

goroutine 1 [running]: runtime.systemstack_switch() /usr/local/go/src/runtime/asm_amd64.s:281 fp=0xc42020ceb8 sp=0xc42020ceb0 runtime.mallocgc(0x7ffc932, 0xbe5760, 0xc421d12601, 0x11bdc22) /usr/local/go/src/runtime/malloc.go:691 +0x930 fp=0xc42020cf58 sp=0xc42020ceb8 runtime.makeslice(0xbe5760, 0x7ffc932, 0x7ffc932, 0x4e0efc, 0xc44cbc2000, 0x1) /usr/local/go/src/runtime/slice.go:54 +0x7b fp=0xc42020cfa8 sp=0xc42020cf58 bytes.makeSlice(0x7ffc932, 0x0, 0x0, 0x0) /usr/local/go/src/bytes/buffer.go:201 +0x77 fp=0xc42020cfe8 sp=0xc42020cfa8 bytes.(*Buffer).grow(0xc44cbc2000, 0x40, 0x1) /usr/local/go/src/bytes/buffer.go:109 +0x177 fp=0xc42020d038 sp=0xc42020cfe8 bytes.(*Buffer).WriteString(0xc44cbc2000, 0xc4477b5840, 0x40, 0x0, 0xc42020d0e8, 0xc42020d0f8) /usr/local/go/src/bytes/buffer.go:146 +0x41 fp=0xc42020d068 sp=0xc42020d038 encoding/json.(*encodeState).string(0xc44cbc2000, 0xc4477b5840, 0x40, 0xc4477b5801, 0x40) /usr/local/go/src/encoding/json/encode.go:952 +0x50f fp=0xc42020d0e0 sp=0xc42020d068 encoding/json.stringEncoder(0xc44cbc2000, 0xbe54a0, 0xc4581e22d0, 0x98, 0xc4581e0100) /usr/local/go/src/encoding/json/encode.go:608 +0x214 fp=0xc42020d1a0 sp=0xc42020d0e0 encoding/json.(*encodeState).reflectValue(0xc44cbc2000, 0xbe54a0, 0xc4581e22d0, 0x98, 0xc4581e0100) /usr/local/go/src/encoding/json/encode.go:323 +0x82 fp=0xc42020d1d8 sp=0xc42020d1a0 encoding/json.interfaceEncoder(0xc44cbc2000, 0xc25fc0, 0xc46cfd6080, 0x94, 0xc46cfc0100) /usr/local/go/src/encoding/json/encode.go:617 +0xdb fp=0xc42020d218 sp=0xc42020d1d8 encoding/json.(*mapEncoder).encode(0xc420080030, 0xc44cbc2000, 0xc3ab20, 0xc444b5f0d8, 0x195, 0x3ff0100) /usr/local/go/src/encoding/json/encode.go:690 +0x589 fp=0xc42020d388 sp=0xc42020d218 encoding/json.(*mapEncoder).(encoding/json.encode)-fm(0xc44cbc2000, 0xc3ab20, 0xc444b5f0d8, 0x195, 0xc30100) /usr/local/go/src/encoding/json/encode.go:706 +0x64 fp=0xc42020d3c8 sp=0xc42020d388 encoding/json.(*arrayEncoder).encode(0xc42000c028, 0xc44cbc2000, 0xbd69a0, 
0xc44a775b00, 0x97, 0xc45ccd0100) /usr/local/go/src/encoding/json/encode.go:767 +0xf5 fp=0xc42020d420 sp=0xc42020d3c8 encoding/json.(*arrayEncoder).(encoding/json.encode)-fm(0xc44cbc2000, 0xbd69a0, 0xc44a775b00, 0x97, 0xbd0100) /usr/local/go/src/encoding/json/encode.go:774 +0x64 fp=0xc42020d460 sp=0xc42020d420 encoding/json.(*sliceEncoder).encode(0xc42000c030, 0xc44cbc2000, 0xbd69a0, 0xc44a775b00, 0x97, 0xbd0100) /usr/local/go/src/encoding/json/encode.go:741 +0xc1 fp=0xc42020d4a0 sp=0xc42020d460 encoding/json.(*sliceEncoder).(encoding/json.encode)-fm(0xc44cbc2000, 0xbd69a0, 0xc44a775b00, 0x97, 0xc44a770100) /usr/local/go/src/encoding/json/encode.go:753 +0x64 fp=0xc42020d4e0 sp=0xc42020d4a0 encoding/json.(*encodeState).reflectValue(0xc44cbc2000, 0xbd69a0, 0xc44a775b00, 0x97, 0xc44a770100) /usr/local/go/src/encoding/json/encode.go:323 +0x82 fp=0xc42020d518 sp=0xc42020d4e0 encoding/json.interfaceEncoder(0xc44cbc2000, 0xc25fc0, 0xc4667fe150, 0x94, 0xc4356a0100) /usr/local/go/src/encoding/json/encode.go:617 +0xdb fp=0xc42020d558 sp=0xc42020d518 encoding/json.(*mapEncoder).encode(0xc420080030, 0xc44cbc2000, 0xc3ab20, 0xc462e46ed0, 0x15, 0xc30100) /usr/local/go/src/encoding/json/encode.go:690 +0x589 fp=0xc42020d6c8 sp=0xc42020d558 encoding/json.(*mapEncoder).(encoding/json.encode)-fm(0xc44cbc2000, 0xc3ab20, 0xc462e46ed0, 0x15, 0xc462e40100) /usr/local/go/src/encoding/json/encode.go:706 +0x64 fp=0xc42020d708 sp=0xc42020d6c8 encoding/json.(*encodeState).reflectValue(0xc44cbc2000, 0xc3ab20, 0xc462e46ed0, 0x15, 0x100) /usr/local/go/src/encoding/json/encode.go:323 +0x82 fp=0xc42020d740 sp=0xc42020d708 encoding/json.(*encodeState).marshal(0xc44cbc2000, 0xc3ab20, 0xc462e46ed0, 0xc420200100, 0x0, 0x0) /usr/local/go/src/encoding/json/encode.go:296 +0xb8 fp=0xc42020d778 sp=0xc42020d740 encoding/json.Marshal(0xc3ab20, 0xc462e46ed0, 0x40fe2c, 0xc44a775b00, 0xc42020d9c8, 0x18, 0xbe4d01) /usr/local/go/src/encoding/json/encode.go:161 +0x6e fp=0xc42020d7c0 sp=0xc42020d778 
encoding/json.MarshalIndent(0xc3ab20, 0xc462e46ed0, 0x0, 0x0, 0xd3eccc, 0x2, 0x5ac00, 0x71800, 0xc42020d8c0, 0xb46d68, ...) /usr/local/go/src/encoding/json/encode.go:170 +0x3f fp=0xc42020d840 sp=0xc42020d7c0 github.com/gilbertchen/duplicacy/src.(*SnapshotManager).PrintSnapshot(0xc42123c000, 0xc421270000, 0xc42020dac0) /Users/chgang/zincbox/go/src/github.com/gilbertchen/duplicacy/src/duplicacy_snapshotmanager.go:984 +0xa38 fp=0xc42020da80 sp=0xc42020d840 github.com/gilbertchen/duplicacy/src.(*SnapshotManager).PrintFile(0xc42123c000, 0xc420137ef7, 0x1, 0x0, 0x0, 0x0, 0x0) /Users/chgang/zincbox/go/src/github.com/gilbertchen/duplicacy/src/duplicacy_snapshotmanager.go:1137 +0x56e fp=0xc42020dbe8 sp=0xc42020da80 main.printFile(0xc4201c8360) /Users/chgang/zincbox/go/src/github.com/gilbertchen/duplicacy/duplicacy/duplicacy_main.go:854 +0x371 fp=0xc42020dcd8 sp=0xc42020dbe8 github.com/gilbertchen/cli.Command.Run(0xd3f23d, 0x3, 0x0, 0x0, 0x0, 0x0, 0x0, 0xd701bc, 0x53, 0x0, ...) /Users/chgang/zincbox/go/src/github.com/gilbertchen/cli/command.go:160 +0x8bc fp=0xc42020e030 sp=0xc42020dcd8 github.com/gilbertchen/cli.(*App).Run(0xc4201c8120, 0xc42007e080, 0x2, 0x2, 0x0, 0x0) /Users/chgang/zincbox/go/src/github.com/gilbertchen/cli/app.go:179 +0x8e3 fp=0xc42020e548 sp=0xc42020e030 main.main() /Users/chgang/zincbox/go/src/github.com/gilbertchen/duplicacy/duplicacy/duplicacy_main.go:1710 +0x49ff fp=0xc42020ff88 sp=0xc42020e548 runtime.main() /usr/local/go/src/runtime/proc.go:185 +0x20a fp=0xc42020ffe0 sp=0xc42020ff88 runtime.goexit() /usr/local/go/src/runtime/asm_amd64.s:2197 +0x1 fp=0xc42020ffe8 sp=0xc42020ffe0

goroutine 18 [syscall]: os/signal.signal_recv(0x0) /usr/local/go/src/runtime/sigqueue.go:116 +0x104 os/signal.loop() /usr/local/go/src/os/signal/signal_unix.go:22 +0x22 created by os/signal.init.1 /usr/local/go/src/os/signal/signal_unix.go:28 +0x41

goroutine 19 [select, locked to thread]: runtime.gopark(0xd758b0, 0x0, 0xd41cd6, 0x6, 0x18, 0x2) /usr/local/go/src/runtime/proc.go:271 +0x13a runtime.selectgoImpl(0xc420038f50, 0x0, 0x18) /usr/local/go/src/runtime/select.go:423 +0x1364 runtime.selectgo(0xc420038f50) /usr/local/go/src/runtime/select.go:238 +0x1c runtime.ensureSigM.func1() /usr/local/go/src/runtime/signal_unix.go:434 +0x2dd runtime.goexit() /usr/local/go/src/runtime/asm_amd64.s:2197 +0x1

goroutine 20 [chan receive]: main.main.func1(0xc420067a40) /Users/chgang/zincbox/go/src/github.com/gilbertchen/duplicacy/duplicacy/duplicacy_main.go:1704 +0x57 created by main.main /Users/chgang/zincbox/go/src/github.com/gilbertchen/duplicacy/duplicacy/duplicacy_main.go:1708 +0x49cd

goroutine 34 [select]: github.com/gilbertchen/duplicacy/src.CreateChunkDownloader.func1(0xc421274000, 0x0) /Users/chgang/zincbox/go/src/github.com/gilbertchen/duplicacy/src/duplicacy_chunkdownloader.go:79 +0x1bf created by github.com/gilbertchen/duplicacy/src.CreateChunkDownloader /Users/chgang/zincbox/go/src/github.com/gilbertchen/duplicacy/src/duplicacy_chunkdownloader.go:86 +0x23f

karlsoon, Nov 20 '17 09:11

Memory usage is roughly proportional to the number of files, not the total size of the directory. How many files are there, and how much physical memory does the machine have?

gilbertchen, Nov 21 '17 00:11

The result of find . | wc -l is 413567. The machine has just 4 GB of RAM, but free + cache is only about 1 GB.

karlsoon, Nov 21 '17 05:11

Maybe implement a low-level DB such as leveldb or rocksdb to store the file list and other metadata? This could speed up file search and lower memory usage when listing the files in a backup. Use case: when I forget a file name, I just want to list the contents of the backup to find it.

karlsoon, Nov 21 '17 05:11

413K files is not a lot. Is it possible to track the memory usage while the backup is running?

gilbertchen, Nov 22 '17 16:11

Backup? I'm trying to extract the list of files with 'duplicacy cat'. How do I track memory usage?

karlsoon, Nov 23 '17 05:11

You can run top while the backup is running.
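
For a non-interactive alternative to top, here is a minimal sketch that prints the resident memory of a process given its PID. It assumes Linux, where /proc/<pid>/status contains a line of the form "VmRSS: <n> kB"; parseVmRSS is a hypothetical helper written for this example and is not part of Duplicacy.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strconv"
	"strings"
)

// parseVmRSS extracts the resident set size in kB from the text of a
// Linux /proc/<pid>/status file. It returns an error if the file has
// no VmRSS line.
func parseVmRSS(status string) (int64, error) {
	sc := bufio.NewScanner(strings.NewReader(status))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		if len(fields) >= 2 && fields[0] == "VmRSS:" {
			return strconv.ParseInt(fields[1], 10, 64)
		}
	}
	return 0, fmt.Errorf("VmRSS line not found")
}

func main() {
	if len(os.Args) != 2 {
		fmt.Fprintln(os.Stderr, "usage: rss <pid>")
		os.Exit(2)
	}
	// Linux-only assumption: the kernel exposes process status here.
	data, err := os.ReadFile("/proc/" + os.Args[1] + "/status")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	kb, err := parseVmRSS(string(data))
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Printf("VmRSS: %d kB\n", kb)
}
```

Running it repeatedly against the duplicacy process while the command executes shows whether memory grows steadily until the failure.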

gilbertchen, Nov 26 '17 15:11

I'm also having memory problems with a backup. In my case I am trying to back up my backup server, which holds backups of around 100 virtual machines and has 32GB of RAM, and I run out of memory about halfway through the backup. My storage is around 21TB, but much of that is in snapshots, so I don't know the actual working set size. The backup is going to B2; the backup stats say 17537596 files, 442,769M bytes uploaded for the first half.

If I start it on the full data set, it seems to walk the file system and then run out of memory before it starts to upload anything. I'm guessing that it builds the file list first, then goes back and copies the files.

Is there any chance an upgrade is planned that does an incremental file system walk and runs the backup on that data as it streams in, rather than building the whole file list up front? Or at least one that stores the file list in a temp file rather than in memory? I tried setting up 20GB of swap, but the system locked up once it started swapping.

So far, duplicacy looks very impressive other than the memory utilization.

linsomniac, Mar 26 '18 15:03

Yes, the current implementation maintains two file lists, the list of local files and the list of remote files stored in the previous backup, and then compares these two lists to decide which files need to be uploaded. Both lists have to be loaded into memory. This is unnecessary, but the current implementation took the easiest route for simplicity.
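
One way such a comparison could avoid loading both lists is a merge over two streams sorted in the same path order, holding only one entry from each side at a time. The sketch below is an illustration under that assumption, not the planned Duplicacy design. The entry type, the size-only change check, and the diffSorted and stream names are all hypothetical simplifications.

```go
package main

import "fmt"

// entry is a hypothetical, simplified file record. A real tool would
// compare more than size (for example modification time or a hash).
type entry struct {
	Path string
	Size int64
}

// diffSorted consumes two streams that are both sorted by Path and
// calls upload for every local entry that is new or changed. Only one
// entry from each stream is held in memory at any time.
func diffSorted(local, remote <-chan entry, upload func(entry)) {
	l, lok := <-local
	r, rok := <-remote
	for lok {
		switch {
		case !rok || l.Path < r.Path:
			upload(l) // present locally only: new file
			l, lok = <-local
		case l.Path > r.Path:
			r, rok = <-remote // present remotely only: deleted locally
		default:
			if l.Size != r.Size {
				upload(l) // same path, different content
			}
			l, lok = <-local
			r, rok = <-remote
		}
	}
	for rok { // drain the remote stream so its producer can finish
		_, rok = <-remote
	}
}

// stream turns a list of entries into a channel, standing in for a
// streaming directory walk or a streaming snapshot reader.
func stream(es ...entry) <-chan entry {
	ch := make(chan entry)
	go func() {
		defer close(ch)
		for _, e := range es {
			ch <- e
		}
	}()
	return ch
}

func main() {
	local := stream(entry{"a", 1}, entry{"b", 2}, entry{"d", 4})
	remote := stream(entry{"a", 1}, entry{"b", 3}, entry{"c", 5})
	diffSorted(local, remote, func(e entry) {
		fmt.Println("upload", e.Path)
	})
}
```

The approach only works if both sides produce entries in an identical, deterministic order, which is the main design constraint it imposes.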

I do have a plan to fix this flaw for the next major update. It is just that there are a few smaller projects that need to be finished first (like adding a few more backends) so this may take a couple of months.

gilbertchen, Mar 26 '18 21:03

@gilbertchen I have 40366344 files - would love to know when duplicacy is ready for use!

ibash, Nov 04 '19 15:11