
OOMs during cluster resize

Open dmibor opened this issue 5 years ago • 1 comments

For bugs, please provide the following:

What's going wrong?

I'm trying to scale up a cluster that is broken because of a memory shortage, but can't: the nodes transferring shards to the new node run out of memory (OOM).

What was expected?

It should be possible to scale the cluster, but currently it can be scaled neither up nor down (see issue https://github.com/pilosa/pilosa/issues/1867 ). A static cluster is pretty useless... :(

Steps to reproduce the behavior

Trying to use Set fields this time.

Heap profiles before OOM from 2 nodes: heaps.gz

By the looks of it, (*fragment).flushCache creates quite a lot of garbage (about 67% of cumulative alloc_space in the profile below):

$ go tool pprof -alloc_space /tmp/heap_shardsmove_1 
File: pilosa
Type: alloc_space
Time: Mar 15, 2019 at 5:51pm (+08)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top40
Showing nodes accounting for 599876.79MB, 98.23% of 610709.41MB total
Dropped 509 nodes (cum <= 3053.55MB)
Showing top 40 nodes out of 62
      flat  flat%   sum%        cum   cum%
197805.58MB 32.39% 32.39% 197805.58MB 32.39%  github.com/pilosa/pilosa/internal.(*Cache).MarshalTo
157069.89MB 25.72% 58.11% 157069.89MB 25.72%  github.com/pilosa/pilosa.(*rankCache).IDs
66428.07MB 10.88% 68.99% 66428.07MB 10.88%  github.com/pilosa/pilosa/roaring.NewContainer
53871.20MB  8.82% 77.81% 252136.86MB 41.29%  github.com/gogo/protobuf/proto.Marshal
46757.63MB  7.66% 85.46% 46757.63MB  7.66%  bytes.makeSlice
39705.58MB  6.50% 91.96% 39706.58MB  6.50%  github.com/pilosa/pilosa/roaring.(*Bitmap).Info
15769.93MB  2.58% 94.55% 15769.93MB  2.58%  github.com/pilosa/pilosa/enterprise/b.glob..func1
13495.47MB  2.21% 96.76% 15224.59MB  2.49%  github.com/pilosa/pilosa.countOpenFiles
 3466.21MB  0.57% 97.32%  3466.21MB  0.57%  github.com/pilosa/pilosa/enterprise/b.glob..func2
 3274.92MB  0.54% 97.86%  3274.92MB  0.54%  github.com/pilosa/pilosa.(*rankCache).BulkAdd
 1751.55MB  0.29% 98.15%  5217.77MB  0.85%  github.com/pilosa/pilosa/enterprise/b.(*bTreeContainers).Iterator
  367.76MB  0.06% 98.21% 46449.95MB  7.61%  io.copyBuffer
  109.51MB 0.018% 98.23% 409217.25MB 67.01%  github.com/pilosa/pilosa.(*fragment).flushCache
    3.50MB 0.00057% 98.23% 46761.13MB  7.66%  bytes.(*Buffer).grow
         0     0% 98.23% 46004.78MB  7.53%  bytes.(*Buffer).ReadFrom
         0     0% 98.23% 119996.45MB 19.65%  github.com/pilosa/pilosa.(*Field).Open
         0     0% 98.23% 119996.45MB 19.65%  github.com/pilosa/pilosa.(*Field).Open.func1
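
To illustrate the kind of allocation pattern that can add up like this, here is a minimal sketch. It is not Pilosa code: encodeFresh and encodeReuse are hypothetical helpers, and varint encoding stands in for the protobuf marshaling seen in the profile. It compares a flush that builds a new output buffer on every call with one that reuses a caller-supplied buffer.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"testing"
)

// encodeFresh mimics a flush that builds a brand-new buffer on every call.
// Hypothetical helper for illustration only, not part of Pilosa.
func encodeFresh(ids []uint64) []byte {
	return encodeReuse(nil, ids)
}

// encodeReuse varint-encodes ids into buf, reusing buf's existing capacity.
// Hypothetical helper for illustration only, not part of Pilosa.
func encodeReuse(buf []byte, ids []uint64) []byte {
	var tmp [binary.MaxVarintLen64]byte
	buf = buf[:0] // keep the backing array, drop the old contents
	for _, id := range ids {
		n := binary.PutUvarint(tmp[:], id)
		buf = append(buf, tmp[:n]...)
	}
	return buf
}

func main() {
	// Stand-in for a list of cached row IDs.
	ids := make([]uint64, 50000)
	for i := range ids {
		ids[i] = uint64(i) * 7
	}

	// Average heap allocations per call when every flush starts from nil.
	fresh := testing.AllocsPerRun(100, func() { _ = encodeFresh(ids) })

	// Average heap allocations per call when one preallocated buffer is reused.
	buf := make([]byte, 0, len(ids)*binary.MaxVarintLen64)
	reused := testing.AllocsPerRun(100, func() { buf = encodeReuse(buf, ids) })

	fmt.Printf("allocs per flush: fresh=%.0f reused=%.0f\n", fresh, reused)
}
```

Whether buffer reuse is practical in flushCache itself is a separate question; the sketch only shows why repeated flushes during a shard transfer can inflate alloc_space even when live heap per flush is small.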

dmibor avatar Mar 15 '19 10:03 dmibor

should be fixed by containers work

jaffee avatar Apr 09 '19 15:04 jaffee