GraphEngine

ResizeCell encountered an error: out of memory exception

Open • bingzhangdai opened this issue 8 years ago • 7 comments

When I deployed GE and started importing data, everything looked fine. However, I then encountered an out-of-memory exception, and CPU usage went to 100% after the exception. Part of the error message is listed below:

[ ERROR ] CommittedMemoryExpand: MemoryTrunk 165 failed to expand.
[ ERROR ]
[ ERROR ] ResizeCell encountered an error.
[ ERROR ] Message buffer length: 114
[ ERROR ] CellAlloc: MemoryTrunk 165 is out of Memory.
[ ERROR ] at Trinity.Storage.LocalMemoryStorage.ResizeCell(Int64 cell_id, Int32 cellEntryIndex, Int32 offset, Int32 delta) ... (ignore the call stack)
[ ERROR ] Hexadecimal dump of the message buffer (first 128 bytes):
[ ERROR ] Exceptions are caught in the handler of asynchronous message, message sn: 111
[ ERROR ] ResizeCell encountered an error.
[ ERROR ] Message buffer length: 92
[ ERROR ] Message buffer length: 92
[ ERROR ]
[ ERROR ] CommittedMemoryExpand: MemoryTrunk 165 failed to expand.
[ ERROR ] CellAlloc: MemoryTrunk 165 is out of Memory.
[ ERROR ] Hexadecimal dump of the message buffer (first 128 bytes):
[ ERROR ] Hexadecimal dump of the message buffer (first 128 bytes):
[ ERROR ]

Address   00 01 02 03 04 05 06 07 08 09 0a 0b 0c 0d 0e 0f
00000000  0e 00 00 00 4e 00 6f 00 72 00 6d 00 55 00 72 00
00000010  6c 00 4c 00 00 00 68 00 74 00 74 00 70 00 3a 00
00000020  2f 00 2f 00 6d 00 65 00 64 00 73 00 63 00 61 00
00000030  70 00 65 00 2e 00 63 00 6f 00 6d 00 2f 00 76 00
00000040  69 00 65 00 77 00 61 00 72 00 74 00 69 00 63 00
00000050  6c 00 65 00 2f 00 37 00 35 00 33 00 32 00 38 00
00000060  38 00 04 00 00 00 96 10 42 43 2c 9b 96 ad 00 00
00000070  00 00 69 00 6e 00

I am fairly sure the server had enough memory: at least 40 GB was available at the time. Under what circumstances can this kind of exception happen?

bingzhangdai • Aug 20 '17 00:08

Each local memory storage consists of 256 memory trunks, and each trunk is capped at 2 GB, so a single trunk can run out of memory even when the machine still has plenty free. It looks like your data is getting unbalanced across trunks. How are the IDs generated?
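The arithmetic behind this can be sketched in a few lines. Assuming the 256 trunks × 2 GB limits stated above, and that the first trunk to fail is simply the one receiving the largest share of the data, the total amount of data that fits before any trunk fills is 2 GB divided by that share. The class and method names are hypothetical, not GraphEngine APIs.

```csharp
using System;

static class TrunkCapacity
{
    // Limits as described in this comment: 256 trunks, each capped at 2 GB.
    public const long TrunkCapBytes = 2L * 1024 * 1024 * 1024;
    public const int TrunkCount = 256;

    // If the busiest trunk receives `hottestShare` (0..1] of all stored bytes,
    // it reaches its cap once the total reaches TrunkCapBytes / hottestShare.
    public static double MaxTotalBytes(double hottestShare) => TrunkCapBytes / hottestShare;

    static void Main()
    {
        const double GB = 1024.0 * 1024 * 1024;
        Console.WriteLine(MaxTotalBytes(1.0 / TrunkCount) / GB); // 512 (perfectly even IDs)
        Console.WriteLine(MaxTotalBytes(1.0 / 16) / GB);         // 32 (one trunk gets 1/16 of the data)
    }
}
```

With perfectly even IDs the store can hold 512 GB in total; if one trunk receives 1/16 of the data, that trunk hits its 2 GB cap after only 32 GB have been imported.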

The 100% CPU usage is most likely caused by cell locks that are not released when an exception is thrown, while other threads keep trying to take those locks.
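A minimal sketch of this failure mode and its usual fix: release the lock in a `finally` block so an exception cannot leave it held. It uses `System.Threading.SpinLock` as a hypothetical stand-in for a per-cell lock (GraphEngine's own locking code is not shown in this thread) and an `OutOfMemoryException` as a stand-in for the trunk running out of memory. Without the `finally`, any other thread entering the same spin lock would spin indefinitely, which matches the 100% CPU symptom.

```csharp
using System;
using System.Threading;

static class CellLockDemo
{
    // Hypothetical stand-in for a per-cell spin lock; not GraphEngine's actual lock type.
    // Must not be readonly: SpinLock is a mutable struct.
    static SpinLock cellLock = new SpinLock(enableThreadOwnerTracking: false);

    // Run a resize while holding the cell lock. The finally block releases the lock
    // even if the resize throws, so other threads do not spin on it forever.
    public static bool TryResize(Action resize)
    {
        bool taken = false;
        try
        {
            cellLock.Enter(ref taken);
            resize();
            return true;
        }
        catch (OutOfMemoryException)
        {
            return false; // report the failure instead of crashing the message handler
        }
        finally
        {
            if (taken) cellLock.Exit();
        }
    }

    // True if the lock can be acquired right now (it is released again immediately).
    public static bool IsLockFree()
    {
        bool taken = false;
        cellLock.TryEnter(ref taken);
        if (taken) cellLock.Exit();
        return taken;
    }

    static void Main()
    {
        bool ok = TryResize(() => throw new OutOfMemoryException("trunk is full"));
        Console.WriteLine(ok);           // False
        Console.WriteLine(IsLockFree()); // True: the lock was released despite the exception
    }
}
```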

I've made several attempts to mitigate this kind of issue:

  1. The multi_cell_lock branch has a fix for this: at the very least it no longer crashes, and it includes a deadlock detection mechanism I implemented.
  2. We are going to have a cell migration mechanism on the availability branch. A balancer module will be able to move cells away before a trunk becomes unbalanced.

Currently, the workaround is to tweak the hashing function so that the IDs are distributed more evenly.
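One way to tweak the hashing, sketched here as an assumption rather than GraphEngine's own method, is to pass each generated ID through a 64-bit bit mixer so that high-order bits also influence the low bytes used for trunk selection. The sketch uses the 64-bit finalizer from MurmurHash3 (fmix64). Every step of it is invertible, so distinct IDs stay distinct. The trunk selector assumes the trunk comes from the second-lowest byte of the ID; `IdMixing`, `Mix`, and `TrunkOf` are hypothetical names.

```csharp
using System;
using System.Linq;

static class IdMixing
{
    // MurmurHash3's 64-bit finalizer (fmix64). Each xor-shift and each multiplication
    // by an odd constant is invertible, so this is a bijection on 64-bit values.
    public static long Mix(long id)
    {
        unchecked
        {
            ulong k = (ulong)id;
            k ^= k >> 33;
            k *= 0xff51afd7ed558ccdUL;
            k ^= k >> 33;
            k *= 0xc4ceb9fe1a85ec53UL;
            k ^= k >> 33;
            return (long)k;
        }
    }

    // Hypothetical trunk selector: the second-lowest byte of the id.
    public static int TrunkOf(long id) => (int)((id >> 8) & 0xFF);

    static void Main()
    {
        // Deliberately skewed ids: multiples of 65,536 all have a zero second byte.
        long[] ids = Enumerable.Range(0, 1024).Select(i => (long)i << 16).ToArray();
        Console.WriteLine(ids.Select(id => TrunkOf(id)).Distinct().Count()); // 1
        Console.WriteLine(ids.Select(id => TrunkOf(Mix(id))).Distinct().Count());
    }
}
```

Without mixing, all 1,024 skewed IDs land in a single trunk; after mixing they spread over most of the 256 trunks. Because the mixer is a bijection, it adds no collisions, but it must be applied consistently everywhere IDs are generated.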

yatli • Aug 20 '17 13:08

Thanks for your reply! Currently, I use HashHelper.HashString2Int64 to generate the IDs. I use a hash because I expect it to be faster for queries, in principle, than a separate index service. I do not know whether the generated IDs are unbalanced. Is there an alternative I could use?

I will try the multi_cell_lock branch to see whether it meets my requirements, and I will post further feedback here. :)

bingzhangdai • Aug 21 '17 03:08

@bingzhangdai I'm not sure what replacement would suit your scenario. You can use LINQ queries to get the storage space taken by each trunk:

Currently, the TrunkId is determined by the second least significant byte of the cell ID ;)
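A hypothetical C# sketch of this kind of check works on a plain collection of cell IDs (for example, the IDs about to be imported) rather than on GraphEngine's storage API. It assumes, as stated above, that the trunk is the second least significant byte of the ID, and it counts cells per trunk. To measure space instead of cell count, you would sum each cell's size per group.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class TrunkHistogram
{
    // Trunk id taken from the second least significant byte of the cell id (assumption from this thread).
    public static int TrunkIdOf(long cellId) => (int)((cellId >> 8) & 0xFF);

    // Count how many of the given cell ids fall into each trunk, busiest trunk first.
    public static List<(int Trunk, int Count)> CountPerTrunk(IEnumerable<long> cellIds) =>
        cellIds.GroupBy(id => TrunkIdOf(id))
               .Select(g => (Trunk: g.Key, Count: g.Count()))
               .OrderByDescending(t => t.Count)
               .ToList();

    static void Main()
    {
        long[] ids = { 0x0100, 0x0101, 0x01FF, 0x0200, 0xA50100 };
        foreach (var (trunk, count) in CountPerTrunk(ids))
            Console.WriteLine($"trunk {trunk}: {count}");
        // trunk 1: 4
        // trunk 2: 1
    }
}
```

A heavily skewed histogram (one trunk with far more cells than the average) is a sign that the ID generator needs to spread IDs more evenly.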

yatli • Aug 21 '17 08:08

I tried the multi_cell_lock branch, but got an exception while loading the configuration file (I am using configuration version 2):

Unhandled Exception: System.TypeInitializationException: The type initializer for 'Trinity.TrinityConfig' threw an exception. ---> System.TypeInitializationException: The type initializer for 'Trinity.InternalCalls' threw an exception. ---> System.InvalidOperationException: InternalCall Trinity.Storage.CLocalMemoryStorage::CThreadContextAllocate cannot be registered!
   at Trinity.InternalCalls.Register()
   at Trinity.InternalCalls..cctor()
   --- End of inner exception stack trace ---
   at Trinity.InternalCalls.__init()
   at Trinity.TrinityConfig..cctor()
   --- End of inner exception stack trace ---
   at Trinity.TrinityConfig.LoadConfig()

bingzhangdai • Aug 22 '17 10:08

@yatli By the way, I am curious about ways to avoid the imbalance. As you mentioned, the TrunkId is determined directly by the cellId:

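public static unsafe int GetTrunkId(long cellId) {
    // Reads the byte stored at the lowest address of cellId,
    // i.e. its least significant byte on little-endian machines.
    return *(byte*)&cellId;
}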

The StaticGetServerIdByCellId method works similarly. As long as the TrunkId is derived directly from the cellId, I cannot see any way, in principle, to avoid imbalance other than generating cellIds more evenly.

bingzhangdai • Aug 22 '17 10:08

@bingzhangdai How about a self-balancing local memory storage? Because trunks are not visible from outside the local process, we can rebalance them arbitrarily. The tradeoff is that a register mask-and-shift operation is replaced by a memory access (a lookup table).
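A minimal sketch of the lookup-table idea, under stated assumptions: the fixed rule takes the trunk from the second-lowest byte of the cell ID with a shift and a mask, and the self-balancing version looks the trunk up in a table instead. For rebalancing to be useful, this hypothetical `TrunkMap` indexes the table by two bytes of the ID (65,536 slots), so a hot trunk can hand off individual slots. Its initial table reproduces the fixed one-byte rule. Cell migration and synchronization with concurrent readers are not shown.

```csharp
using System;

// Hypothetical sketch, not GraphEngine code: 65,536 slots (two bytes of the cell id)
// mapped onto 256 trunks through a lookup table.
sealed class TrunkMap
{
    const int SlotCount = 1 << 16;
    readonly byte[] slotToTrunk = new byte[SlotCount];

    public TrunkMap()
    {
        // slot & 0xFF is the second-lowest byte of the id, so this initial
        // mapping reproduces the fixed "second byte selects the trunk" rule.
        for (int s = 0; s < SlotCount; s++) slotToTrunk[s] = (byte)(s & 0xFF);
    }

    static int SlotOf(long cellId) => (int)((cellId >> 8) & 0xFFFF);

    // Fixed rule: register shift + mask only.
    public static int FixedTrunkOf(long cellId) => (int)((cellId >> 8) & 0xFF);

    // Table rule: the same shift + mask, plus one memory access.
    public int TrunkOf(long cellId) => slotToTrunk[SlotOf(cellId)];

    // Point the slot containing cellIdInSlot at another trunk. A real balancer would
    // also migrate the slot's cells and coordinate with concurrent lookups.
    public void Reassign(long cellIdInSlot, int trunk) =>
        slotToTrunk[SlotOf(cellIdInSlot)] = (byte)trunk;

    static void Main()
    {
        var map = new TrunkMap();
        long id = 0x120305;                         // second byte 0x03, slot 0x1203
        Console.WriteLine(map.TrunkOf(id));         // 3, same as FixedTrunkOf(id)
        map.Reassign(id, 200);
        Console.WriteLine(map.TrunkOf(id));         // 200
        Console.WriteLine(map.TrunkOf(0x130305));   // 3: other slots keep their trunk
    }
}
```

With only one slot per trunk, reassigning a slot could merely pile two slots onto one trunk; having more slots than trunks is what gives a balancer room to move part of a hot trunk's data elsewhere.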

yatli • Aug 24 '17 13:08

@yatli Thanks for your comment! That is a good idea, but the system design may become somewhat complicated, and it might have a performance impact, although I guess the impact would not be significant.

bingzhangdai • Aug 27 '17 09:08