Memory leak when gateway.connect and gateway.disconnect are called repeatedly
We found memory-leak behavior when calling gateway.connect and gateway.disconnect repeatedly, even though we expected gateway.disconnect to clean up after itself.
We tried executing the code attached to this issue in the following environments:
- CentOS Linux release 7.9.2009 (Core)
- curl 7.79.1
- node v14.17.5
- npm v7.11.2
- fabric-sdk-node v2.2.10
- fabric-samples v2.2.3
Then, we got logs as follows:
initial state
rss [B], heapTotal [B], heapUsed [B], external [B], arrayBuffers [B]
65703936, 54411264, 17494312, 1593216, 125449
initializing network instance
Creating a gateway object
65773568, 54935552, 17339072, 1584432, 85154
Executing connect function
65773568, 54935552, 17365152, 1584576, 78976
Executing disconnect function
65773568, 54935552, 17366976, 1584576, 78976
... (repeated several times)
Executing connect function
89698304, 56246272, 19282304, 1769693, 262115
Executing disconnect function
89698304, 56246272, 19181592, 1769549, 251008
... (memory usage increases)
According to these logs, memory usage increases every time we call the connect and disconnect functions.
We attached the code to this issue for reproduction: sdk-sample.tar.gz
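The figures above have the shape of Node's built-in process.memoryUsage() output. For anyone reproducing this without the attached archive, a minimal logger that prints the same columns (the label text is the only assumption here) could look like:

```javascript
// Print the same columns as the logs above: rss, heapTotal, heapUsed,
// external and arrayBuffers, all in bytes, from process.memoryUsage().
function logMemory(label) {
  const m = process.memoryUsage();
  console.log(label);
  console.log('rss [B], heapTotal [B], heapUsed [B], external [B], arrayBuffers [B]');
  console.log([m.rss, m.heapTotal, m.heapUsed, m.external, m.arrayBuffers].join(', '));
  return m;
}

logMemory('initial state');
```

Calling this before and after each connect/disconnect pair gives exactly the trend data shown in the logs.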
I can observe similar characteristics over a 5 minute run of your test application, with an initial state of
Memory usage: rss=73707520, heapTotal=54771712, heapUsed=17472296, external=1772632, arrayBuffers=79336
And at the end of the 5 minute run:
Memory usage: rss=214474752, heapTotal=89112576, heapUsed=40513000, external=6138530, arrayBuffers=4477856
Then, after a few seconds' pause to give the garbage collector a chance to do some cleanup, a final state of:
Memory usage: rss=215228416, heapTotal=56606720, heapUsed=40342008, external=6179602, arrayBuffers=4518768
It would need some heap profiling to identify exactly what is using the space and to confirm it isn't caused by some other aspect of the Node runtime or heap management.
I must point out that creating and discarding large numbers of connections is not good practice. You generally want to keep your Gateway connection around and use it for all work carried out by a client identity.
One other thing: the sample client code you posted does not wait for completion of the async connect() call before going on to call disconnect(). This is incorrect usage, although I'm not sure it contributes to the memory behaviour you observe.
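That hazard can be shown without Fabric at all; this stub (hypothetical, standing in only for the async shape of the SDK's Gateway) makes the race visible: when connect() is not awaited, disconnect() runs before connection setup has finished, and the connection completes afterwards.

```javascript
// Hypothetical stub standing in for the SDK's Gateway.
class StubGateway {
  constructor() { this.connected = false; }
  async connect() {
    await new Promise(r => setTimeout(r, 10)); // simulate async setup
    this.connected = true;
  }
  disconnect() { this.connected = false; }
}

async function wrong() {
  const gw = new StubGateway();
  gw.connect();       // not awaited
  gw.disconnect();    // runs before setup completes
  await new Promise(r => setTimeout(r, 20));
  return gw.connected; // true: setup finished AFTER "disconnect"
}

async function right() {
  const gw = new StubGateway();
  await gw.connect(); // wait for setup to finish
  gw.disconnect();
  return gw.connected; // false: cleanly disconnected
}
```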
I have seen the same behaviour and noticed that the gRPC connections remain open on both the client and the peer. Running the same version of the SDK, with fabric-peer 2.2.2.
We notice the same issue here. In our project we create a gateway, connect, then disconnect. The memory keeps leaking, and if we stop the client process, the client and peer memory is suddenly released.
Aside from memory problems, this also causes connections to remain open. This can eventually cause a server to run out of available connections, as also reported here: https://stackoverflow.com/questions/49695485/network-leak-issue-with-event-hub-and-peer-connections-in-fabric-client-node-js
I understand the intention is not to cycle gateways rapidly, but that should be a performance consideration, not a reason to leave memory and connections hanging.
Has anyone found a solution to this, other than forcibly restarting the node process the gateway is connecting through?
Any news on this?
I managed to reduce the connection leak by doing two things (both are needed):
- closing the endorser connections specified in the connection profile just after the gateway connects
- manually closing the discovery service connection at the end
It worked in my case (for connection leaks; I didn't check memory leaks), but it took a lot of debugging and experimenting. It eliminated all connection leaks for successful chaincode invocations, but left some when a chaincode invocation failed. Maybe it will help someone with debugging this stuff. I will probably give up and use Fabric Gateway for Fabric 2.4 anyway (https://github.com/hyperledger/fabric-gateway).
await gateway.connect(connProfile);

peerNames.forEach(peerName => {
  // @ts-ignore
  // these connections will be replaced by new ones, but there will be no leak
  gateway.client.endorsers.get(peerName).disconnect();
});

const network = await gateway.getNetwork(channelName);
const fabricDiscoveryService = network.discoveryService; // get the reference here, not after calling the contract

...

fabricDiscoveryService.close();
gateway.disconnect();
I've been experiencing the same issue on Debian 11, with fabric node SDK 2.2.15 and Fabric 2.4.x. I will attempt the workaround proposed by @dzikowski.
A slight change to connection closing was delivered in the v2.2.17 release, which might help with this issue. The workaround mentioned above does not seem ideal, but the handling of connections created during connection profile processing does sound like a good candidate for the problem area - thank you @dzikowski for the investigation.
If you are using (or can use) Fabric v2.4 or later, you should use the Fabric Gateway client API, which has much more efficient connection behaviour. It can use a single gRPC connection (over which you have direct control) for all interactions with Fabric, regardless of the number of client identities you are working with. See the migration guide for details.
I use SDK 2.2.18. I connect to the fabric network, submit tens of transactions, and then disconnect. It also ends with a core dump:
node[30909]: ../src/node_http2.cc:561:static void* node::http2::Http2Session::MemoryAllocatorInfo::H2Realloc(void*, size_t, void*): Assertion `(session->current_nghttp2_memory_) >= (previous_size)' failed.
1: 0x8fb090 node::Abort() [node]
2: 0x8fb165 [node]
3: 0x95ecfa [node]
4: 0x1738b28 nghttp2_session_close_stream [node]
5: 0x173fe8a nghttp2_session_mem_recv [node]
6: 0x95af67 node::http2::Http2Session::ConsumeHTTP2Data() [node]
7: 0x95b1ef node::http2::Http2Session::OnStreamRead(long, uv_buf_t const&) [node]
8: 0xa2cc21 node::TLSWrap::ClearOut() [node]
9: 0xa2cfc0 node::TLSWrap::OnStreamRead(long, uv_buf_t const&) [node]
10: 0x9d1021 [node]
11: 0xa7d3d9 [node]
12: 0xa7da00 [node]
13: 0xa83b58 [node]
14: 0xa71bbb uv_run [node]
15: 0x905665 node::Start(v8::Isolate*, node::IsolateData*, std::vector<std::string, std::allocator<std::string> > const&, std::vector<std::string, std::allocator<std::string> > const&) [node]
16: 0x90374f node::Start(int, char**) [node]
17: 0x7f121e0e9445 __libc_start_main [/lib64/libc.so.6]
18: 0x8bce95 [node]
Is this a new problem that worked with previous SDK versions and has just started appearing with v2.2.18? It looks like a physical memory allocation failure in the Node runtime, so it might be worth checking the version of Node you are using, and also monitoring the system memory used by and available to the Node process.