File descriptor leak with unstable blob store connectivity

Open bharathkumarkm opened this issue 2 years ago • 4 comments

We use the SDK to continuously upload small blobs (a few MB each). There is no issue while the blob store connection is stable. When connectivity to the blob store is unstable, we see a file descriptor leak (about 1000 fds per hour) that keeps growing. Eventually the process fd limit is reached and we see the SDK errors below. Almost all of the leaked fds come from the socketpair function in libcurl; they are internal localhost connections, as shown in the lsof output below.

We have looked at related issues (https://github.com/Azure/azure-sdk-for-cpp/issues/2254, https://github.com/Azure/azure-sdk-for-cpp/issues/2289 and https://github.com/curl/curl/issues/4829), but we don't know whether this is an issue in the Azure SDK or in libcurl. When we examined the Azure SDK code, we found no references to curl multi handles, which are what use the socketpair feature. So we compiled libcurl with -DCURL_DISABLE_SOCKETPAIR=ON, which stops it from creating socketpairs. With that build the fd leak no longer occurs, as expected. If this workaround is acceptable, the vcpkg port file for the Azure SDK has to be patched to disable socketpairs. Otherwise, please debug whether this is an Azure SDK issue.

Fail to get a new connection for: https<blob_store_account_name>.blob.core.windows.net. Couldn't connect to server
Fail to get a new connection for: https<blob_store_account_name>.blob.core.windows.net. Couldn't resolve host name

[root@hostname ~]$ lsof -p <pid> | grep localh | tail
prog 1144 root 2099u     IPv4             366471      0t0   TCP localhost:44169->localhost:44625 (ESTABLISHED)
prog 1144 root 2104u     IPv4             366472      0t0   TCP localhost:44625->localhost:44169 (ESTABLISHED)
prog 1144 root 2105u     IPv4             369662      0t0   TCP localhost:59974->localhost:33111 (ESTABLISHED)
prog 1144 root 2106u     IPv4             369663      0t0   TCP localhost:33111->localhost:59974 (ESTABLISHED)
...
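
The descriptor count can also be watched from inside the process instead of with lsof. The following is a minimal sketch, assuming Linux and its /proc/self/fd directory; `count_open_fds` is a hypothetical helper name and is not part of the Azure SDK or libcurl.

```cpp
#include <dirent.h>
#include <cstdlib>

// Hypothetical helper (not SDK or libcurl code): returns the number of
// descriptors open in this process by listing /proc/self/fd (Linux only),
// or -1 if the directory cannot be opened. The descriptor used for the
// listing itself is excluded from the count.
long count_open_fds() {
    DIR* dir = opendir("/proc/self/fd");
    if (dir == nullptr) return -1;
    const int self_fd = dirfd(dir);
    long count = 0;
    while (const dirent* entry = readdir(dir)) {
        if (entry->d_name[0] == '.') continue;              // skip "." and ".."
        if (std::atoi(entry->d_name) == self_fd) continue;  // skip our own fd
        ++count;
    }
    closedir(dir);
    return count;
}
```

Logging this value periodically (for example once per upload batch) would show whether the count keeps climbing while the connection is flapping.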

Steps to reproduce the behavior: On a Linux appliance that is doing continuous blob uploads, the steps below (which simulate an unstable blob store connection) consistently reproduce the issue when run for a few (2+) hours.

1. Determine the Azure IP address:

$ ss -natp | grep prog | grep ":443 " | tail
ESTAB      0      0      10.x.x.x:57260              52.239.246.4:443                 users:(("prog",pid=1144,fd=8986))
ESTAB      0      0      10.x.x.x:57244              52.239.246.4:443                 users:(("prog",pid=1144,fd=8981))


2. Add a rule to /etc/sysconfig/iptables that rejects packets from that address:

-A INPUT -s 52.239.246.4 -j REJECT
Placement in the file:

$ head /etc/sysconfig/iptables
*filter
:INPUT ACCEPT [0:0]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [0:0]
# allow localhost
-A INPUT -i lo -j ACCEPT
# new line: filter all comm with azure
-A INPUT -s 52.239.246.4 -j REJECT
# allow established
-A INPUT -m state --state RELATED,ESTABLISHED -j ACCEPT

3. Run the following script:
while true; do
    date
    echo "filtering..."
    systemctl start iptables
    sleep 59
    date
    echo "Recovering..."
    systemctl stop iptables
    sleep 240
done

Setup (please complete the following information):

  • OS: CentOS 7
  • IDE : NA
  • Version of the Library used:
We used vcpkg to compile the SDK. The details are as follows:
azure-core-cpp:x64-linux                           1.2.1            Microsoft Azure Core SDK for C++
azure-core-cpp[curl]:x64-linux                                      Libcurl HTTP transport implementation
azure-core-cpp[http]:x64-linux                                      All HTTP transport implementations available on ...
azure-storage-blobs-cpp:x64-linux                  12.2.0           Microsoft Azure Storage Blobs SDK for C++
azure-storage-common-cpp:x64-linux                 12.2.0           Microsoft Azure Common Storage SDK for C++
curl:x64-linux                                     7.79.1           A library for transferring data with URLs
curl[non-http]:x64-linux                                            Enables protocols beyond HTTP/HTTPS/HTTP2
curl[openssl]:x64-linux                                             SSL support (OpenSSL)
curl[ssl]:x64-linux                                                 Default SSL backend
libiconv:x64-linux                                 1.16#11          GNU Unicode text conversion
liblzma:x64-linux                                  5.2.5#4          Compression library with an API similar to that ...
libxml2:x64-linux                                  2.9.12#4         Libxml2 is the XML C parser and toolkit develope...
openssl:x64-linux                                  1.1.1l#3         OpenSSL is an open source project that provides ...
vcpkg-cmake-config:x64-linux                       2021-09-27
vcpkg-cmake:x64-linux                              2021-09-13
zlib:x64-linux                                     1.2.11#13        A compression library

bharathkumarkm, Apr 14 '22 10:04

Looks like a regression

Jinming-Hu, Apr 14 '22 23:04

@bharathkumarkm can you share sample code showing how you are performing the upload?

When you say continuously, do you mean uploading from a memory vector, or do you mean one blob after another?

We support two upload APIs, Upload() and UploadFrom(). The first uses one connection and one thread to complete the operation, while the second supports parallel uploading. Please help us better understand and isolate a reproducible scenario.

vhvb1989, Apr 15 '22 01:04

libcurl internally calls socketpair() IIRC, so it might be an issue of connection not being freed on error
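
To illustrate the kind of leak suspected here: every socketpair() call returns two descriptors, and both stay open until close() is called on each of them on every code path, including error paths. The following is a minimal sketch using the POSIX socketpair() call directly. It is not libcurl's or the SDK's actual code, and `SocketPair` and `descriptors_released_on_error` are hypothetical names.

```cpp
#include <fcntl.h>
#include <sys/socket.h>
#include <unistd.h>
#include <stdexcept>

// Hypothetical RAII wrapper, for illustration only: it is not libcurl or
// Azure SDK code. It owns both ends of a socket pair and closes them in the
// destructor, so an early return or exception cannot leave them open.
class SocketPair {
public:
    SocketPair() {
        if (socketpair(AF_UNIX, SOCK_STREAM, 0, fds_) != 0) {
            throw std::runtime_error("socketpair() failed");
        }
    }
    ~SocketPair() {
        close(fds_[0]);
        close(fds_[1]);
    }
    SocketPair(const SocketPair&) = delete;
    SocketPair& operator=(const SocketPair&) = delete;

    int first() const { return fds_[0]; }
    int second() const { return fds_[1]; }

private:
    int fds_[2] = {-1, -1};
};

// Simulates a transfer that fails after the pair was created. Returns true
// if neither descriptor is still open once the exception has unwound the
// scope that owned the pair.
bool descriptors_released_on_error() {
    int a = -1;
    int b = -1;
    try {
        SocketPair pair;
        a = pair.first();
        b = pair.second();
        throw std::runtime_error("simulated: couldn't connect to server");
    } catch (const std::runtime_error&) {
        // The SocketPair destructor has already run at this point.
    }
    // fcntl(F_GETFD) fails with EBADF for a descriptor that is not open.
    return fcntl(a, F_GETFD) == -1 && fcntl(b, F_GETFD) == -1;
}
```

If the cleanup were done by hand after the throwing statement instead of in a destructor, each failed attempt would leave two descriptors behind, which matches the paired localhost entries in the lsof output.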

Jinming-Hu, Apr 15 '22 03:04

I can share pseudo-code for now (I need some time to isolate the blob upload code from our product). We use UploadFrom() to upload each blob from a memory buffer, and we upload blobs in parallel. Note: we are not overwriting blobs, i.e. each blob upload uses a unique blob name.

Create a pool of 64 buffers

while (true) {

    Get a buffer from pool. Wait if none available.

    std::async(std::launch::async, uploadBlobFromBufferFunction)

}
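
A self-contained C++ sketch of the loop above might look like the following. It makes several assumptions: `BufferPool`, `upload_blob_stub` and `upload_all` are hypothetical names, the stub stands in for the real UploadFrom() call, and the loop runs for a fixed number of blobs instead of forever. One detail worth noting is that a std::future returned by std::async with std::launch::async blocks in its destructor until the task finishes, so the sketch keeps the futures rather than discarding them.

```cpp
#include <condition_variable>
#include <cstddef>
#include <cstdint>
#include <future>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical fixed-size pool of upload buffers; acquire() blocks while
// every buffer is in use, matching "Wait if none available" above.
class BufferPool {
public:
    BufferPool(std::size_t count, std::size_t bytes) {
        for (std::size_t i = 0; i < count; ++i) {
            free_.emplace_back(bytes);
        }
    }
    std::vector<std::uint8_t> acquire() {
        std::unique_lock<std::mutex> lock(mutex_);
        available_.wait(lock, [this] { return !free_.empty(); });
        std::vector<std::uint8_t> buffer = std::move(free_.back());
        free_.pop_back();
        return buffer;
    }
    void release(std::vector<std::uint8_t> buffer) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            free_.push_back(std::move(buffer));
        }
        available_.notify_one();
    }

private:
    std::mutex mutex_;
    std::condition_variable available_;
    std::vector<std::vector<std::uint8_t>> free_;
};

// Stand-in for the real upload (the thread uses UploadFrom() here); it only
// reports how many bytes it was given.
std::size_t upload_blob_stub(const std::vector<std::uint8_t>& buffer) {
    return buffer.size();
}

// Starts `blobs` uploads, each on its own std::async task, and returns the
// total number of bytes "uploaded". The futures are kept because a future
// from std::launch::async blocks in its destructor until the task is done.
std::size_t upload_all(BufferPool& pool, std::size_t blobs) {
    std::vector<std::future<std::size_t>> pending;
    for (std::size_t i = 0; i < blobs; ++i) {
        std::vector<std::uint8_t> buffer = pool.acquire();
        pending.push_back(std::async(
            std::launch::async,
            [&pool](std::vector<std::uint8_t> owned) {
                const std::size_t sent = upload_blob_stub(owned);
                pool.release(std::move(owned));  // hand the buffer back
                return sent;
            },
            std::move(buffer)));
    }
    std::size_t total = 0;
    for (auto& task : pending) total += task.get();
    return total;
}
```

With a pool of 64 buffers, as in the pseudo-code, at most 64 uploads are in flight at a time, since acquire() blocks until a finished task releases its buffer.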

bharathkumarkm, Apr 15 '22 03:04

Hi @bharathkumarkm, we deeply appreciate your input into this project. Regrettably, this issue has remained inactive for over 2 years, leading us to the decision to close it. We've implemented this policy to maintain the relevance of our issue queue and facilitate easier navigation for new contributors. If you still believe this topic requires attention, please feel free to create a new issue, referencing this one. Thank you for your understanding and ongoing support.

github-actions[bot], Apr 15 '24 18:04